qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queu


From: zhuyangyang
Subject: Re: [PATCH v1] coroutine: avoid inserting duplicate coroutine to co_queue_wakeup
Date: Tue, 26 Mar 2024 19:53:56 +0800

On Mon, 25 Mar 2024 11:50:41 -0400, Stefan Hajnoczi wrote:
> On Mon, Mar 25, 2024 at 05:18:50PM +0800, zhuyangyang wrote:
> > If g_main_loop_run()/aio_poll() is called in the coroutine context,
> > the pending coroutine may be woken up repeatedly, and the co_queue_wakeup
> > may be disordered.
> 
> aio_poll() must not be called from coroutine context:
> 
>   bool no_coroutine_fn aio_poll(AioContext *ctx, bool blocking);
>        ^^^^^^^^^^^^^^^
> 
> Coroutines are not supposed to block. Instead, they should yield.
> 
> > When the poll() syscall exited in g_main_loop_run()/aio_poll(), it means
> > some listened events is completed. Therefore, the completion callback
> > function is dispatched.
> > 
> > If this callback function needs to invoke aio_co_enter(), it will only
> > wake up the coroutine (because we are already in coroutine context),
> > which may cause that the data on this listening event_fd/socket_fd
> > is not read/cleared. When the next poll () exits, it will be woken up again
> > and inserted into the wakeup queue again.
> > 
> > For example, if TLS is enabled in NBD, the server will call 
> > g_main_loop_run()
> > in the coroutine, and repeatedly wake up the io_read event on a socket.
> > The call stack is as follows:
> > 
> > aio_co_enter()
> > aio_co_wake()
> > qio_channel_restart_read()
> > aio_dispatch_handler()
> > aio_dispatch_handlers()
> > aio_dispatch()
> > aio_ctx_dispatch()
> > g_main_context_dispatch()
> > g_main_loop_run()
> > nbd_negotiate_handle_starttls()
> 
> This code does not look like it was designed to run in coroutine
> context. Two options:
> 
> 1. Don't run it in coroutine context (e.g. use a BH instead). This
>    avoids blocking the coroutine but calling g_main_loop_run() is still
>    ugly, in my opinion.
> 
> 2. Get rid of data.loop and use coroutine APIs instead:
> 
>    while (!data.complete) {
>        qemu_coroutine_yield();
>    }
> 
>    and update nbd_tls_handshake() to call aio_co_wake(data->co) instead
>    of g_main_loop_quit(data->loop).
> 
>    This requires auditing the code to check whether the event loop might
>    invoke something that interferes with
>    nbd_negotiate_handle_starttls(). Typically this means monitor
>    commands or fd activity that could change the state of this
>    connection while it is yielded. This is where the real work is but
>    hopefully it will not be that hard to figure out.

Thank you for your help, I'll try to fix it using the coroutine api.

> 
> > nbd_negotiate_options()
> > nbd_negotiate()
> > nbd_co_client_start()
> > coroutine_trampoline()
> > 

-- 
Best Regards,
Zhu Yangyang



reply via email to

[Prev in Thread] Current Thread [Next in Thread]