qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH 1/7] block/nbd: Fix hang in .bdrv_close()


From: Maxim Levitsky
Subject: Re: [Qemu-block] [PATCH 1/7] block/nbd: Fix hang in .bdrv_close()
Date: Tue, 16 Jul 2019 16:08:33 +0300

On Fri, 2019-07-12 at 19:35 +0200, Max Reitz wrote:
> When nbd_close() is called from a coroutine, the connection_co never
> gets to run, and thus nbd_teardown_connection() hangs.
> 
> This is because aio_co_enter() only puts the connection_co into the main
> coroutine's wake-up queue, so this main coroutine needs to yield and
> wait for connection_co to terminate.

After diving into NBD's co-routines (this is 2nd time I do this) and speaking 
about this
with Max Reitz on IRC, could I suggest to extend the explanation a bit, 
something like that:


When nbd_close() is called from a coroutine, the connection_co never
gets to run, and thus nbd_teardown_connection() hangs.

This happens because the connection_co is woken up by nbd_teardown_connection() 
by closing the socket, 
which wakes up the IO channel on which the connection_co is blocked.

However connection_co is waken up by aio_co_wake, which has an assumption that 
if the caller is already in
a coroutine, the caller doesn't switch immediately to the woken coroutine, but 
rather it adds the coroutine to list
of coroutines to wake immediately after the current co-routine 
yields/terminates (self->co_queue_wakeup)
Since we instead do aio_poll, that never happens.


The patch itself looks fine, so
Reviewed-by: Maxim Levitsky <address@hidden>

Best regards,
        Maxim Levitsky

> 
> Suggested-by: Kevin Wolf <address@hidden>
> Signed-off-by: Max Reitz <address@hidden>
> ---
>  block/nbd.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 81edabbf35..8f5ee86842 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -61,6 +61,7 @@ typedef struct BDRVNBDState {
>      CoMutex send_mutex;
>      CoQueue free_sema;
>      Coroutine *connection_co;
> +    Coroutine *teardown_co;
>      int in_flight;
>  
>      NBDClientRequest requests[MAX_NBD_REQUESTS];
> @@ -135,7 +136,15 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>      qio_channel_shutdown(s->ioc,
>                           QIO_CHANNEL_SHUTDOWN_BOTH,
>                           NULL);
> -    BDRV_POLL_WHILE(bs, s->connection_co);
> +    if (qemu_in_coroutine()) {
> +        s->teardown_co = qemu_coroutine_self();
> +        /* connection_co resumes us when it terminates */
> +        qemu_coroutine_yield();
> +        s->teardown_co = NULL;
> +    } else {
> +        BDRV_POLL_WHILE(bs, s->connection_co);
> +    }
> +    assert(!s->connection_co);
>  
>      nbd_client_detach_aio_context(bs);
>      object_unref(OBJECT(s->sioc));
> @@ -207,6 +216,9 @@ static coroutine_fn void nbd_connection_entry(void 
> *opaque)
>      bdrv_dec_in_flight(s->bs);
>  
>      s->connection_co = NULL;
> +    if (s->teardown_co) {
> +        aio_co_wake(s->teardown_co);
> +    }
>      aio_wait_kick();
>  }
>  





reply via email to

[Prev in Thread] Current Thread [Next in Thread]