
Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()


From: Stefan Hajnoczi
Subject: Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
Date: Tue, 22 Aug 2023 15:01:43 -0400

On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> bdrv_unref() is called by a lot of places that need to hold the graph
> lock (it naturally happens in the context of operations that change the
> graph). However, bdrv_unref() takes the graph writer lock internally, so
> it can't actually be called while already holding a graph lock without
> causing a deadlock.
> 
> bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> node before closing it, and draining requires that the graph is
> unlocked.
> 
> The solution is to defer deleting the node until we don't hold the lock
> any more and draining is possible again.
> 
> Note that keeping images open for longer than necessary can create
> problems, too: You can't open an image again before it is really closed
> (if image locking didn't prevent it, it would cause corruption).
> Reopening an image immediately happens at least during bdrv_open() and
> bdrv_co_create().
> 
> In order to solve this problem, make sure to run the deferred unref in
> bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
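
The intended caller pattern would then presumably look something like this
(sketch only; old_bs stands for some node being dropped from the graph):

    bdrv_graph_wrlock(NULL);
    /* ... detach old_bs from the graph ... */
    bdrv_schedule_unref(old_bs);  /* bdrv_unref() would deadlock under the lock */
    bdrv_graph_wrunlock();        /* deferred unref runs here, draining is possible again */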
> 
> The output of iotest 051 is updated because the additional polling
> changes the order of HMP output, resulting in a new "(qemu)" prompt in
> the test output that was previously on a separate line and filtered out.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/block/block-global-state.h |  1 +
>  block.c                            |  9 +++++++++
>  block/graph-lock.c                 | 23 ++++++++++++++++-------
>  tests/qemu-iotests/051.pc.out      |  6 +++---
>  4 files changed, 29 insertions(+), 10 deletions(-)
> 
> diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
> index f347199bff..e570799f85 100644
> --- a/include/block/block-global-state.h
> +++ b/include/block/block-global-state.h
> @@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
>  void bdrv_ref(BlockDriverState *bs);
>  void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
>  void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
> +void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
>  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
>  BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
>                               BlockDriverState *child_bs,
> diff --git a/block.c b/block.c
> index 6376452768..9c4f24f4b9 100644
> --- a/block.c
> +++ b/block.c
> @@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
>      }
>  }
>  
> +void bdrv_schedule_unref(BlockDriverState *bs)

Please add a doc comment explaining when and why this should be used.
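
Something along these lines, perhaps (wording is only a sketch):

    /*
     * Release a BlockDriverState reference while holding the graph write
     * lock.
     *
     * Calling bdrv_unref() directly is not possible here because it could
     * drain and take the graph lock itself.  Instead, schedule the unref
     * to run as soon as the write lock is released again, i.e. from
     * bdrv_graph_wrunlock().
     */
    void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);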

> +{
> +    if (!bs) {
> +        return;
> +    }
> +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +                            (QEMUBHFunc *) bdrv_unref, bs);
> +}
> +
>  struct BdrvOpBlocker {
>      Error *reason;
>      QLIST_ENTRY(BdrvOpBlocker) list;
> diff --git a/block/graph-lock.c b/block/graph-lock.c
> index 5e66f01ae8..395d387651 100644
> --- a/block/graph-lock.c
> +++ b/block/graph-lock.c
> @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
>  void bdrv_graph_wrunlock(void)
>  {
>      GLOBAL_STATE_CODE();
> -    QEMU_LOCK_GUARD(&aio_context_list_lock);
>      assert(qatomic_read(&has_writer));
>  
> +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> +        /*
> +         * No need for memory barriers, this works in pair with
> +         * the slow path of rdlock() and both take the lock.
> +         */
> +        qatomic_store_release(&has_writer, 0);
> +
> +        /* Wake up all coroutine that are waiting to read the graph */

s/coroutine/coroutines/

> +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    }
> +
>      /*
> -     * No need for memory barriers, this works in pair with
> -     * the slow path of rdlock() and both take the lock.
> +     * Run any BHs that were scheduled during the wrlock section and that
> +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this

Referring directly to bdrv_schedule_unref() would help make it clearer
what you mean.
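E.g. something like "Run any BHs that were scheduled during the wrlock
section, such as the deferred bdrv_unref() from bdrv_schedule_unref(), ...".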

> +     * only after restarting coroutines so that nested event loops in BHs don't
> +     * deadlock if their condition relies on the coroutine making progress.
>       */
> -    qatomic_store_release(&has_writer, 0);
> -
> -    /* Wake up all coroutine that are waiting to read the graph */
> -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> +    aio_bh_poll(qemu_get_aio_context());

Keeping a dedicated list of BDSes pending unref seems cleaner than using
aio_bh_poll(). There's a lot of stuff happening in the event loop
and I wonder if those other BHs might cause issues if they run at the
end of bdrv_graph_wrunlock(). The change to qemu-iotests 051's output is
an example of this, but other things could happen too (e.g. monitor
command processing).
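
A minimal sketch of what that could look like (untested; the list and field
names are made up, and it would need a new QSLIST_ENTRY member in
BlockDriverState):

    static QSLIST_HEAD(, BlockDriverState) pending_unrefs =
        QSLIST_HEAD_INITIALIZER(pending_unrefs);

    void bdrv_schedule_unref(BlockDriverState *bs)
    {
        if (!bs) {
            return;
        }
        /* hypothetical field: QSLIST_ENTRY(BlockDriverState) pending_unref */
        QSLIST_INSERT_HEAD(&pending_unrefs, bs, pending_unref);
    }

    /* and at the end of bdrv_graph_wrunlock(), after waking up readers: */
    while (!QSLIST_EMPTY(&pending_unrefs)) {
        BlockDriverState *bs = QSLIST_FIRST(&pending_unrefs);
        QSLIST_REMOVE_HEAD(&pending_unrefs, pending_unref);
        bdrv_unref(bs);
    }

That way only the deferred unrefs run at this point, not arbitrary BHs.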

>  }
>  
>  void coroutine_fn bdrv_graph_co_rdlock(void)
> diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
> index 4d4af5a486..7e10c5fa1b 100644
> --- a/tests/qemu-iotests/051.pc.out
> +++ b/tests/qemu-iotests/051.pc.out
> @@ -169,11 +169,11 @@ QEMU_PROG: -device scsi-hd,drive=disk: Device needs media, but drive is empty
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device ide-hd,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> @@ -185,7 +185,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
>  
>  Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-scsi,id=virtio-scsi1,iothread=thread0 -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
>  QEMU X.Y.Z monitor - type 'help' for more information
> -- 
> 2.41.0
> 

