Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
From: Stefan Hajnoczi
Subject: Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
Date: Tue, 22 Aug 2023 15:01:43 -0400
On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> bdrv_unref() is called by a lot of places that need to hold the graph
> lock (it naturally happens in the context of operations that change the
> graph). However, bdrv_unref() takes the graph writer lock internally, so
> it can't actually be called while already holding a graph lock without
> causing a deadlock.
>
> bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> node before closing it, and draining requires that the graph is
> unlocked.
>
> The solution is to defer deleting the node until we don't hold the lock
> any more and draining is possible again.
>
> Note that keeping images open for longer than necessary can create
> problems, too: You can't open an image again before it is really closed
> (if image locking didn't prevent it, it would cause corruption).
> Reopening an image immediately happens at least during bdrv_open() and
> bdrv_co_create().
>
> In order to solve this problem, make sure to run the deferred unref in
> bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
>
> The output of iotest 051 is updated because the additional polling
> changes the order of HMP output, resulting in a new "(qemu)" prompt in
> the test output that was previously on a separate line and filtered out.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> include/block/block-global-state.h | 1 +
> block.c | 9 +++++++++
> block/graph-lock.c | 23 ++++++++++++++++-------
> tests/qemu-iotests/051.pc.out | 6 +++---
> 4 files changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
> index f347199bff..e570799f85 100644
> --- a/include/block/block-global-state.h
> +++ b/include/block/block-global-state.h
> @@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
> void bdrv_ref(BlockDriverState *bs);
> void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
> void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
> +void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
> void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
> BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> BlockDriverState *child_bs,
> diff --git a/block.c b/block.c
> index 6376452768..9c4f24f4b9 100644
> --- a/block.c
> +++ b/block.c
> @@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
> }
> }
>
> +void bdrv_schedule_unref(BlockDriverState *bs)
Please add a doc comment explaining when and why this should be used.
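Something along these lines would work, for example (wording is only a
suggestion, adjust as needed):

```c
/*
 * Release a BlockDriverState reference while holding the graph write lock.
 *
 * Calling bdrv_unref() directly is not possible with the graph lock held
 * because the last reference would drain and close the node, and draining
 * requires that the graph is unlocked. This function defers the
 * bdrv_unref() call until bdrv_graph_wrunlock() runs, the first point
 * where draining is possible again.
 */
void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
```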
> +{
> + if (!bs) {
> + return;
> + }
> + aio_bh_schedule_oneshot(qemu_get_aio_context(),
> + (QEMUBHFunc *) bdrv_unref, bs);
> +}
> +
> struct BdrvOpBlocker {
> Error *reason;
> QLIST_ENTRY(BdrvOpBlocker) list;
> diff --git a/block/graph-lock.c b/block/graph-lock.c
> index 5e66f01ae8..395d387651 100644
> --- a/block/graph-lock.c
> +++ b/block/graph-lock.c
> @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
> void bdrv_graph_wrunlock(void)
> {
> GLOBAL_STATE_CODE();
> - QEMU_LOCK_GUARD(&aio_context_list_lock);
> assert(qatomic_read(&has_writer));
>
> + WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> + /*
> + * No need for memory barriers, this works in pair with
> + * the slow path of rdlock() and both take the lock.
> + */
> + qatomic_store_release(&has_writer, 0);
> +
> + /* Wake up all coroutine that are waiting to read the graph */
s/coroutine/coroutines/
> + qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> + }
> +
> /*
> - * No need for memory barriers, this works in pair with
> - * the slow path of rdlock() and both take the lock.
> + * Run any BHs that were scheduled during the wrlock section and that
> + * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
Referring directly to bdrv_schedule_unref() would help make it clearer
what you mean.
> + * only after restarting coroutines so that nested event loops in BHs don't
> + * deadlock if their condition relies on the coroutine making progress.
> */
> - qatomic_store_release(&has_writer, 0);
> -
> - /* Wake up all coroutine that are waiting to read the graph */
> - qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> + aio_bh_poll(qemu_get_aio_context());
Keeping a dedicated list of BDSes pending unref seems cleaner than using
the aio_bh_poll(). There's a lot of stuff happening in the event loop
and I wonder if those other BHs might cause issues if they run at the
end of bdrv_graph_wrunlock(). The change to qemu-iotests 051's output is
an example of this, but other things could happen too (e.g. monitor
command processing).
> }
>
> void coroutine_fn bdrv_graph_co_rdlock(void)
> diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
> index 4d4af5a486..7e10c5fa1b 100644
> --- a/tests/qemu-iotests/051.pc.out
> +++ b/tests/qemu-iotests/051.pc.out
> @@ -169,11 +169,11 @@ QEMU_PROG: -device scsi-hd,drive=disk: Device needs media, but drive is empty
>
> Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device ide-hd,drive=disk,share-rw=on
> QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
>
> Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,share-rw=on
> QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
>
> Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
> QEMU X.Y.Z monitor - type 'help' for more information
> @@ -185,7 +185,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
>
> Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
> QEMU X.Y.Z monitor - type 'help' for more information
> -QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
> +(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
>
> Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-scsi,id=virtio-scsi1,iothread=thread0 -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
> QEMU X.Y.Z monitor - type 'help' for more information
> --
> 2.41.0
>