[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain
From: |
Eric Blake |
Subject: |
Re: [PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test |
Date: |
Wed, 13 Sep 2023 11:08:54 -0500 |
User-agent: |
NeoMutt/20230517 |
On Tue, Sep 12, 2023 at 07:10:34PM -0400, Stefan Hajnoczi wrote:
> This patch fixes a race condition in test-bdrv-drain that is difficult
> to reproduce. test-bdrv-drain sometimes fails without an error message
> on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able
> to reproduce it locally and found that "block-backend: process I/O in
> the current AioContext" (in this patch series) is the first commit where
> it reproduces.
>
> I do not know why "block-backend: process I/O in the current AioContext"
> exposes this bug. It might be related to the fact that the test's preadv
> request runs in the main thread instead of IOThread a after my commit.
In reading the commit message before the impacted code, my first
thought was that you had a typo of an extra word (that is, something
to fix by s/a //), but reading further, a better fix would be calling
attention to the fact that you are referencing a specific named
thread, as in s/IOThread a/IOThread A/...
> That might simply change the timing of the test.
>
> Now on to the race condition in test-bdrv-drain. The main thread
> schedules a BH in IOThread a and then drains the BDS:
...and another spot with the same parse issue...
>
> aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
>
> /* The request is running on the IOThread a. Draining its block device
...but here you were quoting from the existing code base, which is
where I finally realized it was more than just your commit message.
> * will make sure that it has completed as far as the BDS is concerned,
> * but the drain in this thread can continue immediately after
> * bdrv_dec_in_flight() and aio_ret might be assigned only slightly
> * later. */
> do_drain_begin(drain_type, bs);
>
> If the BH completes before do_drain_begin() then there is nothing to
> worry about.
>
> If the BH invokes bdrv_flush() before do_drain_begin(), then
> do_drain_begin() waits for it to complete.
>
> The problematic case is when do_drain_begin() runs before the BH enters
> bdrv_flush(). Then do_drain_begin() misses the BH and the drain
> mechanism has failed in quiescing I/O.
>
> Fix this by incrementing the in_flight counter so that do_drain_begin()
> waits for test_iothread_main_thread_bh().
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tests/unit/test-bdrv-drain.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
> index ccc453c29e..67a79aa3f0 100644
> --- a/tests/unit/test-bdrv-drain.c
> +++ b/tests/unit/test-bdrv-drain.c
> @@ -512,6 +512,7 @@ static void test_iothread_main_thread_bh(void *opaque)
> * executed during drain, otherwise this would deadlock. */
> aio_context_acquire(bdrv_get_aio_context(data->bs));
> bdrv_flush(data->bs);
> + bdrv_dec_in_flight(data->bs); /* incremented by test_iothread_common() */
> aio_context_release(bdrv_get_aio_context(data->bs));
> }
>
> @@ -583,6 +584,13 @@ static void test_iothread_common(enum drain_type
> drain_type, int drain_thread)
> aio_context_acquire(ctx_a);
> }
>
> + /*
> + * Increment in_flight so that do_drain_begin() waits for
> + * test_iothread_main_thread_bh(). This prevents the race between
> + * test_iothread_main_thread_bh() in IOThread a and do_drain_begin()
> in
> + * this thread. test_iothread_main_thread_bh() decrements in_flight.
> + */
> + bdrv_inc_in_flight(bs);
> aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
>
> /* The request is running on the IOThread a. Draining its block
> device
and indeed, your commit message is consistent with the current code's
naming convention. If you have reason to respin, a pre-req patch to
change the case before adding more references might be nice, but I
won't insist.
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
- [PATCH v3 0/5] block-backend: process I/O in the current AioContext, Stefan Hajnoczi, 2023/09/12
- [PATCH v3 1/5] block: remove AIOCBInfo->get_aio_context(), Stefan Hajnoczi, 2023/09/12
- [PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test, Stefan Hajnoczi, 2023/09/12
- Re: [PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test,
Eric Blake <=
- [PATCH v3 3/5] block-backend: process I/O in the current AioContext, Stefan Hajnoczi, 2023/09/12
- [PATCH v3 4/5] block-backend: process zoned requests in the current AioContext, Stefan Hajnoczi, 2023/09/12
- [PATCH v3 5/5] block-coroutine-wrapper: use qemu_get_current_aio_context(), Stefan Hajnoczi, 2023/09/12
- Re: [PATCH v3 0/5] block-backend: process I/O in the current AioContext, Kevin Wolf, 2023/09/15