qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()


From: Emanuele Giuseppe Esposito
Subject: Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
Date: Mon, 14 Feb 2022 12:11:55 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


On 11/02/2022 16:38, Kevin Wolf wrote:
> Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
>> This test uses a callback of an I/O function (blk_aio_preadv)
>> to modify the graph, using bdrv_attach_child.
>> This is simply not allowed anymore. I/O cannot change the graph.
> 
> The callback of an I/O function isn't I/O, though. It is code _after_
> the I/O has completed. If this doesn't work any more, it feels like this
> is a bug.
> 
> I think it becomes a bit more obvious when you translate the AIO into
> the equivalent coroutine function:
> 
>     void coroutine_fn foo(...)
>     {
>         GLOBAL_STATE_CODE();
> 
>         blk_co_preadv(...);
>         detach_by_parent_aio_cb(...);
>     }
> 
> This is obviously correct code that must work. Calling an I/O function
> from a GS function is allowed, and of course the GS function may
> continue to do GS stuff after the I/O function returned.
> 
> (Actually, I'm not sure if it really works when blk is not in the main
> AioContext, but your API split patches claim that it does, so let's
> assume for the moment that this is already true :-))

Uhmm why does my split claims that it is? all blk_aio_* are IO_CODE.
Also, as you said, if blk is not in the main loop, the callback is not
GS either.

> 
>> Before "block/io.c: make bdrv_do_drained_begin_quiesce static
>> and introduce bdrv_drained_begin_no_poll", the test would simply
>> be at risk of failure, because if bdrv_replace_child_noperm()
>> (called to modify the graph) would call a drain,
>> then one callback of .drained_begin() is bdrv_do_drained_begin_quiesce,
>> that specifically asserts that we are not in a coroutine.
>>
>> Now that we fixed the behavior, the drain will invoke a bh in the
>> main loop, so we don't have such problem. However, this test is still
>> illegal and fails because we forbid graph changes from I/O paths.
>>
>> Once we add the required subtree_drains to protect
>> bdrv_replace_child_noperm(), the key problem in this test is in:
> 
> Probably a question for a different patch, but why is a subtree drain
> required instead of just a normal node drain? It feels like a bigger
> hammer than what is needed for replacing a single child.
> 
>> acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
>> /* Drain and check the expected result */
>> bdrv_subtree_drained_begin(parent_b);
>>
>> because the detach_by_parent_aio_cb calls detach_indirect_bh(), that
>> modifies the graph and is invoked during bdrv_subtree_drained_begin().
>> The call stack is the following:
>> 1. blk_aio_preadv() creates a coroutine, increments in_flight counter
>> and enters the coroutine running blk_aio_read_entry()
>> 2. blk_aio_read_entry() performs the read and then schedules a bh to
>>    complete (blk_aio_complete)
>> 3. at this point, subtree_drained_begin() kicks in and waits for all
>>    in_flight requests, polling
>> 4. polling allows the bh to be scheduled, so blk_aio_complete runs
>> 5. blk_aio_complete *first* invokes the callback
>>    (detach_by_parent_aio_cb) and then decrements the in_flight counter
>> 6. Here we have the problem: detach_by_parent_aio_cb modifies the graph,
>>    so both bdrv_unref_child() and bdrv_attach_child() will have
>>    subtree_drains inside. And this causes a deadlock, because the
>>    nested drain will wait for in_flight counter to go to zero, which
>>    is only happening once the drain itself finishes.
> 
> So the problem has nothing to do with detach_by_parent_aio_cb() being
> an I/O function in the sense of locking rules (which it isn't), but with
> calling a drain operation while having the in_flight counter increased.
> 
> In other words, an AIO callback like detach_by_parent_aio_cb() must
> never call drain - and it doesn't before your changes to
> bdrv_replace_child_noperm() break it. How confident are we that this
> only breaks tests and not real code?
I am not sure. From a quick look, the AIO callback is really used pretty
much everywhere. Maybe we should really find a way to asseert
GLOBAL_STATE_CODE and friends?

> 
> Part of the problem is probably that drain is serving two slightly
> different purposes inside the block layer (just make sure that nothing
> touches the node any more) and callers outside of the block layer (make
> sure that everything has completed and no more callbacks will be
> called). This bug sits exactly in the difference between those two
> concepts - we're waiting for more than we would have to wait for, and it
> causes a deadlock in this combination.
> 
> I guess it could be fixed if BlockBackend accounted for requests that
> are already completed, but their callback hasn't yet. blk_drain() would
> then have to wait for those requests, but blk_root_drained_poll()
> wouldn't because these requests don't affect the root node any more.
> 
>> Different story is test_detach_by_driver_cb(): in this case,
>> detach_by_parent_aio_cb() does not call detach_indirect_bh(),
>> but it is instead called as a bh running in the main loop by
>> detach_by_driver_cb_drained_begin(), the callback for
>> .drained_begin().
>>
>> This test was added in 231281ab42 and part of the series
>> "Drain fixes and cleanups, part 3"
>> https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
>> but as explained above I believe that it is not valid anymore, and
>> can be discarded.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> 
> I feel throwing tests away just because they don't pass any more is a
> bit too simple. :-)

Well if the test is doing something it is not supposed to do, why not :)

And anyways this was obviously discussed in IRC with Paolo and somebody
else, can't remember who.

Emanuele
> 
> Kevin
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]