[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] blk: postpone request execution on a context pr
Re: [Qemu-devel] [PATCH] blk: postpone request execution on a context protected with "drained section"
Fri, 14 Dec 2018 11:54:39 +0000
On 13.12.2018 15:20, Kevin Wolf wrote:
> Am 13.12.2018 um 12:07 hat Denis Plotnikov geschrieben:
>> On 12.12.2018 15:24, Kevin Wolf wrote:
>>> Am 11.12.2018 um 17:55 hat Denis Plotnikov geschrieben:
>>>>> Why involve the AioContext at all? This could all be kept at the
>>>>> BlockBackend level without extending the layering violation that
>>>>> aio_disable_external() is.
>>>>> BlockBackends get notified when their root node is drained, so hooking
>>>>> things up there should be as easy, if not even easier than in
>>>> Just want to make sure that I understood correctly what you meant by
>>>> "BlockBackends get notified". Did you mean that bdrv_drain_end calls
>>>> child's role callback blk_root_drained_end by calling
>>> Yes, blk_root_drained_begin/end calls are all you need. Specifically,
>>> their adjustments to blk->quiesce_counter that are already there, and in
>>> the 'if (--blk->quiesce_counter == 0)' block of blk_root_drained_end()
>>> we can resume the queued requests.
>> Sounds it should be so, but it doesn't work that way and that's why:
>> when doing mirror we may resume postponed coroutines too early when the
>> underlying bs is protected from writing at and thus we encounter the
>> assert on a write request execution at bdrv_co_write_req_prepare when
>> resuming the postponed coroutines.
>> The thing is that the bs is protected for writing before execution of
>> bdrv_replace_node at mirror_exit_common and bdrv_replace_node calls
>> bdrv_replace_child_noperm which, in turn, calls child->role->drained_end
>> where one of the callbacks is blk_root_drained_end which check
>> if(--blk->quiesce_counter == 0) and runs the postponed requests
>> (coroutines) if the coundition is true.
> Hm, so something is messed up with the drain sections in the mirror
> driver. We have:
> bdrv_replace_node(to_replace, target_bs, &local_err);
> Obviously, the intention was to keep the BlockBackend drained during
> bdrv_replace_node(). So how could blk->quiesce_counter ever get to 0
> inside bdrv_replace_node() when target_bs is drained?
> Looking at bdrv_replace_child_noperm(), it seems that the function has
> a bug: Even if old_bs and new_bs are both drained, the quiesce_counter
> for the parent reaches 0 for a moment because we call .drained_end for
> the old child first and .drained_begin for the new one later.
> So it seems the fix would be to reverse the order and first call
> .drained_begin for the new child and then .drained_end for the old
> child. Sounds like a good new testcase for tests/test-bdrv-drain.c, too.
Yes, it's true, but it's not enough...
In mirror_exit_common() we actively manipulate with block driver states.
When we replaced a node in the snippet you showed we can't allow the
postponed coroutines to run because the block tree isn't ready to
receive the requests yet.
To be ready, we need to insert a proper block driver state to the block
backend which is done here
blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort); << << << <<
bs_opaque->job = NULL;
If the tree isn't ready and we resume the coroutines, we'll end up with
the request landed in a wrong block driver state.
So, we explicitly should stop all activities on all the driver states
and its parents and allow the activities when everything is ready to go.
Why explicitly, because the block driver states may belong to different
block backends at the moment of the manipulation beginning.
So, it seems we need to disable all their contexts until the
Please, correct me if I'm wrong.
>> In seems that if the external requests disabled on the context we can't
>> rely on anything or should check where the underlying bs and its
>> underlying nodes are ready to receive requests which sounds quite
>> Please correct me if still don't understand something in that routine.
> I think the reason why reyling on aio_disable_external() works is simply
> because src is also drained, which keeps external events in the
> AioContext disabled despite the bug in draining the target node.
> The bug would become apparent even with aio_disable_external() if we
> didn't drain src, or even if we just supported src and target being in
> different AioContexts.
Why don't we disable all those contexts involved until the end of the
block device tree reconstruction?