[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] PING: [PATCH] blk: postpone request execution on a cont

From: Denis Plotnikov
Subject: Re: [Qemu-block] PING: [PATCH] blk: postpone request execution on a context protected with "drained section"
Date: Fri, 18 Jan 2019 07:43:22 +0000

On 17.01.2019 17:23, Kevin Wolf wrote:
> Am 17.01.2019 um 13:57 hat Denis Plotnikov geschrieben:
>> Kevin,
>> could you please take a look at my last comments?
> I read it, and what it told me is essentially that I need to work on it
> myself to fully understand the problem and possible acceptable solutions
> because you can't seem to find one yourself. I will, but I can't
> guarantee when I can find the time for it.
> Kevin
ok. Thanks!

>> On 15.01.2019 10:22, Denis Plotnikov wrote:
>>> ping ping ping ping!!!!
>>> On 09.01.2019 11:18, Denis Plotnikov wrote:
>>>> ping ping!!!
>>>> On 18.12.2018 11:53, Denis Plotnikov wrote:
>>>>> ping ping
>>>>> On 14.12.2018 14:54, Denis Plotnikov wrote:
>>>>>> On 13.12.2018 15:20, Kevin Wolf wrote:
>>>>>>> Am 13.12.2018 um 12:07 hat Denis Plotnikov geschrieben:
>>>>>>>> On 12.12.2018 15:24, Kevin Wolf wrote:
>>>>>>>>> Am 11.12.2018 um 17:55 hat Denis Plotnikov geschrieben:
>>>>>>>>>>> Why involve the AioContext at all? This could all be kept at the
>>>>>>>>>>> BlockBackend level without extending the layering violation that
>>>>>>>>>>> aio_disable_external() is.
>>>>>>>>>>> BlockBackends get notified when their root node is drained, so
>>>>>>>>>>> hooking
>>>>>>>>>>> things up there should be as easy, if not even easier than in
>>>>>>>>>>> AioContext.
>>>>>>>>>> Just want to make sure that I understood correctly what you
>>>>>>>>>> meant by
>>>>>>>>>> "BlockBackends get notified". Did you mean that bdrv_drain_end
>>>>>>>>>> calls
>>>>>>>>>> child's role callback blk_root_drained_end by calling
>>>>>>>>>> bdrv_parent_drained_end?
>>>>>>>>> Yes, blk_root_drained_begin/end calls are all you need.
>>>>>>>>> Specifically,
>>>>>>>>> their adjustments to blk->quiesce_counter that are already there,
>>>>>>>>> and in
>>>>>>>>> the 'if (--blk->quiesce_counter == 0)' block of
>>>>>>>>> blk_root_drained_end()
>>>>>>>>> we can resume the queued requests.
>>>>>>>> Sounds it should be so, but it doesn't work that way and that's why:
>>>>>>>> when doing mirror we may resume postponed coroutines too early
>>>>>>>> when the
>>>>>>>> underlying bs is protected from writing at and thus we encounter the
>>>>>>>> assert on a write request execution at bdrv_co_write_req_prepare when
>>>>>>>> resuming the postponed coroutines.
>>>>>>>> The thing is that the bs is protected for writing before execution of
>>>>>>>> bdrv_replace_node at mirror_exit_common and bdrv_replace_node calls
>>>>>>>> bdrv_replace_child_noperm which, in turn, calls
>>>>>>>> child->role->drained_end
>>>>>>>> where one of the callbacks is blk_root_drained_end which check
>>>>>>>> if(--blk->quiesce_counter == 0) and runs the postponed requests
>>>>>>>> (coroutines) if the coundition is true.
>>>>>>> Hm, so something is messed up with the drain sections in the mirror
>>>>>>> driver. We have:
>>>>>>>       bdrv_drained_begin(target_bs);
>>>>>>>       bdrv_replace_node(to_replace, target_bs, &local_err);
>>>>>>>       bdrv_drained_end(target_bs);
>>>>>>> Obviously, the intention was to keep the BlockBackend drained during
>>>>>>> bdrv_replace_node(). So how could blk->quiesce_counter ever get to 0
>>>>>>> inside bdrv_replace_node() when target_bs is drained?
>>>>>>> Looking at bdrv_replace_child_noperm(), it seems that the function has
>>>>>>> a bug: Even if old_bs and new_bs are both drained, the quiesce_counter
>>>>>>> for the parent reaches 0 for a moment because we call .drained_end for
>>>>>>> the old child first and .drained_begin for the new one later.
>>>>>>> So it seems the fix would be to reverse the order and first call
>>>>>>> .drained_begin for the new child and then .drained_end for the old
>>>>>>> child. Sounds like a good new testcase for tests/test-bdrv-drain.c,
>>>>>>> too.
>>>>>> Yes, it's true, but it's not enough...
>>>>>> In mirror_exit_common() we actively manipulate with block driver
>>>>>> states.
>>>>>> When we replaced a node in the snippet you showed we can't allow the
>>>>>> postponed coroutines to run because the block tree isn't ready to
>>>>>> receive the requests yet.
>>>>>> To be ready, we need to insert a proper block driver state to the
>>>>>> block backend which is done here
>>>>>>       blk_remove_bs(bjob->blk);
>>>>>>       blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
>>>>>>       blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort); << << << <<
>>>>>>       bs_opaque->job = NULL;
>>>>>>       bdrv_drained_end(src);
>>>>>> If the tree isn't ready and we resume the coroutines, we'll end up
>>>>>> with the request landed in a wrong block driver state.
>>>>>> So, we explicitly should stop all activities on all the driver states
>>>>>> and its parents and allow the activities when everything is ready to
>>>>>> go.
>>>>>> Why explicitly, because the block driver states may belong to
>>>>>> different block backends at the moment of the manipulation beginning.
>>>>>> So, it seems we need to disable all their contexts until the
>>>>>> manipulation ends.
>>>>>> Please, correct me if I'm wrong.
>>>>>>>> In seems that if the external requests disabled on the context we
>>>>>>>> can't
>>>>>>>> rely on anything or should check where the underlying bs and its
>>>>>>>> underlying nodes are ready to receive requests which sounds quite
>>>>>>>> complicated.
>>>>>>>> Please correct me if still don't understand something in that
>>>>>>>> routine.
>>>>>>> I think the reason why reyling on aio_disable_external() works is
>>>>>>> simply
>>>>>>> because src is also drained, which keeps external events in the
>>>>>>> AioContext disabled despite the bug in draining the target node.
>>>>>>> The bug would become apparent even with aio_disable_external() if we
>>>>>>> didn't drain src, or even if we just supported src and target being in
>>>>>>> different AioContexts.
>>>>>> Why don't we disable all those contexts involved until the end of
>>>>>> the block device tree reconstruction?
>>>>>> Thanks!
>>>>>> Denis
>>>>>>> Kevin
>> -- 
>> Best,
>> Denis


reply via email to

[Prev in Thread] Current Thread [Next in Thread]