From: Denis Plotnikov
Subject: [Qemu-devel] PING: [PATCH] blk: postpone request execution on a context protected with "drained section"
Date: Thu, 17 Jan 2019 12:57:45 +0000
Kevin,
could you please take a look at my last comments?
Thanks!
Denis
On 15.01.2019 10:22, Denis Plotnikov wrote:
> ping ping ping ping!!!!
>
> On 09.01.2019 11:18, Denis Plotnikov wrote:
>> ping ping!!!
>>
>> On 18.12.2018 11:53, Denis Plotnikov wrote:
>>> ping ping
>>>
>>> On 14.12.2018 14:54, Denis Plotnikov wrote:
>>>>
>>>>
>>>> On 13.12.2018 15:20, Kevin Wolf wrote:
>>>>> On 13.12.2018 at 12:07, Denis Plotnikov wrote:
>>>>>> On 12.12.2018 15:24, Kevin Wolf wrote:
>>>>>>> On 11.12.2018 at 17:55, Denis Plotnikov wrote:
>>>>>>>>> Why involve the AioContext at all? This could all be kept at the
>>>>>>>>> BlockBackend level without extending the layering violation that
>>>>>>>>> aio_disable_external() is.
>>>>>>>>>
>>>>>>>>> BlockBackends get notified when their root node is drained, so
>>>>>>>>> hooking things up there should be as easy, if not even easier than
>>>>>>>>> in AioContext.
>>>>>>>>
>>>>>>>> Just want to make sure that I understood correctly what you meant
>>>>>>>> by "BlockBackends get notified". Did you mean that bdrv_drained_end
>>>>>>>> calls the child's role callback blk_root_drained_end by calling
>>>>>>>> bdrv_parent_drained_end?
>>>>>>>
>>>>>>> Yes, blk_root_drained_begin/end calls are all you need. Specifically,
>>>>>>> their adjustments to blk->quiesce_counter that are already there, and
>>>>>>> in the 'if (--blk->quiesce_counter == 0)' block of
>>>>>>> blk_root_drained_end() we can resume the queued requests.
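>>>>>>>
>>>>>>> Roughly like this (just a sketch; the resume helper name is made up,
>>>>>>> it would be whatever your patch introduces):
>>>>>>>
>>>>>>> static void blk_root_drained_end(BdrvChild *child)
>>>>>>> {
>>>>>>>     BlockBackend *blk = child->opaque;
>>>>>>>
>>>>>>>     assert(blk->quiesce_counter);
>>>>>>>     if (--blk->quiesce_counter == 0) {
>>>>>>>         /* re-enter the requests queued while we were quiesced */
>>>>>>>         blk_resume_postponed_requests(blk);
>>>>>>>     }
>>>>>>> }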
>>>>>> It sounds like it should be so, but it doesn't work that way, and
>>>>>> here is why: when doing mirror we may resume the postponed
>>>>>> coroutines too early, while the underlying bs is still protected
>>>>>> from writing, and thus we hit the assert on write request execution
>>>>>> in bdrv_co_write_req_prepare when resuming the postponed coroutines.
>>>>>>
>>>>>> The thing is that the bs is protected from writing before the
>>>>>> execution of bdrv_replace_node at mirror_exit_common, and
>>>>>> bdrv_replace_node calls bdrv_replace_child_noperm which, in turn,
>>>>>> calls child->role->drained_end, where one of the callbacks is
>>>>>> blk_root_drained_end, which checks if (--blk->quiesce_counter == 0)
>>>>>> and runs the postponed requests (coroutines) if the condition is
>>>>>> true.
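>>>>>>
>>>>>> To spell out the path as I read it (just my reading of the code, the
>>>>>> intermediate calls may be slightly off):
>>>>>>
>>>>>>     mirror_exit_common()
>>>>>>       bdrv_replace_node(to_replace, target_bs, ...)
>>>>>>         bdrv_replace_child_noperm()
>>>>>>           child->role->drained_end()        -> blk_root_drained_end()
>>>>>>             --blk->quiesce_counter == 0     -> postponed requests resume
>>>>>>               -> write reaches the write-protected bs
>>>>>>               -> assertion failure in bdrv_co_write_req_prepare()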
>>>>>
>>>>> Hm, so something is messed up with the drain sections in the mirror
>>>>> driver. We have:
>>>>>
>>>>>     bdrv_drained_begin(target_bs);
>>>>>     bdrv_replace_node(to_replace, target_bs, &local_err);
>>>>>     bdrv_drained_end(target_bs);
>>>>>
>>>>> Obviously, the intention was to keep the BlockBackend drained during
>>>>> bdrv_replace_node(). So how could blk->quiesce_counter ever get to 0
>>>>> inside bdrv_replace_node() when target_bs is drained?
>>>>>
>>>>> Looking at bdrv_replace_child_noperm(), it seems that the function has
>>>>> a bug: Even if old_bs and new_bs are both drained, the quiesce_counter
>>>>> for the parent reaches 0 for a moment because we call .drained_end for
>>>>> the old child first and .drained_begin for the new one later.
>>>>>
>>>>> So it seems the fix would be to reverse the order and first call
>>>>> .drained_begin for the new child and then .drained_end for the old
>>>>> child. Sounds like a good new testcase for tests/test-bdrv-drain.c,
>>>>> too.
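>>>>>
>>>>> Something along these lines (untested sketch; the real function also
>>>>> loops quiesce_counter times and handles attach/detach):
>>>>>
>>>>> static void bdrv_replace_child_noperm(BdrvChild *child,
>>>>>                                       BlockDriverState *new_bs)
>>>>> {
>>>>>     BlockDriverState *old_bs = child->bs;
>>>>>
>>>>>     /* Quiesce the parent for the new child first... */
>>>>>     if (new_bs && new_bs->quiesce_counter && child->role->drained_begin) {
>>>>>         child->role->drained_begin(child);
>>>>>     }
>>>>>
>>>>>     /* ...move the child over... */
>>>>>     child->bs = new_bs;
>>>>>
>>>>>     /* ...and only then drop the drain for the old child, so the
>>>>>      * parent's quiesce_counter never reaches 0 in between. */
>>>>>     if (old_bs && old_bs->quiesce_counter && child->role->drained_end) {
>>>>>         child->role->drained_end(child);
>>>>>     }
>>>>> }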
>>>> Yes, that's true, but it's not enough...
>>>> In mirror_exit_common() we actively manipulate the block driver states.
>>>> When we have replaced the node in the snippet you showed, we still
>>>> can't allow the postponed coroutines to run, because the block tree
>>>> isn't ready to receive the requests yet.
>>>> To be ready, we need to insert the proper block driver state into the
>>>> block backend, which is done here:
>>>>
>>>>     blk_remove_bs(bjob->blk);
>>>>     blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
>>>>     blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort); << << << <<
>>>>
>>>>     bs_opaque->job = NULL;
>>>>
>>>>     bdrv_drained_end(src);
>>>>
>>>> If the tree isn't ready and we resume the coroutines, we'll end up
>>>> with the requests landing in the wrong block driver state.
>>>>
>>>> So, we should explicitly stop all activity on all the driver states
>>>> and their parents, and allow the activity again only when everything
>>>> is ready to go.
>>>>
>>>> Why explicitly? Because the block driver states may belong to
>>>> different block backends at the moment the manipulation begins.
>>>>
>>>> So, it seems we need to disable all their contexts until the
>>>> manipulation ends.
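>>>>
>>>> For illustration, something like this around the tree reconstruction
>>>> in mirror_exit_common() (just a sketch of the idea, not the actual
>>>> patch; src_ctx/target_ctx are made-up locals):
>>>>
>>>>     AioContext *src_ctx = bdrv_get_aio_context(src);
>>>>     AioContext *target_ctx = bdrv_get_aio_context(target_bs);
>>>>
>>>>     aio_disable_external(src_ctx);
>>>>     if (target_ctx != src_ctx) {
>>>>         aio_disable_external(target_ctx);
>>>>     }
>>>>
>>>>     /* all the graph manipulation: bdrv_replace_node(),
>>>>      * blk_remove_bs()/blk_insert_bs(), ... */
>>>>
>>>>     if (target_ctx != src_ctx) {
>>>>         aio_enable_external(target_ctx);
>>>>     }
>>>>     aio_enable_external(src_ctx);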
>>>>
>>>> Please, correct me if I'm wrong.
>>>>
>>>>>
>>>>>> It seems that without external requests disabled on the context we
>>>>>> can't rely on anything, or we would have to check whether the
>>>>>> underlying bs and its underlying nodes are ready to receive requests,
>>>>>> which sounds quite complicated.
>>>>>> Please correct me if I still don't understand something in that
>>>>>> routine.
>>>>>
>>>>> I think the reason why relying on aio_disable_external() works is
>>>>> simply because src is also drained, which keeps external events in the
>>>>> AioContext disabled despite the bug in draining the target node.
>>>>>
>>>>> The bug would become apparent even with aio_disable_external() if we
>>>>> didn't drain src, or even if we just supported src and target being in
>>>>> different AioContexts.
>>>>
>>>> Why don't we disable all those contexts involved until the end of
>>>> the block device tree reconstruction?
>>>>
>>>> Thanks!
>>>>
>>>> Denis
>>>>>
>>>>> Kevin
>>>>>
>>>>
>>>
>>
>
--
Best,
Denis