From: Emanuele Giuseppe Esposito
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 30 Mar 2022 13:55:41 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


On 30/03/2022 at 12:53, Hanna Reitz wrote:
> On 17.03.22 17:23, Emanuele Giuseppe Esposito wrote:
>>
>> On 09/03/2022 at 14:26, Emanuele Giuseppe Esposito wrote:
>>>>> * Drains allow the caller (either the main loop or the iothread running
>>>>> the context) to wait for all in_flight requests and operations
>>>>> of a BDS: normal drains target a given node and its parents, while
>>>>> subtree ones also include the subgraph of the node. Siblings are
>>>>> not affected by either of these two kinds of drains.
>>>> Siblings are drained to the extent required for their parent node to
>>>> reach in_flight == 0.
>>>>
>>>> I haven't checked the code but I guess the case you're alluding to is
>>>> that siblings with multiple parents could have other I/O in flight that
>>>> will not be drained and further I/O can be submitted after the parent
>>>> has drained?
>>> Yes, this in theory can happen. I don't really know if this happens
>>> practically, and how likely it is to happen.
>>>
>>> The alternative would be to make a drain that blocks the whole graph,
>>> siblings included, but that would probably be overkill.
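
To make the terminology concrete, here is roughly how I picture the two
drain flavours being used around a graph change. This is just a sketch,
not actual block.c code; bdrv_example_graph_change() is a made-up name,
and you should imagine it living in block.c:

static void bdrv_example_graph_change(BlockDriverState *bs)
{
    /* Normal drain: quiesces @bs and its parents, nothing below it. */
    bdrv_drained_begin(bs);
    /* ... modify the edges between @bs and its parents ... */
    bdrv_drained_end(bs);

    /* Subtree drain: additionally quiesces everything reachable through
     * bs->children -- but still not siblings that are only reachable
     * through a common parent. */
    bdrv_subtree_drained_begin(bs);
    /* ... modify a larger portion of the subgraph ... */
    bdrv_subtree_drained_end(bs);
}
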
>>>
>> So I have thought about this, and I think maybe this is not a concrete
>> problem.
>> Suppose we have a graph where "parent" has 2 children: "child" and
>> "sibling". "sibling" also has a blockjob.
>>
>> Now, main loop wants to modify parent-child relation and maybe detach
>> child from parent.
>>
>> First wrong assumption: the sibling is not drained. Actually, my strategy
>> takes draining both nodes into account, also because the parent could be
>> in another graph. Therefore the sibling is drained.
>>
>> But let's assume "sibling" is the sibling of the parent.
>>
>> Therefore we have
>> "child" -> "parent" -> "grandparent"
>> and
>> "blockjob" -> "sibling" -> "grandparent"
>>
>> The issue is the following: the main loop can't drain "sibling", because
>> subtree_drained does not reach it. Therefore the blockjob can still run
>> while the main loop modifies "child" -> "parent". The blockjob can either:
>> 1) drain, but this won't affect "child" -> "parent"
>> 2) read the graph in ways other than drain; for example,
>> .set_aio_context recursively touches the whole graph.
>> 3) write the graph.
> 
> I don’t really understand the problem here.  If the block job only
> operates on the sibling subgraph, why would it care what’s going on in
> the other subgraph?

We are talking about something that probably does not happen, but what
if it calls a callback similar to .set_aio_context that goes through the
whole graph?

Though the first question is: is there such a callback?

A second, even more unrealistic case is when a job randomly looks for a bs
in another connected component and, for example, drains it.
Again, probably impossible.
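
To illustrate what I mean by "goes through the whole graph", this is the
shape of walk I have in mind, simplified from memory from what
bdrv_set_aio_context_ignore() does today. It is not the actual code, and
example_walk_whole_graph() is a made-up name (imagine it in block.c):

static void example_walk_whole_graph(BlockDriverState *bs, GSList **ignore)
{
    BdrvChild *child;

    QLIST_FOREACH(child, &bs->children, next) {
        if (g_slist_find(*ignore, child)) {
            continue;
        }
        *ignore = g_slist_prepend(*ignore, child);
        example_walk_whole_graph(child->bs, ignore);
    }

    QLIST_FOREACH(child, &bs->parents, next_parent) {
        if (g_slist_find(*ignore, child)) {
            continue;
        }
        *ignore = g_slist_prepend(*ignore, child);
        /* the real code goes through child->klass here to reach the
         * parent, e.g. child->klass->set_aio_ctx(child, ...) */
    }

    /* ... act on @bs itself, e.g. move it to another AioContext ... */
}

Starting from any node, this recurses through both ->children and
->parents, so it can read edges far outside the drained subtree.
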

> Block jobs should own all nodes that are associated with them (e.g.
> because they intend to drop or replace them when the job is done), so
> when part of the graph is drained, all jobs that could modify that part
> should be drained, too.

What do you mean by "own"?

> 
>> 3) can only be performed in the main loop, because it's a graph
>> operation. That means the blockjob runs when the graph-modifying
>> coroutine/bh is not running. They never run together.
>> The safety of this operation relies on where the drains are and will be
>> inserted. If you do it like in my patch "block.c:
>> bdrv_replace_child_noperm: first call ->attach(), and then add child",
>> then we would have a problem, because we drain between two writes, and the
>> blockjob will find an inconsistent graph. If we do it as we seem to do
>> it so far, then we won't really have any problem.
>>
>> 2) is a read, and can theoretically be performed by another thread. But
>> is there a function that does that? .set_aio_context for example is a GS
>> function, so we will fall back to case 3) and nothing bad would happen.
>>
>> Is there a counter example for this?
>>
>> -----------
>>
>> Talking about something else, I discussed with Kevin what *seems* to be
>> an alternative way to do this, instead of adding drains everywhere.
>> His idea is to replicate what blk_wait_while_drained() currently does
>> but on a larger scale. It is something in between this subtree_drains
>> logic and a rwlock.
>>
>> Basically, if I understood correctly, we could implement
>> bdrv_wait_while_drained() and put it in all places where we would put a
>> read lock: all the reads of ->parents and ->children.
>> This function detects if the bdrv is under drain, and if so it stops
>> and waits until the drain (i.e. the graph modification) finishes.
>> On the other side, each write would just need to drain probably both
>> nodes (a simple drain), to signal that we are modifying the graph. Once
>> bdrv_drained_begin() finishes, we are sure all coroutines are stopped.
>> Once bdrv_drained_end() finishes, we automatically let all coroutines
>> restart, and they continue where they left off.
>>
>> Seems a good compromise between drains and rwlock. What do you think?
> 
> Well, sounds complicated.  So I’m asking myself whether this would be
> noticeably better than just an RwLock for graph modifications, like the
> global lock Vladimir has proposed.

But then the point is: aren't we re-inventing an AioContext lock?
The lock would protect not only ->parents and ->children, but also other
bdrv fields that are concurrently read/written.

I don't know; it seems to me that there is a lot of uncertainty about which
way to take...
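
Just so we are comparing concrete things, this is roughly how I picture
the reader side of the bdrv_wait_while_drained() idea above, modelled on
the existing blk_wait_while_drained(). Purely a sketch: neither this
function nor the waiting_readers CoQueue exist today; the queue would be
a new BlockDriverState field that bdrv_drained_end() wakes up:

static void coroutine_fn bdrv_wait_while_drained(BlockDriverState *bs)
{
    while (bs->quiesce_counter > 0) {
        /* The graph is being modified (a drained section is active):
         * park this coroutine until bdrv_drained_end(). */
        qemu_co_queue_wait(&bs->waiting_readers, NULL);
    }
}

Every read of ->parents / ->children outside the main loop would be
preceded by a call like this, while the writer side in the main loop
would keep being a plain drain of the involved nodes:

    bdrv_drained_begin(parent);
    bdrv_drained_begin(child);
    /* ... modify parent->children / child->parents ... */
    bdrv_drained_end(child);
    bdrv_drained_end(parent);   /* wakes up waiting_readers */
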

Emanuele



