

From: Hanna Reitz
Subject: Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept
Date: Wed, 30 Mar 2022 12:53:15 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

On 17.03.22 17:23, Emanuele Giuseppe Esposito wrote:

Am 09/03/2022 um 14:26 schrieb Emanuele Giuseppe Esposito:
* Drains allow the caller (either the main loop or the iothread running
the context) to wait for all in-flight requests and operations
of a BDS: normal drains target a given node and its parents, while
subtree drains also include the subgraph of the node. Siblings are
not affected by either of these two kinds of drain.
Siblings are drained to the extent required for their parent node to
reach in_flight == 0.

I haven't checked the code but I guess the case you're alluding to is
that siblings with multiple parents could have other I/O in flight that
will not be drained and further I/O can be submitted after the parent
has drained?
Yes, in theory this can happen. I don't really know whether it happens
in practice, or how likely it is.

The alternative would be to make a drain that blocks the whole graph,
siblings included, but that would probably be overkill.

So I have thought about this, and I think maybe this is not a concrete
problem.
Suppose we have a graph where "parent" has 2 children: "child" and
"sibling". "sibling" also has a blockjob.

Now, main loop wants to modify parent-child relation and maybe detach
child from parent.

1st wrong assumption: that the sibling is not drained. Actually my
strategy takes draining both nodes into account, also because the parent
could be in another graph. Therefore the sibling is drained.

But let's assume "sibling" is the sibling of the parent.

Therefore we have
"child" -> "parent" -> "grandparent"
and
"blockjob" -> "sibling" -> "grandparent"

The issue is the following: main loop can't drain "sibling", because
subtree_drained does not reach it. Therefore blockjob can still run
while main loop modifies "child" -> "parent". Blockjob can either:
1) drain, but this won't affect "child" -> "parent"
2) read the graph in ways other than draining; for example,
.set_aio_context recursively touches the whole graph.
3) write the graph.

I don’t really understand the problem here.  If the block job only operates on the sibling subgraph, why would it care what’s going on in the other subgraph?

Block jobs should own all nodes that are associated with them (e.g. because they intend to drop or replace them when the job is done), so when part of the graph is drained, all jobs that could modify that part should be drained, too.

3) can only be performed in the main loop, because it is a graph
operation. This means the blockjob runs only when the graph-modifying
coroutine/BH is not running; they never run together.
The safety of this operation relies on where the drains are and will be
inserted. If you do it as in my patch "block.c:
bdrv_replace_child_noperm: first call ->attach(), and then add child",
then we would have a problem, because we drain between two writes, and
the blockjob would find an inconsistent graph. If we do it as we seem to
have done so far, then we won't really have any problem.

2) is a read, and could in theory be performed by another thread. But is
there a function that does that? .set_aio_context, for example, is a GS
function, so we fall back to case 3) and nothing bad happens.

Is there a counter example for this?

-----------

Talking about something else: I discussed with Kevin what *seems* to be
an alternative way to do this, instead of adding drains everywhere.
His idea is to replicate what blk_wait_while_drained() currently does,
but on a larger scale. It is something in between this subtree_drain
logic and a rwlock.

Basically, if I understood correctly, we could implement
bdrv_wait_while_drained() and call it in all places where we would take
a read lock: all reads of ->parents and ->children.
This function detects whether the bdrv is under drain and, if so, stops
and waits until the drain (i.e. the graph modification) finishes.
On the other side, each write would just need to drain both nodes
(a simple drain) to signal that we are modifying the graph. Once
bdrv_drained_begin() returns, we are sure all coroutines are stopped.
Once bdrv_drained_end() returns, all coroutines automatically restart
and continue where they left off.

Seems like a good compromise between drains and a rwlock. What do you think?

Well, sounds complicated.  So I’m asking myself whether this would be noticeably better than just an RwLock for graph modifications, like the global lock Vladimir has proposed.

Hanna



