Re: [PATCH] block: bdrv_set_backing_hd(): use drained section


From: Emanuele Giuseppe Esposito
Subject: Re: [PATCH] block: bdrv_set_backing_hd(): use drained section
Date: Fri, 28 Jan 2022 15:12:11 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0


On 27/01/2022 15:13, Kevin Wolf wrote:
> Am 25.01.2022 um 11:12 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 25.01.2022 12:24, Paolo Bonzini wrote:
>>> On 1/24/22 18:37, Vladimir Sementsov-Ogievskiy wrote:
>>>> Graph modifications should be done in a drained section. The
>>>> stream_prepare() handler of the block stream job calls
>>>> bdrv_set_backing_hd() without using a drained section, so it is
>>>> theoretically possible that some I/O request interleaves with the
>>>> graph modification and uses outdated pointers to removed block nodes.
>>>>
>>>> Some other callers of bdrv_set_backing_hd() don't care about drained
>>>> sections either, so it seems good to create the drained section
>>>> directly in bdrv_set_backing_hd().
>>>
>>> Emanuele has a similar patch in his series to protect all graph
>>> modifications with drains:
>>>
>>> @@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>>>
>>>       assert(qemu_in_main_thread());
>>>
>>> +    bdrv_subtree_drained_begin_unlocked(bs);
>>> +    if (backing_hd) {
>>> +        bdrv_subtree_drained_begin_unlocked(backing_hd);
>>> +    }
>>> +
>>>       ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
>>>       if (ret < 0) {
>>>           goto out;
>>> @@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>>>       ret = bdrv_refresh_perms(bs, errp);
>>>   out:
>>>       tran_finalize(tran, ret);
>>> +    if (backing_hd) {
>>> +        bdrv_subtree_drained_end_unlocked(backing_hd);
>>> +    }
>>> +    bdrv_subtree_drained_end_unlocked(bs);
>>>
>>>       return ret;
>>>   }
>>>
>>> so the idea at least is correct.
>>>
>>> I don't object to fixing this independently, but please check
>>> 1) if a subtree drain would be more appropriate, 2) whether
>>> backing_hd should be drained as well, 3) whether we're guaranteed
>>> to be holding the AioContext lock as required for
>>> bdrv_drained_begin/end.
>>>
>>
>> Hmm.
>>
>> 1. Subtree draining of backing_hd will not help: bs is not drained, so
>> we may still have in-flight requests in bs touching the old bs->backing.

What do you mean by "bs is not drained"? In my patch I drain both.

>>
>> 2. I think a non-recursive drain of bs is enough. We modify only the bs
>> node, so that is what we should drain; backing_hd itself is not
>> modified. If backing_hd participates in some other backing chain, that
>> chain is not touched, and in-flight requests there are not broken by
>> the modification, so why drain it? The same goes for the old
>> bs->backing and the other children of bs. We are not interested in
>> in-flight requests in the subtree that are not part of a request in bs.
>> So, if there are no in-flight requests in bs, we can modify bs without
>> caring about requests in the subtree.
> 
> I agree on both points. Emanuele's patch seems to be doing unnecessary
> work there.

Wait, the point of my patches[*] is to protect
bdrv_replace_child_noperm(). See the cover letter of my series for more
info.

The reason for a subtree drain is that one callback of .attach() in
bdrv_replace_child_noperm() is bdrv_child_cb_attach().
This attaches the node to the child->opaque->children list, so both
nodes pointed to by the BdrvChild are modified (child->bs and
child->opaque). Simply draining child->bs won't be enough to also cover
child->opaque, in my opinion[*].
The same applies to detach.
One interesting side note is what happens when we move the child from
one bs to another (old_bs and new_bs are both != NULL): child->opaque
just loses and re-gains the same child.
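
To make the "both nodes are touched" point concrete, here is a tiny
standalone sketch (toy code, not QEMU; every name in it is invented) of
what attaching one edge does to the two lists:

/* Toy illustration only -- not QEMU code. The point is that attaching a
 * single edge writes into *both* nodes: the parent's children list and
 * the child node's parents list. */
#include <stdio.h>
#include <sys/queue.h>

typedef struct ToyChild ToyChild;

typedef struct ToyNode {
    const char *name;
    LIST_HEAD(, ToyChild) children;   /* edges where this node is the parent */
    LIST_HEAD(, ToyChild) parents;    /* edges where this node is the child  */
} ToyNode;

struct ToyChild {
    ToyNode *bs;                       /* the child node (like child->bs)      */
    ToyNode *opaque;                   /* the parent node (like child->opaque) */
    LIST_ENTRY(ToyChild) next;         /* link in opaque->children             */
    LIST_ENTRY(ToyChild) next_parent;  /* link in bs->parents                  */
};

/* Both lists are modified, so in-flight users of *either* node could see
 * the graph mid-change unless both are drained. */
static void toy_attach(ToyChild *c, ToyNode *parent, ToyNode *child)
{
    c->opaque = parent;
    c->bs = child;
    LIST_INSERT_HEAD(&parent->children, c, next);        /* touches parent */
    LIST_INSERT_HEAD(&child->parents, c, next_parent);   /* touches child  */
}

int main(void)
{
    ToyNode bs = { .name = "bs" };
    ToyNode backing_hd = { .name = "backing_hd" };
    ToyChild edge;

    LIST_INIT(&bs.children);         LIST_INIT(&bs.parents);
    LIST_INIT(&backing_hd.children); LIST_INIT(&backing_hd.parents);

    toy_attach(&edge, &bs, &backing_hd);
    printf("%s -> %s: both nodes' lists were modified\n",
           bs.name, LIST_FIRST(&bs.children)->bs->name);
    return 0;
}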

Regarding this specific drain: I am surely missing something here,
because if I try to follow the code I see that from
bdrv_set_backing_hd(bs, backing_hd)
the call stack eventually ends up in
bdrv_replace_child_noperm(child, new_bs /* -> backing_hd */)
and the graph modification there is:
QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);

So why not protect backing_hd?

Full stack:
bdrv_set_backing_hd(bs, backing_hd)
 bdrv_set_backing_noperm(bs, backing_hd)
  bdrv_set_file_or_backing_noperm(bs, backing_hd)
    bdrv_attach_child_noperm(parent_bs = bs, child_bs = backing_hd)
     bdrv_attach_child_common(child_bs = backing_hd, .., parent_bs = bs)
      new_child.bs = NULL;
      new_child.opaque = parent_bs = bs;
      bdrv_replace_child_noperm(new_child, child_bs = backing_hd)
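
And just to spell out, as a purely conceptual toy (this is not the real
bdrv_drained_begin/end implementation, and all names below are made up),
the pattern I want around that modification: quiesce both nodes, wait for
their in-flight requests, modify the graph, then resume:

/* Conceptual sketch of a drained section (toy code, not QEMU). */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *name;
    int in_flight;     /* requests currently using this node         */
    bool quiesced;     /* new requests must not start while true     */
} ToyNode;

static void toy_drained_begin(ToyNode *n)
{
    n->quiesced = true;
    while (n->in_flight > 0) {
        /* the real code polls the AioContext here; the toy just waits */
    }
}

static void toy_drained_end(ToyNode *n)
{
    n->quiesced = false;
}

int main(void)
{
    ToyNode bs = { .name = "bs" };
    ToyNode backing_hd = { .name = "backing_hd" };

    /* Drain *both* nodes whose lists the attach modifies. */
    toy_drained_begin(&bs);
    toy_drained_begin(&backing_hd);

    /* ... the graph modification would happen here, e.g. the
     * QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent)
     * from bdrv_replace_child_noperm() ... */

    toy_drained_end(&backing_hd);
    toy_drained_end(&bs);

    printf("modified %s and %s with no requests in flight\n",
           bs.name, backing_hd.name);
    return 0;
}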

[*] = BTW, I see that you understand this stuff way deeper than I do, so
feel free to review my drained-related series if you have time :)

> 
>> 3. Jobs are bound to an AioContext, so I believe they take care to
>> hold the AioContext lock. For example, job_prepare may be called
>> through job_exit(), which does aio_context_acquire(job->aio_context),
>> or it may be called through job_cancel(), which also seems to be
>> called under aio_context_acquire(). So it seems that in general we do
>> take care of it, and of course bdrv_set_backing_hd() must be called
>> with the AioContext lock held. If some code path doesn't, it's a bug.
> 
> We do have some code that does exactly that: In the main thread, we
> often don't hold the AioContext lock, but only the BQL. I find it quite
> ugly, but it works as long as the node is in the main AioContext.
> 
> One path where this is relevant is bdrv_open_inherit() ->
> bdrv_open_backing_file() -> bdrv_set_backing_hd(). This one is harmless
> because we know that we just created the new node in the main
> AioContext.
> 
> All the other paths seem to come either from jobs (which take the
> AioContext lock as you explained) or directly from monitor commands,
> which I just checked also take the lock.
> 

This won't hold anymore once the job patches are applied, which is why
subtree_drained_begin/end_unlocked() works in my case.

Thank you,
Emanuele



