From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 1/3] block: add BDS field to count in-flight requests
Date: Tue, 11 Oct 2016 16:09:13 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0

First of all, a correction:

>> The exception is that there are places
>> where we don't have a BlockBackend and thus call
>> bdrv_drain/bdrv_co_drain instead of blk_drain/blk_co_drain

Never mind---it's just that there is no blk_drained_begin/end yet.
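
If we ever want them, they should be nothing more than thin wrappers;
an untested sketch (ignoring the case of a BB with no medium):

    void blk_drained_begin(BlockBackend *blk)
    {
        bdrv_drained_begin(blk_bs(blk));
    }

    void blk_drained_end(BlockBackend *blk)
    {
        bdrv_drained_end(blk_bs(blk));
    }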

On 11/10/2016 13:00, Kevin Wolf wrote:
> Actually, does bdrv_set_aio_context really need bdrv_drain, or should it
> be using a begin/end pair, too?

Hmm, yes, it should use a pair.
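
That is, roughly (the actual detach/attach and the AioContext locking
stay as they are today):

    void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context)
    {
        bdrv_drained_begin(bs);
        /* ... detach from the old AioContext and attach to new_context,
         * as the current code does after its one-shot bdrv_drain() ...
         */
        bdrv_drained_end(bs);
    }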

> I was wondering if we need separate .bdrv_isolate_begin/end callbacks,
> but I think on this level there is actually no difference: Both tell the
> block driver to stop sending requests.

Yup, the difference is in whether you exempt one BlockBackend.  If you
do, that BB is isolated.  If you don't, the BDS is quiesced (or drained).
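
In other words, the only difference is an "exempt" argument; purely
hypothetical names, just to illustrate:

    /* exempt == NULL quiesces (drains) the whole BDS;
     * exempt != NULL isolates it from everything except that BB.
     */
    void bdrv_isolate_begin(BlockDriverState *bs, BlockBackend *exempt);
    void bdrv_isolate_end(BlockDriverState *bs, BlockBackend *exempt);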

> Anyway, let's see which of the existing bdrv_drained_begin/end users
> would use this (please correct):
> 
> * Block jobs use it during completion
>
> * The QMP transaction commands to start block jobs drain as well, but
>   they don't have a BlockBackend yet. Those would call a BDS-level
>   bdrv_isolate_begin/end then, right?
> 
> * Quorum wants to isolate itself from new parent requests while adding a
>   new child, that's another user for bdrv_isolate_begin/end
> 
> * bdrv_snapshot_delete(), probably the same BDS-level thing

Is what you call "a BDS-level bdrv_isolate_begin/end" the same as my
"bdrv_drained_begin(bs) and bdrv_drained_end(bs), which quiesce all root
BdrvChildren reachable from bs"?  Anyway I think we agree.
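
Spelling out what I have in mind (the per-parent callback of course
does not exist yet):

    void bdrv_drained_begin(BlockDriverState *bs)
    {
        BdrvChild *parent;

        /* Tell every root user of bs to stop submitting new requests;
         * today this is only approximated with aio_disable_external().
         */
        QLIST_FOREACH(parent, &bs->parents, next_parent) {
            if (parent->role->drained_begin) {
                parent->role->drained_begin(parent);
            }
        }
        bdrv_drain(bs); /* wait for what is already in flight */
    }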

> Okay, I think we agree on the theory.
> 
> Now I'm curious how this relates to the layering violation in this
> specific patch.

It doesn't, but knowing that we want to go to the same place helps.
Especially since you said I was permanently preventing us from getting
there. :)

>> blk_drain and blk_co_drain can simply count in-flight requests, exactly
>> as done in patch 1.  Sure, I'm adding it at the wrong layer because not
>> every user of the block layer has already gotten its own personal BB.
> 
> Really? Where is it still missing? I was pretty sure I had converted all
> users.

I mentioned block jobs, block migration and the NBD server.  They do use
a BB (as you said, they wouldn't compile otherwise after the BdrvChild
conversion), but as far as I can tell they don't have their own
*separate* BB, with separate throttling state etc.  How far are we from
that?

> As for .bdrv_inactivate/invalidate, I agree that it should ideally
> imply .bdrv_drained_begin/end, though in practice we don't assert
> cleared BDRV_O_INACTIVE on reads, so I think it might not hold true.
> 
> I'm not so sure if .bdrv_drained_begin/end should imply inactivation,
> though.

Again, bad naming hurts...  See above where you said that bdrv_drain
should be called from bdrv_close only, and not be recursive.  And indeed
in qcow2_close you see:

    if (!(s->flags & BDRV_O_INACTIVE)) {
        qcow2_inactivate(bs);
    }

so this in bdrv_close:

    bdrv_flush(bs);
    bdrv_drain(bs); /* in case flush left pending I/O */

should be replaced by a non-recursive bdrv_inactivate, and bdrv_flush
should be the default implementation of bdrv_inactivate.  The QED code
added in patch 3 can also become its .bdrv_inactivate callback.  In
addition, instead of using bdrv_flush_all, stopping the VM could do a
"full flush" without setting BDRV_O_INACTIVE.
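
Roughly like this, with invented names and no error handling or
recursion over children:

    static void bdrv_inactivate_one(BlockDriverState *bs)
    {
        if (bs->drv && bs->drv->bdrv_inactivate) {
            bs->drv->bdrv_inactivate(bs);   /* e.g. qcow2_inactivate */
        } else {
            bdrv_flush(bs);                 /* default implementation */
        }
        bdrv_drain(bs);  /* in case the driver left pending I/O */
        bs->open_flags |= BDRV_O_INACTIVE;
    }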

> Why don't we start with adding a proper blk_drain so that we don't have
> to introduce the ugly intermediate state? This would mostly mean
> splitting the throttling drain into one version called by blk_drain that
> restarts all requests while ignoring the throttling, and changing the
> BdrvChild callback to only stop sending them. Once you're there, I think
> that gets you rid of the BDS request count manipulations from the BB
> layer because the BDS layer already counts all requests.

I guess that would be doable.  However, I don't think it's so easy.  I
also don't think it's very interesting (that's probably where we disagree).

First of all, the current implementation of bdrv_drain is nasty; in
particular, the double aio_poll and the recursive bdrv_requests_pending
are hard to follow.  My aim for these two patches was to have an
implementation of bdrv_drain that is more easily understandable, so that
you can _then_ move things to the correct layer.  So even if I
implemented a series to add a proper blk_drain, I would start with these
two patches and then move things up.
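
For reference, the core of bdrv_drain after these two patches is
roughly this (simplified, ignoring the driver-level recursion details
and the AioContext locking):

    void bdrv_drain(BlockDriverState *bs)
    {
        bdrv_drain_recurse(bs);   /* let drivers flush their internal queues */
        while (atomic_read(&bs->in_flight) > 0) {
            aio_poll(bdrv_get_aio_context(bs), true);
        }
    }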

Second, there's the issue of bdrv_drained_begin/end, which is the main
reason why I placed the request count at the BDS level.  BlockBackend
isolation is a requirement before you can count requests exclusively at
the BB level.  At which point, implementing isolated sections properly
(including migration from aio_disable/enable_external to two new
BlockDevOps callbacks, and giving NBD+jobs+migration a separate BB) is more
interesting than cobbling together the minimum that's enough to
eliminate bs->in_flight.
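
To make the BlockDevOps part concrete, on the BlockBackend side I am
thinking of something like this (none of it exists yet):

    /* BdrvChild callback in block-backend.c: instead of relying on
     * aio_disable_external(), tell the device model to stop submitting.
     */
    static void blk_root_drained_begin(BdrvChild *child)
    {
        BlockBackend *blk = child->opaque;

        if (blk->dev_ops && blk->dev_ops->drained_begin) {
            blk->dev_ops->drained_begin(blk->dev_opaque);
        }
    }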

All in all, I don't think this should be done as a single patch series,
and it certainly wouldn't be ready in time for 2.8.  These two patches
(plus the blockjob_drain one that I posted) are needed for me to get rid
of RFifoLock and fix bdrv_drain_all deadlocks.  I'd really love to do
that in 2.8, and the patches for that are ready to be posted (as RFC I
guess).

Paolo


