qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-block] [PATCH v2 11/17] block-backend: Decrease i


From: Kevin Wolf
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH v2 11/17] block-backend: Decrease in_flight only after callback
Date: Tue, 18 Sep 2018 13:34:34 +0200
User-agent: Mutt/1.9.1 (2017-09-22)

Am 17.09.2018 um 19:08 hat Paolo Bonzini geschrieben:
> On 17/09/2018 18:51, Kevin Wolf wrote:
> > Am 17.09.2018 um 17:59 hat Paolo Bonzini geschrieben:
> >> On 17/09/2018 14:53, Kevin Wolf wrote:
> >>>>> I think I can drop the ref/unref pair, but not the whole patch (whose
> >>>>> main point is reordering dec_in_flight vs. the AIO callback).
> >>>>
> >>>> You're right, though I think I did that on purpose back in the day.
> >>>> IIRC it was related to bdrv_drain, which might never complete if called
> >>>> from an AIO callback.
> >>>
> >>> Hm... This seems to become a common pattern, it's the same as for the
> >>> job completion callbacks (only improved enough for the bug at hand to
> >>> disappear instead of properly fixed in "blockjob: Lie better in
> >>> child_job_drained_poll()").
> >>>
> >>> Either you say there is no activity even though there is still a
> >>> callback pending, then bdrv_drain() called from elsewhere will return
> >>> too early and we get a bug. Or you say there is activity, then any
> >>> nested drain inside that callback will deadlock and we get a bug, too.
> >>>
> >>> So I suppose we need some way to know which activities to ignore during
> >>> drain, depending on who is the caller? :-/
> >>
> >> Some alternatives:
> >>
> >> 1) anything that needs to do invoke I/O for callbacks must use inc/dec
> >> in flight manually.  Simplest but hard to assert though.  At least SCSI
> >> IDE are broken.
> >>
> >> 2) callbacks cannot indirectly result in bdrv_drain.  Sounds easier
> >> since there are not many AIO callers anymore - plus it also seems easier
> >> to add debugging code for.
> >>
> >> 3) we provide device callbacks for "am I busy" and invoke them from
> >> bdrv_drain's poll loop.
> >>
> >> Just off the top of my head.  Not sure which is best.
> > 
> > I don't see how 1) and 3) can solve the problem. You still need to
> > declare something busy when someone else drains (so the drain doesn't
> > return too early) and not busy when it calls a nested drain (so that it
> > doesn't deadlock).
> > 
> > 2) would obviously solve the problem, but I'm afraid it's not realistic
> > when you consider that block jobs happen _only_ in callbacks. That much
> > of this is disguised as coroutine code doesn't change the call chains.
> > You also can't use BHs to escape the AIO callback, because then the BH is
> > logically part of the operation that needs to be completed for an
> > external drain to return, so you would still have to call the job busy
> > and would deadlock in a nested drain in that BH. So you're back to
> > square one.
> 
> But then basically the main issue is mirror.c's call to
> bdrv_drained_begin/end.  There are no other calls to
> bdrv_drained_begin/end inside coroutines IIRC.

Coroutine or not doesn't matter. What matters is that you drain inside
some (high-level) operation that already needs to be completed itself
for drain to return. This is the case for any block job completions that
make changes to the graph (which means everything except backup).

> Another long-standing idea is to replace aio_disable/enable_external
> with implementations of the (currently unused) dev_ops callbacks
> .drained_begin and .drained_end.  What if jobs used those callbacks to
> pause themselves(*), and block/mirror.c had a pause point before the
> call to bdrv_drained_begin?
> 
>       (*) of course block/mirror.c would need some smartness to
>           not pause itself when the job itself is asking to drain!

Yes, exactly this required smartness is the kind of problem I've been
talking about here all the time!

Block jobs already implement .drained_begin/poll/end callbacks, they
just attach them directly to the BDS (where they are available as
BdrvChildRole callbacks) instead of a BB. And this is why they will
deadlock if .drained_poll() correctly returns true as long as the job
completion is still pending (or halfway done).

And to get back to the specific patch we're discussing, AIO request
completion in the BlockBackend needs the same kind of smartness so that
it doesn't deadlock, but still makes other callers of drain wait for it.

I still don't know what this smartness should look like
implementation-wise, though.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]