qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 6/7] block/nbd: decouple reconnect from drain


From: Roman Kagan
Subject: Re: [PATCH 6/7] block/nbd: decouple reconnect from drain
Date: Tue, 16 Mar 2021 19:03:04 +0300

On Mon, Mar 15, 2021 at 11:10:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > Since the pieces of the reconnection logic are now properly migrated
> > from one aio_context to another, it appears safe to just stop messing
> > with the drained section in the reconnection code.
> > 
> > Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
> 
> I'd not think that it "fixes" it. Behavior changes.. But 5ad81b4946
> didn't introduce any bugs.

I guess you're right.

In fact I did reproduce the situation when the drain made no progress
exactly becase the only in-flight reference was taken by the
connection_co, but it may be due to some intermediate stage of the patch
development so I need to do a more thorough analysis to tell if it was
triggerable with the original code.

> > Fixes: 8c517de24a ("block/nbd: fix drain dead-lock because of nbd 
> > reconnect-delay")
> 
> And here..
> 
> 1. There is an existing problem (unrelated to nbd) in Qemu that long
> io request which we have to wait for at drained_begin may trigger a
> dead lock
> (https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html)
> 
> 2. So, when we have nbd reconnect (and therefore long io requests) we
> simply trigger this deadlock.. That's why I decided to cancel the
> requests (assuming they will most probably fail anyway).
> 
> I agree that nbd driver is wrong place for fixing the problem
> described in
> (https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html),
> but if you just revert 8c517de24a, you'll see the deadlock again..

I may have misunderstood that thread, but isn't that deadlock exactly
due to the requests being unable to ever drain because the
reconnection process is suspended while the drain is in progress?

Thanks,
Roman.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]