[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
From: |
Roman Kagan |
Subject: |
Re: [PATCH 0/7] block/nbd: decouple reconnect from drain |
Date: |
Fri, 26 Mar 2021 11:07:10 +0300 |
On Wed, Mar 17, 2021 at 11:35:31AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored. Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> >
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> >
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> >
> > Roman Kagan (7):
> > block/nbd: avoid touching freed connect_thread
> > block/nbd: use uniformly nbd_client_connecting_wait
> > block/nbd: assert attach/detach runs in the proper context
> > block/nbd: transfer reconnection stuff across aio_context switch
> > block/nbd: better document a case in nbd_co_establish_connection
> > block/nbd: decouple reconnect from drain
> > block/nbd: stop manipulating in_flight counter
> >
> > block/nbd.c | 191 +++++++++++++++++++++++----------------------------
> > nbd/client.c | 2 -
> > 2 files changed, 86 insertions(+), 107 deletions(-)
> >
>
>
> Hmm. The huge source of problems for this series is weird logic around
> drain and aio context switch in NBD driver.
>
> Why do we have all these too complicated logic with abuse of in_flight
> counter in NBD? The answer is connection_co. NBD differs from other
> drivers, it has a coroutine independent of request coroutines. And we
> have to move this coroutine carefully to new aio context. We can't
> just enter it from the new context, we want to be sure that
> connection_co is in one of yield points that supports reentering.
>
> I have an idea of how to avoid this thing: drop connection_co at all.
>
> 1. nbd negotiation goes to connection thread and becomes independent
> of any aio context.
>
> 2. waiting for server reply goes to request code. So, instead of
> reading the replay from socket always in connection_co, we read in the
> request coroutine, after sending the request. We'll need a CoMutex for
> it (as only one request coroutine should read from socket), and be
> prepared to coming reply is not for _this_ request (in this case we
> should wake another request and continue read from socket).
>
> but this may be too much for soft freeze.
This approach does look appealing to me, and I gave it a quick shot but
the amount of changes this involves exceeds the rc tolerance indeed.
> Another idea:
>
> You want all the requests be completed on drain_begin(), not
> cancelled. Actually, you don't need reconnect runnning during drained
> section for it. It should be enough just wait for all currenct
> requests before disabling the reconnect in drain_begin handler.
So effectively you suggest doing nbd's own drain within
bdrv_co_drain_begin callback. I'm not totally sure there are no
assumptions this may break, but I'll try to look into this possibility.
Thanks,
Roman.
- [PATCH 7/7] block/nbd: stop manipulating in_flight counter, (continued)
Re: [PATCH 0/7] block/nbd: decouple reconnect from drain, Vladimir Sementsov-Ogievskiy, 2021/03/15
Re: [PATCH 0/7] block/nbd: decouple reconnect from drain, Eric Blake, 2021/03/16
Re: [PATCH 0/7] block/nbd: decouple reconnect from drain, Vladimir Sementsov-Ogievskiy, 2021/03/17
- Re: [PATCH 0/7] block/nbd: decouple reconnect from drain,
Roman Kagan <=