[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter
From: |
Roman Kagan |
Subject: |
Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter |
Date: |
Fri, 26 Mar 2021 10:41:53 +0300 |
On Tue, Mar 16, 2021 at 09:37:13PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 16.03.2021 19:08, Roman Kagan wrote:
> > On Mon, Mar 15, 2021 at 11:15:44PM +0300, Vladimir Sementsov-Ogievskiy
> > wrote:
> > > 15.03.2021 09:06, Roman Kagan wrote:
> > > > As the reconnect logic no longer interferes with drained sections, it
> > > > appears unnecessary to explicitly manipulate the in_flight counter.
> > > >
> > > > Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
> > >
> > > And here you actually allow qemu_aio_coroutine_enter() call in
> > > nbd_client_attach_aio_context_bh() to enter connection_co in any yield
> > > point which is possible during drained section. The analysis should be
> > > done to be sure that all these yield points are safe for reentering by
> > > external qemu_aio_coroutine_enter(). (By external I mean not by the
> > > actual enter() we are waiting for at the yield() point. For example
> > > qemu_channel_yield() supports reentering.. And therefore (as I
> > > understand after fast looking through) nbd_read() should support
> > > reentering too..
> >
> > I'll do a more thorough analysis of how safe it is.
> >
> > FWIW this hasn't triggered any test failures yet, but that assert in
> > patch 3 didn't ever go off either so I'm not sure I can trust the tests
> > on this.
> >
>
> Hmm. First, we should consider qemu_coroutine_yield() in
> nbd_co_establish_connection().
>
> Most of nbd_co_establish_connection_cancel() purpose is to avoid
> reentering this yield()..
Unless I'm overlooking something, nbd_co_establish_connection() is fine
with spurious entering at this yield point. What does look problematic,
though, is your next point:
> And I don't know, how to make it safely reenterable: keep in mind bh
> that may be already scheduled by connect_thread_func(). And if bh is
> already scheduled, can we cancel it? I'm not sure.
>
> We have qemu_bh_delete(). But is it possible, that BH is near to be
> executed and already cannot be removed by qemu_bh_delete()? I don't
> know.
>
> And if we can't safely drop the bh at any moment, we should wait in
> nbd_client_detach_aio_context until the scheduled bh enters the
> connection_co.. Or something like this
So I think it's not the reentry at this yield point per se which is
problematic, it's that that bh may have been scheduled before the
aio_context switch so once it runs it would wake up connection_co on the
old aio_context. I think it may be possible to address by adding a
check into connect_bh().
Thanks,
Roman.
- Re: [PATCH 6/7] block/nbd: decouple reconnect from drain, (continued)
[PATCH 3/7] block/nbd: assert attach/detach runs in the proper context, Roman Kagan, 2021/03/15
[PATCH 7/7] block/nbd: stop manipulating in_flight counter, Roman Kagan, 2021/03/15
Re: [PATCH 0/7] block/nbd: decouple reconnect from drain, Vladimir Sementsov-Ogievskiy, 2021/03/15
Re: [PATCH 0/7] block/nbd: decouple reconnect from drain, Eric Blake, 2021/03/16
Re: [PATCH 0/7] block/nbd: decouple reconnect from drain, Vladimir Sementsov-Ogievskiy, 2021/03/17