From: Kevin Wolf
Subject: Re: [PATCH v3 3/3] nbd/server: Allow MULTI_CONN for shared writable exports
Date: Fri, 29 Apr 2022 14:49:35 +0200

Am 27.04.2022 um 23:39 hat Eric Blake geschrieben:
> On Wed, Apr 27, 2022 at 05:52:09PM +0200, Kevin Wolf wrote:
> > Am 14.03.2022 um 21:38 hat Eric Blake geschrieben:
> > > According to the NBD spec, a server that advertises
> > > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> > > not see any cache inconsistencies: when properly separated by a single
> > > flush, actions performed by one client will be visible to another
> > > client, regardless of which client did the flush.  We satisfy these
> > > conditions in qemu when our block layer is backed by the local
> > > filesystem (by virtue of the semantics of fdatasync(), and the fact
> > > that qemu itself is not buffering writes beyond flushes).  It is
> > > harder to state whether we satisfy these conditions for network-based
> > > protocols, so the safest course of action is to allow users to opt-in
> > > to advertising multi-conn.
> > 
> > Do you have an example of how this could be unsafe?
> 
> Nothing direct.  I tried to turn this on unconditionally in an earlier
> version, and we waffled about whether we could prove that network
> block backends (such as gluster) provide us the safety that the NBD
> spec demands:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00038.html
> https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06744.html
> 
> > 
> > As I understand it, the NBD server has a single BlockBackend and
> > therefore is a single client for the backend, be it file-posix or any
> > network-based protocol. It doesn't really make a difference for the
> > storage from how many different NBD clients the requests are coming.
> > 
> > I would have expected that cache coherency of the protocol level driver
> > would only matter if you had two QEMU processes accessing the same file
> > concurrently.
> 
> Or a multi-pathed connection to network storage, where one QEMU
> process accesses the network device, but those accesses may
> round-robin which server they reach, and where any caching at an
> individual server may be inconsistent with what is seen on another
> server unless flushing is used to force the round-robin access to
> synchronize between the multi-path views.

I don't think this is a realistic scenario. It would mean that you
successfully write data to the storage, and when you then read the same
location, you get different data back. This would be inconsistent even
with a single client. So I'd call this broken storage that should be
replaced as soon as possible.

I could imagine problems of this kind with two separate connections to
the network storage, but here all the NBD clients share a single
BlockBackend, so for the storage they are a single connection.
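
To make the requirement concrete, here is a rough sketch using libnbd's
Python bindings (the export URI, offsets and data are made up, not from
the patch): a write on one connection, followed by a single flush on
either connection, must be visible to a read on the other connection.

  import nbd

  URI = "nbd://localhost/export"   # hypothetical writable export

  a = nbd.NBD()
  b = nbd.NBD()
  a.connect_uri(URI)
  b.connect_uri(URI)

  # Only meaningful if the server advertised NBD_FLAG_CAN_MULTI_CONN
  assert a.can_multi_conn()

  a.pwrite(b"new data", 0)              # write through connection A
  b.flush()                             # one flush, on either connection...
  assert b.pread(8, 0) == b"new data"   # ...makes it visible on connection B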

> > In fact, I don't think we even need the flush restriction from the NBD
> > spec. All clients see the same state (that of the NBD server
> > BlockBackend) even without anyone issuing any flush. The flush is only
> > needed to make sure that cached data is written to the backing storage
> > when writeback caches are involved.
> > 
> > Please correct me if I'm misunderstanding something here.
> 
> Likewise me, if I'm being overly cautious.
> 
> I can certainly write a simpler v4 that just always advertises
> MULTI_CONN if we allow more than one client, without any knob to
> override it; it's just that it is harder to write a commit message
> justifying why I think it is safe to do so.

Having an explicit option doesn't hurt, but it's the reasoning in the
commit message that feels wrong to me.

We could consider changing "auto" to advertise MULTI_CONN even for
writable exports. There might still be a good reason not to do this by
default, though, because of the NBD clients. I'm quite sure that the
backend won't make any trouble, but a client might if someone else is
writing to the same image (this is why we require an explicit
share-rw=on for guest devices in the same case).
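
For comparison, the guest device case looks like this on the command
line (the drive name is made up):

  -device virtio-blk-pci,drive=disk0,share-rw=on

i.e. the user has to state explicitly that the guest copes with other
writers on the same image.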

Kevin