Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports
From: Eric Blake
Subject: Re: [PATCH v2] nbd/server: Allow MULTI_CONN for shared writable exports
Date: Tue, 15 Feb 2022 17:24:14 -0600
User-agent: NeoMutt/20211029-322-5436a9
On Tue, Feb 15, 2022 at 09:23:36PM +0200, Nir Soffer wrote:
> On Tue, Feb 15, 2022 at 7:22 PM Eric Blake <eblake@redhat.com> wrote:
>
> > According to the NBD spec, a server advertising
> > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> > not see any cache inconsistencies: when properly separated by a single
> > flush, actions performed by one client will be visible to another
> > client, regardless of which client did the flush. We satisfy these
> > conditions in qemu when our block layer is backed by the local
> > filesystem (by virtue of the semantics of fdatasync(), and the fact
> > that qemu itself is not buffering writes beyond flushes). It is
> > harder to state whether we satisfy these conditions for network-based
> > protocols, so the safest course of action is to allow users to opt-in
> > to advertising multi-conn. We may later tweak defaults to advertise
> > by default when the block layer can confirm that the underlying
> > protocol driver is cache consistent between multiple writers, but for
> > now, this at least allows savvy users (such as virt-v2v or nbdcopy) to
> > explicitly start qemu-nbd or qemu-storage-daemon with multi-conn
> > advertisement in a known-safe setup where the client end can then
> > benefit from parallel clients.
> >
>
> It makes sense, and will be used by oVirt. Actually we have already been using
> multiple connections for writing for about 2 years, based on your promise
> that if every client writes to district areas this is safe.
I presume s/district/distinct/, but yes, I'm glad we're finally trying
to make the code match existing practice ;)
> > +++ b/docs/tools/qemu-nbd.rst
> > @@ -139,8 +139,7 @@ driver options if ``--image-opts`` is specified.
> > .. option:: -e, --shared=NUM
> >
> > Allow up to *NUM* clients to share the device (default
> > - ``1``), 0 for unlimited. Safe for readers, but for now,
> > - consistency is not guaranteed between multiple writers.
> > + ``1``), 0 for unlimited.
> >
>
> Removing the note means that now consistency is guaranteed between
> multiple writers, no?
>
> Or maybe we want to mention here that consistency depends on the protocol
> and users can opt in, or refer to the section where this is discussed?
Yeah, a link to the QAPI docs where multi-conn is documented might be
nice, except I'm not sure the best way to do that in our sphinx
documentation setup.
> > +##
> > +# @NbdExportMultiConn:
> > +#
> > +# Possible settings for advertising NBD multiple client support.
> > +#
> > +# @off: Do not advertise multiple clients.
> > +#
> > +# @on: Allow multiple clients (for writable clients, this is only safe
> > +# if the underlying BDS is cache-consistent, such as when backed
> > +# by the raw file driver); ignored if the NBD server was set up
> > +# with max-connections of 1.
> > +#
> > +# @auto: Behaves like @off if the export is writable, and @on if the
> > +# export is read-only.
> > +#
> > +# Since: 7.0
> > +##
> > +{ 'enum': 'NbdExportMultiConn',
> > + 'data': ['off', 'on', 'auto'] }
> >
>
> Are we going to have --multi-conn=(on|off|auto)?
Oh. The QMP command (which is immediately visible through
nbd-server-add/block-storage-add to qemu and qemu-storage-daemon)
gains "multi-conn":"on", but you may be right that qemu-nbd would want
a command line option (either that, or we accelerate our plans that
qsd should replace qemu-nbd).
> > +++ b/blockdev-nbd.c
> > @@ -44,6 +44,11 @@ bool nbd_server_is_running(void)
> > return nbd_server || is_qemu_nbd;
> > }
> >
> > +int nbd_server_max_connections(void)
> > +{
> > +    return nbd_server ? nbd_server->max_connections : -1;
> > +}
> >
>
> -1 is a little bit strange for a limit, maybe 1 is a better default when
> nbd_server == NULL? When can this happen?
In qemu, if you haven't used the QMP command 'nbd-server-start' yet.
In qemu-nbd, always (per the nbd_server_is_running function just
above). My iotest only covered the qemu/qsd side, not the qemu-nbd
side, so it looks like I need a v3...
> > +++ b/nbd/server.c
> > + /*
> > + * Determine whether to advertise multi-conn. Default is auto,
> > + * which resolves to on for read-only and off for writable. But
> > + * if the server has max-connections 1, that forces the flag off.
> >
>
> Looks good, this can be enabled automatically based on the driver
> if we want, so users using auto can upgrade to multi-conn automatically.
Yes, that's part of why I made it a tri-state with a default of 'auto'.
>
>
> > + */
> > +    if (!arg->has_multi_conn) {
> > +        arg->multi_conn = NBD_EXPORT_MULTI_CONN_AUTO;
> > +    }
> > +    if (nbd_server_max_connections() == 1) {
> > +        arg->multi_conn = NBD_EXPORT_MULTI_CONN_OFF;
> > +    }
> > +    if (arg->multi_conn == NBD_EXPORT_MULTI_CONN_AUTO) {
> > +        multi_conn = readonly;
> > +    } else {
> > +        multi_conn = arg->multi_conn == NBD_EXPORT_MULTI_CONN_ON;
> > +    }
> >
>
> This part is a little bit confusing - we do:
> - initialize arg->multi_conn if it has no value
> - set the temporary multi_conn based on the now-initialized arg->multi_conn
>
> I think it would be nicer to separate argument parsing, so there is no need
> to initialize it here or have arg->has_multi_conn, but I did not check how
> other arguments are handled.
arg->has_multi_conn is fallout from the fact that our QAPI must remain
back-compat. If it is false, the user did not pass "multi-conn":...,
so we want the default value of "auto". Once we've established the
default, then we force multi-conn off if we know we are limited to one
client (although as you pointed out, nbd_server_max_connections()
needs to be tested with qemu-nbd). Then, it's easier to resolve the
tri-state enum variable into the bool of what we actually advertise to
the NBD client.
> > +++ b/tests/qemu-iotests/tests/nbd-multiconn
> > @@ -0,0 +1,188 @@
> > +#!/usr/bin/env bash
> > +# group: rw auto quick
> > +#
> > +# Test that qemu-nbd MULTI_CONN works
> > +#
> > +echo
> > +echo "=== Initial image setup ==="
> > +echo
> > +
> > +_make_test_img 4M
> > +$QEMU_IO -c 'w -P 1 0 2M' -c 'w -P 2 2M 2M' "$TEST_IMG" | _filter_qemu_io
> > +_launch_qemu 2> >(_filter_nbd)
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"qmp_capabilities"}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"blockdev-add",
> > + "arguments":{"driver":"qcow2", "node-name":"n",
> > + "file":{"driver":"file", "filename":"'"$TEST_IMG"'"}}}' "return"
I'm not the best at writing python iotests; I welcome a language
translation of this aspect. But the shell code for
_launch_qemu/_send_qemu_cmd was already pretty nice for setting up a
long-running background qemu process where I can pass in QMP commands
at will, then interact from other processes.
> > +export nbd_unix_socket
> > +
> > +echo
> > +echo "=== Default nbd server settings ==="
> > +echo
> > +# Default allows for unlimited connections, readonly images advertise
> > +# multi-conn, and writable images do not
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
> > + "arguments":{"addr":{"type":"unix",
> > + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
> > + "arguments":{"type":"nbd", "id":"r", "node-name":"n",
> > + "name":"r"}}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
> > + "arguments":{"type":"nbd", "id":"w", "node-name":"n",
> > + "name":"w", "writable":true}}' "return"
> > +nbdsh -u "nbd+unix:///r?socket=$nbd_unix_socket" -c '
> > +assert h.can_multi_conn()
> > +h.shutdown()
> > +print("nbdsh passed")'
> > +nbdsh -u "nbd+unix:///w?socket=$nbd_unix_socket" -c '
> > +assert not h.can_multi_conn()
> > +h.shutdown()
> > +print("nbdsh passed")'
> >
>
> Mixing of shell and python is very confusing. Wouldn't it be much cleaner
> to write the test in python?
Here, nbdsh -c 'python snippet' is used as a shell command line
parameter. Writing python code to call out to a system() command
where one of the arguments to that command is a python script snippet
is going to be just as annoying as writing it in bash.
> > +echo
> > +echo "=== Demonstrate parallel writers ==="
> > +echo
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"nbd-server-start",
> > + "arguments":{"addr":{"type":"unix",
> > + "data":{"path":"'"$nbd_unix_socket"'"}}}}' "return"
> > +_send_qemu_cmd $QEMU_HANDLE '{"execute":"block-export-add",
> > + "arguments":{"type":"nbd", "id":"w", "node-name":"n",
> > + "name":"", "multi-conn":"on", "writable":true}}' "return"
> > +
> > +nbdsh -c '
> > +import os
> > +sock = os.getenv("nbd_unix_socket")
> > +h = []
> > +
> > +for i in range(3):
> > +    h.append(nbd.NBD())
> > +    h[i].connect_unix(sock)
> > +    assert h[i].can_multi_conn()
> >
>
> This is somewhat C in python, maybe:
>
> handles = [nbd.NBD() for _ in range(3)]
>
> for h in handles:
>     h.connect_unix(sock)
>     assert h.can_multi_conn()
>
> Since assert will abort the test, and we don't handle
> exceptions, failure will not shut down the connections
> but it should not matter for the purpose of a test.
As I said, I'm open to python suggestions :) I like your approach.
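For what it's worth, Nir's loop holds together as written; here it is fleshed out with a trivial stand-in class so the shape can be exercised outside the iotest harness. FakeNBD and the socket path are placeholders, not libnbd API; the real test would use nbd.NBD() from libnbd's Python binding:

```python
# Stand-in for libnbd's nbd.NBD(): only the two methods the loop
# touches, both hypothetical no-ops here for illustration.
class FakeNBD:
    def connect_unix(self, sock):
        self.sock = sock

    def can_multi_conn(self):
        return True

sock = "/tmp/nbd.sock"  # placeholder for $nbd_unix_socket

# Nir's suggested shape: build the handles, then connect and check each.
handles = [FakeNBD() for _ in range(3)]
for h in handles:
    h.connect_unix(sock)
    assert h.can_multi_conn()
```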
>
>
> > +
> > +buf1 = h[0].pread(1024 * 1024, 0)
> > +if buf1 != b"\x01" * 1024 * 1024:
> > +    print("Unexpected initial read")
> >
>
> Not clear when we initialize the buffer to \x01 - is this the qemu-io above?
Yes, when the qcow2 file was initially created.
>
>
> > +buf2 = b"\x03" * 1024 * 1024
> > +h[1].pwrite(buf2, 0)
> > +h[2].flush()
> > +buf1 = h[0].pread(1024 * 1024, 0)
> > +if buf1 == buf2:
> > +    print("Flush appears to be consistent across connections")
> >
>
> buf1 was the initial content, buf2 is the new content, but now we override
> buf1 to check that the write was flushed. It would be better to use
> different names like initial_data, updated_data, current_data.
Can do.
>
> This block is the most important part of the test, showing that we can write
> in one connection, flush in the second, and read the written block in the
> third.
> Maybe add a comment about this? I think it will help someone else trying
> to understand what this part is trying to test.
Can do.
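Putting both of Nir's suggestions together, the renamed snippet with the explanatory comment might read as below. FakeConn is a stand-in whose instances share one in-memory image, so the snippet is runnable here; the real test instead opens three libnbd connections (h[0..2]) to the same export:

```python
# FakeConn stands in for three nbd.NBD() handles on one export: all
# instances share a single backing buffer, mimicking the server side.
class FakeConn:
    shared = bytearray(b"\x01" * 1024 * 1024)  # image content from qemu-io setup

    def pwrite(self, buf, offset):
        self.shared[offset:offset + len(buf)] = buf

    def flush(self):
        pass  # the in-memory stand-in has nothing to flush

    def pread(self, count, offset):
        return bytes(self.shared[offset:offset + count])

h = [FakeConn() for _ in range(3)]

initial_data = h[0].pread(1024 * 1024, 0)
assert initial_data == b"\x01" * 1024 * 1024

# The heart of the test: MULTI_CONN promises that a write on one
# connection, followed by a flush on a *different* connection, is
# visible to a read on yet another connection.
updated_data = b"\x03" * 1024 * 1024
h[1].pwrite(updated_data, 0)
h[2].flush()
current_data = h[0].pread(1024 * 1024, 0)
assert current_data == updated_data
```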
> > +{"execute":"block-export-add",
> > + "arguments":{"type":"nbd", "id":"w", "node-name":"n",
> > + "name":"", "multi-conn":"on", "writable":true}}
> > +{"return": {}}
> > +Flush appears to be consistent across connections
> > +{"execute":"block-export-del",
> > + "arguments":{"id":"w"}}
> > +{"return": {}}
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "BLOCK_EXPORT_DELETED", "data": {"id": "w"}}
> > +{"execute":"nbd-server-stop"}
> > +{"return": {}}
> > +{"execute":"quit"}
> > +{"return": {}}
> > +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP},
> > "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
> >
>
> Nothing about the contents here says anything about the actual test
> except the "Flush appears..." line.
Yeah, it's a lot of QMP debugging (to show what commands were run in
setting up the server), and less verbose in the nbdsh side. Do I need
to add more output during the nbdsh that uses multiple connections?
>
>
> > +*** done
> > --
> > 2.35.1
> >
>
> Looks good, feel free to ignore the style comments and suggestion to rewrite
> the test in python.
The style comments are nice, the rewriting is going to be harder for
me (but I'll accept help). At any rate, getting qemu-nbd to be
feature-compatible may require a v3 anyway.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org