
Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
Date: Thu, 2 Apr 2020 09:41:21 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

02.04.2020 1:38, Eric Blake wrote:
I was trying to test qemu's reconnect-delay parameter by using nbdkit
as a server that I could easily make disappear and resume.  A bit of
experimenting shows that when nbdkit is abruptly killed (SIGKILL),
qemu detects EOF on the socket and manages to reconnect just fine; but
when nbdkit is gracefully killed (SIGTERM), it merely fails all
further guest requests with NBD_ESHUTDOWN until the client disconnects
first, and qemu was blindly failing the I/O request with ESHUTDOWN
from the server instead of attempting to reconnect.

Since most NBD server failures are unlikely to change by merely
retrying the same transaction, our decision to not start a retry loop
in the common case is correct.  But NBD_ESHUTDOWN is rare enough, and
really is indicative of a transient situation, that it is worth
special-casing.

Here's the test setup I used: in one terminal, kick off a sequence of
nbdkit commands that has a temporary window where the server is
offline; in another terminal (and within the first 5 seconds) kick off
a qemu-img convert with reconnect enabled.  If the qemu-img process
completes successfully, the reconnect worked.

$ #term1
$ MYSIG=    # or MYSIG='-s KILL'
$ timeout $MYSIG 5s ~/nbdkit/nbdkit -fv --filter=delay --filter=noextents \
   null 200M delay-read=1s; sleep 5; ~/nbdkit/nbdkit -fv --filter=exitlast \
   --filter=delay --filter=noextents null 200M delay-read=1s

$ #term2
$ MYCONN=server.type=inet,server.host=localhost,server.port=10809
$ qemu-img convert -p -O raw --image-opts \
   driver=nbd,$MYCONN,reconnect-delay=60 out.img

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1819240#c8

Signed-off-by: Eric Blake <address@hidden>
---

This is not a regression, per se, as reconnect-delay has been unchanged
since 4.2; but I'd like to consider this as an interoperability bugfix
worth including in the next rc.

  block/nbd.c | 9 +++++++++
  1 file changed, 9 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 2906484390f9..576b95fb8753 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -863,6 +863,15 @@ static coroutine_fn int nbd_co_receive_one_chunk(
      if (ret < 0) {
          memset(reply, 0, sizeof(*reply));
          nbd_channel_error(s, ret);
+    } else if (s->reconnect_delay && *request_ret == -ESHUTDOWN) {
+        /*
+         * Special case: if we support reconnect and server is warning
+         * us that it wants to shut down, then treat this like an
+         * abrupt connection loss.
+         */
+        memset(reply, 0, sizeof(*reply));
+        *request_ret = 0;
+        nbd_channel_error(s, -EIO);
      } else {
          /* For assert at loop start in nbd_connection_entry */
          *reply = s->reply;


Interesting. I see that, prior to this patch, we don't handle ESHUTDOWN at
all in the NBD client.

What does the spec say?

> On a server shutdown, the server SHOULD wait for inflight requests to be
> serviced prior to initiating a hard disconnect. A server MAY speed this
> process up by issuing error replies. The error value issued in respect of
> these requests and any subsequently received requests SHOULD be
> NBD_ESHUTDOWN.
>
> If the client receives an NBD_ESHUTDOWN error it MUST initiate a soft
> disconnect.
>
> The client MAY issue a soft disconnect at any time, but SHOULD wait until
> there are no inflight requests first.
>
> The client and the server MUST NOT initiate any form of disconnect other
> than in one of the above circumstances.

Hmm. So we actually MUST initiate a soft disconnect, which means that we
must send NBD_CMD_DISC.
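
For reference, here is a rough sketch of what an NBD_CMD_DISC request looks
like on the wire, using the simple request layout from the NBD protocol
document (magic, command flags, type, handle, offset, length, all
big-endian). The helper names put_be() and nbd_pack_disc_request() are made
up for illustration; they are not functions in block/nbd.c, and this is not
how the QEMU client builds its requests.

```c
#include <stddef.h>
#include <stdint.h>

/* Values taken from the NBD protocol document. */
#define NBD_REQUEST_MAGIC 0x25609513u
#define NBD_CMD_DISC      2u
#define NBD_REQUEST_SIZE  28u

/* Store 'v' as a big-endian integer of 'len' bytes at 'p'. */
static void put_be(uint8_t *p, uint64_t v, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        p[i] = (uint8_t)(v >> (8 * (len - 1 - i)));
    }
}

/*
 * Hypothetical helper: fill 'buf' with a simple NBD_CMD_DISC request.
 * Offset and length are zero for a disconnect, and the server sends no
 * reply to this command.
 */
static void nbd_pack_disc_request(uint8_t buf[NBD_REQUEST_SIZE],
                                  uint64_t handle)
{
    put_be(buf + 0, NBD_REQUEST_MAGIC, 4);  /* request magic */
    put_be(buf + 4, 0, 2);                  /* command flags */
    put_be(buf + 6, NBD_CMD_DISC, 2);       /* command type */
    put_be(buf + 8, handle, 8);             /* handle */
    put_be(buf + 16, 0, 8);                 /* offset */
    put_be(buf + 24, 0, 4);                 /* length */
}
```

After writing these 28 bytes the client would close its side of the
connection; there is nothing to read back.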

Then what about "SHOULD wait until there are no inflight requests"? We don't
do that either. Should we?
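
To make the question concrete, one possible shape for "stop submitting, let
in-flight requests drain, then disconnect" is sketched below. This is only
an illustration under simplifying assumptions: SoftDiscState and both
functions are hypothetical names, not anything in block/nbd.c, and the
sketch assumes every request is completed by exactly one reply (so it
ignores multi-chunk structured replies).

```c
#include <stdbool.h>

#define NBD_ESHUTDOWN 108  /* error value from the NBD protocol document */

/* Hypothetical client-side bookkeeping, not the state kept by QEMU. */
typedef struct {
    unsigned in_flight;   /* requests sent but not yet answered */
    bool shutdown_seen;   /* server has replied with NBD_ESHUTDOWN */
    bool disc_sent;       /* NBD_CMD_DISC has already been issued */
} SoftDiscState;

/*
 * Call before sending a request.  Returns false once the server has
 * announced shutdown, so that no new requests are issued.
 */
static bool soft_disc_can_submit(SoftDiscState *st)
{
    if (st->shutdown_seen) {
        return false;
    }
    st->in_flight++;
    return true;
}

/*
 * Record one reply carrying the error value 'nbd_error'.  Returns true
 * exactly once: when shutdown has been seen and the last in-flight
 * request has completed, i.e. when NBD_CMD_DISC should be sent.
 */
static bool soft_disc_on_reply(SoftDiscState *st, unsigned nbd_error)
{
    if (st->in_flight > 0) {
        st->in_flight--;
    }
    if (nbd_error == NBD_ESHUTDOWN) {
        st->shutdown_seen = true;
    }
    if (st->shutdown_seen && st->in_flight == 0 && !st->disc_sent) {
        st->disc_sent = true;
        return true;
    }
    return false;
}
```

With reconnect enabled, the requests refused by soft_disc_can_submit() would
presumably be queued for the new connection rather than failed, but that
part is outside this sketch.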

--
Best regards,
Vladimir


