Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN


From: Eric Blake
Subject: Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
Date: Thu, 2 Apr 2020 08:33:20 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0

On 4/2/20 1:41 AM, Vladimir Sementsov-Ogievskiy wrote:
02.04.2020 1:38, Eric Blake wrote:
I was trying to test qemu's reconnect-delay parameter by using nbdkit
as a server that I could easily make disappear and resume.  A bit of
experimenting shows that when nbdkit is abruptly killed (SIGKILL),
qemu detects EOF on the socket and manages to reconnect just fine; but
when nbdkit is gracefully killed (SIGTERM), it merely fails all
further guest requests with NBD_ESHUTDOWN until the client disconnects
first, and qemu was blindly failing the I/O request with ESHUTDOWN
from the server instead of attempting to reconnect.

Most NBD server failures are unlikely to change by merely retrying
the same transaction, so our decision to not start a retry loop in
the common case is correct.  But NBD_ESHUTDOWN is rare enough, and
so clearly indicative of a transient situation, that it is worth
special-casing.


Interesting. I see that prior to this patch we don't handle ESHUTDOWN at all in the NBD client.

What does the spec say?

> On a server shutdown, the server SHOULD wait for inflight requests to be serviced prior to initiating a hard disconnect. A server MAY speed this process up by issuing error replies. The error value issued in respect of these requests and any subsequently received requests SHOULD be NBD_ESHUTDOWN.
>
> If the client receives an NBD_ESHUTDOWN error it MUST initiate a soft disconnect.

Perhaps the spec should be relaxed to state that a client SHOULD initiate soft disconnect (as there are existing clients that do not). If a server knows it wants to initiate hard disconnect soon, it shouldn't be forced to wait for a client to respond to NBD_ESHUTDOWN, since not all clients do. Then again, it is indeed nicer if the client does initiate soft disconnect (as soft is always cleaner than hard).

> The client MAY issue a soft disconnect at any time, but SHOULD wait until there are no inflight requests first.
>
> The client and the server MUST NOT initiate any form of disconnect other than in one of the above circumstances.

Hmm. So we actually MUST initiate a soft disconnect, which means that we must send NBD_CMD_DISC.
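To make "send NBD_CMD_DISC" concrete, here is a minimal sketch of what a soft disconnect puts on the wire. It assumes the simple 28-byte request header from the NBD protocol document (magic, flags, type, handle, offset, length, all big-endian). The helper name nbd_build_disc_request() is hypothetical and is not taken from qemu's sources; the offset and length fields are simply zeroed here. The server sends no reply to NBD_CMD_DISC.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants from the NBD protocol document. */
#define NBD_REQUEST_MAGIC 0x25609513u
#define NBD_CMD_DISC      2u
#define NBD_REQUEST_SIZE  28u

/* Big-endian store helpers (illustrative, not qemu's own). */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (24 - 8 * i));
    }
}

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/*
 * Hypothetical helper: serialize an NBD_CMD_DISC request into buf.
 * Returns the number of bytes the caller should write to the socket.
 */
static size_t nbd_build_disc_request(uint8_t buf[NBD_REQUEST_SIZE],
                                     uint64_t handle)
{
    put_be32(buf, NBD_REQUEST_MAGIC);
    put_be16(buf + 4, 0);              /* command flags: none */
    put_be16(buf + 6, NBD_CMD_DISC);   /* request type */
    put_be64(buf + 8, handle);         /* opaque handle chosen by client */
    put_be64(buf + 16, 0);             /* offset: zeroed in this sketch */
    put_be32(buf + 24, 0);             /* length: zeroed in this sketch */
    return NBD_REQUEST_SIZE;
}
```

After writing these bytes the client would close its end of the connection; that final step is left out of the sketch.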

With this patch as-is, qemu as client initiates hard disconnect in response to NBD_ESHUTDOWN (but only if it plans on trying to reconnect).
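The behavior described above can be sketched as a small decision function. This is only an illustration, not the code in the patch: the enum, the function name nbd_classify_reply(), and the use of a non-zero reconnect_delay as the test for "plans on trying to reconnect" are assumptions made for the example. The value 108 for NBD_ESHUTDOWN is the one given in the NBD protocol document.

```c
#include <stdint.h>

#define NBD_ESHUTDOWN 108u   /* error value from the NBD protocol document */

/* Hypothetical outcomes for a single reply from the server. */
enum nbd_reply_action {
    NBD_ACT_COMPLETE,    /* no error: complete the guest request */
    NBD_ACT_FAIL_IO,     /* pass the server's error back to the guest */
    NBD_ACT_RECONNECT    /* drop the connection and try to reconnect */
};

/*
 * Hypothetical classifier: only NBD_ESHUTDOWN is treated as transient,
 * and only when reconnecting is enabled (assumed here to mean
 * reconnect_delay > 0).  Every other error fails the request without
 * a retry loop.
 */
static enum nbd_reply_action nbd_classify_reply(uint32_t nbd_error,
                                                uint32_t reconnect_delay)
{
    if (nbd_error == 0) {
        return NBD_ACT_COMPLETE;
    }
    if (nbd_error == NBD_ESHUTDOWN && reconnect_delay > 0) {
        return NBD_ACT_RECONNECT;
    }
    return NBD_ACT_FAIL_IO;
}
```

With reconnecting disabled, NBD_ESHUTDOWN falls through to the same path as any other server error, which matches the pre-patch behavior of failing the I/O request.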


Then, what about "SHOULD wait until there are no inflight requests"? We don't do that either. Should we?
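For reference, waiting for inflight requests before a soft disconnect could look roughly like the sketch below. It is not what qemu does today, and every name in it (struct nbd_conn_state, nbd_request_sent(), nbd_reply_received()) is hypothetical. The idea is just to count outstanding requests and to signal "send NBD_CMD_DISC now" once the server has reported NBD_ESHUTDOWN and the count has dropped to zero.

```c
#include <stdbool.h>

/* Hypothetical per-connection bookkeeping. */
struct nbd_conn_state {
    unsigned inflight;     /* requests sent but not yet answered */
    bool shutdown_seen;    /* server replied with NBD_ESHUTDOWN */
    bool disc_sent;        /* NBD_CMD_DISC already issued */
};

/* Record that one more request is on the wire. */
static void nbd_request_sent(struct nbd_conn_state *s)
{
    s->inflight++;
}

/*
 * Record a reply.  Returns true exactly once: when the server has
 * signalled shutdown and no requests remain inflight, meaning the
 * caller may now send NBD_CMD_DISC.
 */
static bool nbd_reply_received(struct nbd_conn_state *s, bool is_eshutdown)
{
    if (s->inflight > 0) {
        s->inflight--;
    }
    if (is_eshutdown) {
        s->shutdown_seen = true;
    }
    if (s->shutdown_seen && s->inflight == 0 && !s->disc_sent) {
        s->disc_sent = true;
        return true;
    }
    return false;
}
```

A real client would also have to stop submitting new requests once shutdown_seen is set, otherwise the inflight count might never reach zero.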

qemu as server doesn't send NBD_ESHUTDOWN. It probably should (the way nbdkit does), but that's orthogonal to qemu as client responding to NBD_ESHUTDOWN.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
