qemu-block
Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN


From: Richard W.M. Jones
Subject: Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
Date: Thu, 2 Apr 2020 15:04:26 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Apr 02, 2020 at 08:41:31AM -0500, Eric Blake wrote:
> On 4/2/20 3:38 AM, Richard W.M. Jones wrote:
> >For the case I care about (long running virt-v2v conversions with an
> >intermittent network) we don't expect that nbdkit will be killed nor
> >gracefully shut down.  Instead what we expect is that nbdkit returns
> >an error such as NBD_EIO and keeps running.
> >
> >Indeed if nbdkit actually dies then reconnecting will not help since
> >there will be no server to reconnect to.

To put this in context for other people reading, virt-v2v uses this
sort of setup:

<pre>
                          +---------- same machine ----------+
                          |                                  |
  +------------+            +----------+        +----------+
  | remote     |            | nbdkit   |        | qemu-img |
  | VMware     |----------->| + VDDK   |------->| convert  |--> output
  | server     |            |          |        |          |
  +------------+            +----------+        +----------+
             VMware proprietary      NBD over Unix skt
             protocol over TCP
</pre>

The problem being addressed is that the whole task can run for many
hours, and a single interruption in the network between virt-v2v and
the remote VMware server can cause the entire process to fail.
nbdkit-retry-filter[0] attempts to address the problem by allowing the
VMware side of the protocol to be restarted without qemu-img seeing
any interruption (nor any error) on the NBD connection.

[0] http://libguestfs.org/nbdkit-retry-filter.1.html

> Hmm.  The idea of reconnect-delay in qemu is that if the connection
> to the server is dropped, we try to reconnect and then retry the I/O
> operation.  Maybe what we want is an nbdkit filter which turns EIO
> errors from the v2v plugin into forced server connection drops, but
> leave nbdkit up and running to allow the next client to connect.

Note that of the three nbdkit plugins we currently use (vddk[1], curl
and ssh) at least two of them have the property that closing and
reopening the plugin handle (which is what nbdkit-retry-filter does)
reconnects to the remote server.  To take nbdkit-ssh-plugin as a
specific example[2], the .open callback calls ssh_connect() and the
.close callback calls ssh_disconnect().  VDDK works the same way.  I'm
a bit unclear on nbdkit-curl-plugin because IIRC underlying HTTPS
connections may be managed in a pool inside Curl.

[1] All in this file, starting here:
https://github.com/libguestfs/virt-v2v/blob/8cf4488d6bcde8dd0b84c199c96ff5763e6a08fa/v2v/nbdkit_sources.ml#L142

[2] 
https://github.com/libguestfs/nbdkit/blob/d085b87dcbe05c9c2d0049f0fc613455490c1032/plugins/ssh/ssh.c#L468
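
As an illustration only (this is not nbdkit's code), the close-and-reopen
idea can be sketched in a few lines of Python.  FlakyPlugin and
pread_with_retry are hypothetical names, the retry count is an arbitrary
assumption, and the delay between attempts that a real retry
implementation would want is left out.  The point is just that each
reopen gives a fresh remote connection, and the caller never sees the
intermediate failures:

```python
import errno


class FlakyPlugin:
    """Hypothetical stand-in for a plugin whose handle wraps a remote connection."""

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success
        self.opens = 0
        self.closes = 0

    def open(self):
        # Analogous to .open calling ssh_connect(): a fresh remote connection.
        self.opens += 1
        return {"connected": True}

    def close(self, handle):
        # Analogous to .close calling ssh_disconnect().
        self.closes += 1
        handle["connected"] = False

    def pread(self, handle, count, offset):
        # Fail with EIO until the simulated network problem has cleared.
        if self.failures_left > 0:
            self.failures_left -= 1
            raise OSError(errno.EIO, "simulated network interruption")
        return bytes(count)


def pread_with_retry(plugin, handle, count, offset, retries=5):
    """Retry a read by closing and reopening the handle after each failure.

    Returns (data, handle) because the handle may have been replaced.
    """
    for attempt in range(retries + 1):
        try:
            return plugin.pread(handle, count, offset), handle
        except OSError:
            if attempt == retries:
                raise  # out of retries: only now does the caller see the error
            plugin.close(handle)
            handle = plugin.open()  # reconnects to the remote server


plugin = FlakyPlugin(failures_before_success=2)
handle = plugin.open()
data, handle = pread_with_retry(plugin, handle, 512, 0)
print(len(data), plugin.opens, plugin.closes)  # prints "512 3 2"
```

Two simulated failures cost two close/open cycles, and the read then
succeeds on the third handle.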

> That's different from the existing --filter=retry behavior (where we
> try to keep the client connection alive and reopen the plugin), but
> has a similar effect (because we force the connection to the client
> to drop, the client would have to reconnect to get more data, and
> reconnecting triggers a retry on connecting to the plugin).

I get that this is different from the retry filter, but isn't this
just working around behaviour in qemu's NBD client?  Couldn't qemu's
NBD client be changed to reconnect on EIO?  Or retry?  (Optionally of
course, and this would be orthogonal to the current patch.)
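
To make the question concrete, here is a rough Python sketch of what an
optional client-side policy might look like.  It does not describe how
qemu's NBD client behaves.  Server, client_read, retry_on_eio and
max_attempts are all made-up names for illustration:

```python
import errno


class Server:
    """Hypothetical NBD-like server that fails the first few reads with EIO."""

    def __init__(self, eio_count):
        self.eio_count = eio_count
        self.connections = 0

    def connect(self):
        # Count connections so the reconnect behaviour is observable.
        self.connections += 1
        return self

    def read(self, offset, length):
        # Return (error, data); error is 0 on success.
        if self.eio_count > 0:
            self.eio_count -= 1
            return errno.EIO, b""
        return 0, bytes(length)


def client_read(server, offset, length, retry_on_eio=False, max_attempts=3):
    """Read from the server, optionally reconnecting and retrying on EIO."""
    conn = server.connect()
    err = 0
    for attempt in range(max_attempts):
        err, data = conn.read(offset, length)
        if err == 0:
            return data
        if err != errno.EIO or not retry_on_eio:
            break  # default policy: pass the error straight up
        if attempt + 1 < max_attempts:
            conn = server.connect()  # drop the old connection and reconnect
    raise OSError(err, "read failed")


server = Server(eio_count=1)
data = client_read(server, 0, 512, retry_on_eio=True)
print(len(data), server.connections)  # prints "512 2"
```

With retry_on_eio left at its default the first EIO is returned to the
caller unchanged, so the policy stays opt-in.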

> And it's different from --filter=exitlast (that says to quit nbdkit
> altogether, rather than just the current connection with a client).

We'd certainly need a new nbdkit_* API, rather like the way we added
nbdkit_shutdown to make nbdkit-exitlast-filter possible.  However I'm
still unclear whether the new filter's behaviour would be necessary.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org



