qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 6/7] vhost-user-blk: Add support to reconnect


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v7 6/7] vhost-user-blk: Add support to reconnect backend
Date: Thu, 14 Mar 2019 08:11:32 -0400

On Thu, Mar 14, 2019 at 11:43:26AM +0000, Daniel P. Berrangé wrote:
> On Thu, Mar 14, 2019 at 07:34:03AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Mar 14, 2019 at 11:24:22AM +0000, Daniel P. Berrangé wrote:
> > > On Tue, Mar 12, 2019 at 12:49:35PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Feb 28, 2019 at 04:53:54PM +0800, address@hidden wrote:
> > > > > From: Xie Yongji <address@hidden>
> > > > > 
> > > > > Since we now support the message VHOST_USER_GET_INFLIGHT_FD
> > > > > and VHOST_USER_SET_INFLIGHT_FD. The backend is able to restart
> > > > > safely because it can track inflight I/O in shared memory.
> > > > > This patch allows qemu to reconnect the backend after
> > > > > connection closed.
> > > > > 
> > > > > Signed-off-by: Xie Yongji <address@hidden>
> > > > > Signed-off-by: Ni Xun <address@hidden>
> > > > > Signed-off-by: Zhang Yu <address@hidden>
> > > > > ---
> > > > >  hw/block/vhost-user-blk.c          | 205 
> > > > > +++++++++++++++++++++++------
> > > > >  include/hw/virtio/vhost-user-blk.h |   4 +
> > > > >  2 files changed, 167 insertions(+), 42 deletions(-)
> > > 
> > > 
> > > > >  static void vhost_user_blk_device_realize(DeviceState *dev, Error 
> > > > > **errp)
> > > > >  {
> > > > >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > > >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > > >      VhostUserState *user;
> > > > > -    struct vhost_virtqueue *vqs = NULL;
> > > > >      int i, ret;
> > > > > +    Error *err = NULL;
> > > > >  
> > > > >      if (!s->chardev.chr) {
> > > > >          error_setg(errp, "vhost-user-blk: chardev is mandatory");
> > > > > @@ -312,27 +442,28 @@ static void 
> > > > > vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> > > > >      }
> > > > >  
> > > > >      s->inflight = g_new0(struct vhost_inflight, 1);
> > > > > -
> > > > > -    s->dev.nvqs = s->num_queues;
> > > > > -    s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
> > > > > -    s->dev.vq_index = 0;
> > > > > -    s->dev.backend_features = 0;
> > > > > -    vqs = s->dev.vqs;
> > > > > -
> > > > > -    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> > > > > -
> > > > > -    ret = vhost_dev_init(&s->dev, s->vhost_user, 
> > > > > VHOST_BACKEND_TYPE_USER, 0);
> > > > > -    if (ret < 0) {
> > > > > -        error_setg(errp, "vhost-user-blk: vhost initialization 
> > > > > failed: %s",
> > > > > -                   strerror(-ret));
> > > > > -        goto virtio_err;
> > > > > -    }
> > > > > +    s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
> > > > > +    s->watch = 0;
> > > > > +    s->should_start = false;
> > > > > +    s->connected = false;
> > > > > +
> > > > > +    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, 
> > > > > vhost_user_blk_event,
> > > > > +                             NULL, (void *)dev, NULL, true);
> > > > > +
> > > > > +reconnect:
> > > > > +    do {
> > > > > +        if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> > > > > +            error_report_err(err);
> > > > > +            err = NULL;
> > > > > +            sleep(1);
> > > > 
> > > > Seems arbitrary. Is this basically waiting until backend will reconnect?
> > > > Why not block until event on the fd triggers?
> > > > 
> > > > Also, it looks like this will just block forever with no monitor input
> > > > and no way for user to figure out what is going on short of
> > > > crashing QEMU.
> > > 
> > > FWIW, the current vhost-user-net device does exactly the same thing
> > > with calling qemu_chr_fe_wait_connected during its realize() function.
> > 
> > Hmm yes. It doesn't sleep for an arbitrary 1 sec so less of an eyesore :)
> 
> The sleep(1) in this patch simply needs to be removed. I think that
> probably dates from when it was written against the earlier broken
> version of qemu_chr_fe_wait_connected(). That would not correctly
> deal with the "reconnect" flag, and so needing this loop with a sleep
> in it.
> 
> In fact the while loop can be removed as well in this code. It just
> needs to call qemu_chr_fe_wait_connected() once. It is guaranteed
> to have a connected peer once that returns 0.
> 
> qemu_chr_fe_wait_connected() only returns -1 if the operating in
> client mode, and it failed to connect and reconnect is *not*
> requested. In such case the caller should honour the failure and
> quit, not loop to retry.
> 
> 
> The reason vhost-user-net does a loop is because once it has a
> connection it tries todo a protocol handshake, and if that
> handshake fails it closes the chardev and tries to connect
> again. That's not the case in this blk code os the loop is
> not needed.
> 
> > > Can't say I'm thrilled about the existing usage either, but I don't
> > > see a particularly nice alternative if it isn't possible to realize
> > > the device without having a backend available.
> > > 
> > > I don't think this an especially serious problem during cold startup
> > > though since doesn't expect the monitor to become responsive until
> > > the device model is fully realized & all backends connected.
> > > 
> > > The real concern is if you need to hotplug this kind of device
> > > at runtime. In that case blocking the main loop / monitor is a
> > > serious problem that will actively harm both the guest OS and
> > > mgmt applications.
> > > 
> > > The vhost-user-net device will already suffer from that.
> > 
> > Right.
> > 
> > > To solve the problem with hotplug, the mgmt app would need to
> > > plug the chardev backend, then wait for it to become connected,
> > > and only then plug the device frontend. In that case we would not
> > > want this loop. We'd expect it to use the already connected
> > > chardev, and fail the realize() function if the chardev has
> > > lost its connection.
> > 
> > I think as step one we should maybe just busy-wait with no sleep(1).
> > But respecting reconnect would be nice too.
> 
> As above I don't see a need to loop at all.
> 
> > And I think if we can't respect nowait we should
> > detect it and fail, in both net and block ...
> 
> If the chardev is in server mode with "wait" set, or in client mode
> with reconnect not set, then there's actually no need to call
> qemu_chr_fe_wait_connected at all. The qemu_chr_fe_wait_connected is
> only needed if the chardev is a server with "nowait" set or a client
> with "reconnect" set, or if the caller explicitly closes the existing
> connection & wants to establish a new one.
> 
> Regards,
> Daniel

Thanks for the comments Daniel!

> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



reply via email to

[Prev in Thread] Current Thread [Next in Thread]