Re: [Qemu-devel] [PATCH v10 00/18] Vhost and vhost-net support for users

From: Nikolay Nikolaev
Subject: Re: [Qemu-devel] [PATCH v10 00/18] Vhost and vhost-net support for userspace based backends
Date: Wed, 28 May 2014 12:05:27 +0300


On Wed, May 28, 2014 at 11:43 AM, Anshul Makkar <address@hidden> wrote:

We are also trying to develop a solution where we implement the switch in user mode (thinking of using VDE) and then RDMA packets directly to the other end without involving the kernel layers. Does the above solution/patch series implement that using the Snabbswitch ethernet switch?
This solution targets sharing the virtio virtqueues with a user-space process. Snabbswitch is a user of vhost-user, but the protocol should be adoptable by other projects too. Patch 16 adds a protocol description document that can be used as a reference.

Confused here, please can you share your thoughts.
All the code and discussions are publicly available.

Anshul Makkar

On Tue, May 27, 2014 at 2:03 PM, Nikolay Nikolaev <address@hidden> wrote:
In this patch series we would like to introduce our approach for putting a
virtio-net backend in an external userspace process. Our eventual target is to
run the network backend in the Snabbswitch ethernet switch, while receiving
traffic from a guest inside QEMU/KVM which runs an unmodified virtio-net
implementation.

For this, we are working on extending vhost to allow equivalent functionality
for userspace. Vhost already passes control of the data plane of virtio-net to
the host kernel; we want to realize a similar model, but for userspace.

In this patch series the concept of a vhost-backend is introduced.

We define two vhost backend types - vhost-kernel and vhost-user. The former is
the interface to the current kernel module implementation. Its control plane is
ioctl based. The data plane is realized by the kernel directly accessing the
QEMU allocated, guest memory.

In the new vhost-user backend, the control plane is based on communication
between QEMU and another userspace process using a Unix domain socket. This
allows a virtio backend for a guest running in QEMU to be implemented inside
the other userspace process. For this communication we use a chardev with a Unix
domain socket backend. Vhost-user is client/server agnostic regarding the
chardev, however it does not support the 'nowait' and 'telnet' options.
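
To illustrate the kind of control channel described above, here is a minimal sketch of passing a file descriptor over a Unix domain socket as SCM_RIGHTS ancillary data. It is not the code from this series: a socketpair stands in for the chardev socket, a pipe stands in for a descriptor such as a kick or call fd, and the helper names (send_with_fds, recv_with_fds, demo_fd_passing) are made up for the example.

```python
import array
import os
import socket


def send_with_fds(sock, payload, fds):
    """Send payload bytes with file descriptors attached as SCM_RIGHTS data."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                  array.array("i", fds).tobytes())]
    return sock.sendmsg([payload], ancillary)


def recv_with_fds(sock, bufsize, maxfds):
    """Receive payload bytes plus any descriptors carried alongside them."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, list(fds)


def demo_fd_passing():
    # Hypothetical stand-ins: a socketpair for the chardev socket,
    # a pipe for a notification descriptor.
    qemu_side, backend_side = socket.socketpair(socket.AF_UNIX,
                                                socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_with_fds(qemu_side, b"ctl", [write_end])
        data, fds = recv_with_fds(backend_side, 16, 4)
        # The received descriptor has a new number but refers to the same pipe.
        os.write(fds[0], b"kick")
        got = os.read(read_end, 4)
        for fd in fds:
            os.close(fd)
        return data, len(fds), got
    finally:
        qemu_side.close()
        backend_side.close()
        os.close(read_end)
        os.close(write_end)
```

Calling demo_fd_passing() shows that the receiving side can write through the descriptor it was handed, which is the property the control plane relies on.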

We rely on the memdev with a memory-file backend. The backend's share=on option
should be used. HugeTLBFS is required for this option to work.

The data path is realized by directly accessing the vrings and the buffer data
off the guest's memory.
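
As a rough illustration of why a shared file mapping matters here, the sketch below maps the same file twice with MAP_SHARED and shows that a write through one mapping is visible through the other. It is only an analogy under stated assumptions: a temporary file stands in for the hugetlbfs-backed memory file, both mappings live in one process rather than in QEMU and a separate backend, and the names (demo_shared_mapping, qemu_view, backend_view) are invented for the example.

```python
import mmap
import os
import tempfile


def demo_shared_mapping(size=4096):
    # A temporary file stands in for the file that backs guest RAM.
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)
        # First mapping: plays the role of QEMU's view of guest memory.
        qemu_view = mmap.mmap(fd, size, flags=mmap.MAP_SHARED)
        # Second mapping of the same file: plays the role of the backend.
        backend_view = mmap.mmap(fd, size, flags=mmap.MAP_SHARED)
        try:
            qemu_view[0:5] = b"vring"
            # With MAP_SHARED the write is visible through the other mapping.
            return bytes(backend_view[0:5])
        finally:
            backend_view.close()
            qemu_view.close()
    finally:
        os.close(fd)
        os.unlink(path)
```

With a private (copy-on-write) mapping, writes made through one mapping would not be guaranteed to reach the other, so the two sides could not exchange vring contents this way.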

Currently the only user of vhost-user is vhost-net. We add a new netdev backend
that is intended to initialize vhost-net with the vhost-user backend.

Example usage:

qemu -m 512 \
     -object memory-file,id=mem,size=512M,mem-path=/hugetlbfs,share=on \
     -numa node,memdev=mem \
     -chardev socket,id=chr0,path=/path/to/socket \
     -netdev type=vhost-user,id=net0,chardev=chr0 \
     -device virtio-net-pci,netdev=net0

On non-MSIX guests the vhost feature can be forced using a special option:

     -netdev type=vhost-user,id=net0,chardev=chr0,vhostforce

In order to use ioeventfds, kvm should be enabled.

This work is based on the NUMA patch series v3.2.

This code can be pulled from address@hidden:virtualopensystems/qemu.git vhost-user-v10
A simple functional test is available in tests/vhost-user-test.c

A reference vhost-user slave for testing is also available from address@hidden:virtualopensystems/vapp.git

Changes from v9:
 - Rebased on the NUMA memdev patchseries and reworked to use memdev
 - Removed -mem-path refactoring
 - Removed all reconnection code
 - Fixed 100% CPU usage in the G_IO_HUP handler after disconnect
 - Reworked vhost feature bits handling so vhost-user has better control in the negotiation

Changes from v8:
 - Removed prealloc property from the -mem-path refactoring
 - Added and use new function - kvm_eventfds_enabled
 - Add virtio_queue_get_avail_idx used in vhost_virtqueue_stop to
   get a sane value in case of VHOST_GET_VRING_BASE failure
 - vhost user uses kvm_eventfds_enabled to check whether the ioeventfd
   capability of KVM is available
 - Added flag VHOST_USER_VRING_NOFD_MASK to be set when KICK, CALL or ERR file
   descriptor is invalid or ioeventfd is not available

Changes from v7:
 - Slave reconnection when using chardev in server mode
 - qtest vhost-user-test added
 - New qemu_chr_fe_get_msgfds for reading multiple fds from the chardev
 - Mandatory features in vhost_dev, used on reconnect to verify for conflicts
 - Add vhostforce parameter to -netdev vhost-user (for non-MSIX guests)
 - Extend libqemustub.a to support qemu-char.c

Changes from v6:
 - Remove the 'unlink' property of '-mem-path'
 - Extend qemu-char: blocking read, send fds, monitor for connection close
 - Vhost-user uses chardev as a backend
 - Poll and reconnect removed (no VHOST_USER_ECHO).
 - Disconnect is detected by the chardev (G_IO_HUP event)
 - vhost-backend.c split to vhost-user.c
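
For readers unfamiliar with hang-up detection, the following sketch shows the underlying idea using poll(): once the peer closes a Unix domain socket, the socket becomes ready with a hang-up and/or end-of-file condition. This is not the chardev code from the series; GLib's G_IO_HUP corresponds to POLLHUP, and the function names here are hypothetical. Unregistering the descriptor once the disconnect has been seen avoids being woken up again and again for the same condition.

```python
import select
import socket


def wait_for_disconnect(sock, timeout_ms=1000):
    """Return True if the peer has closed the connection within the timeout."""
    poller = select.poll()
    poller.register(sock, select.POLLIN | select.POLLHUP)
    try:
        for _fd, events in poller.poll(timeout_ms):
            if events & select.POLLHUP:
                return True
            # A readable socket that yields no bytes has reached end-of-file.
            if events & select.POLLIN and sock.recv(1, socket.MSG_PEEK) == b"":
                return True
        return False
    finally:
        # Stop watching the descriptor so the condition is not reported again.
        poller.unregister(sock)


def demo_disconnect():
    ours, peer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        still_connected = not wait_for_disconnect(ours, timeout_ms=50)
        peer.close()
        return still_connected, wait_for_disconnect(ours)
    finally:
        ours.close()
```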

Changes from v5:
 - Split -mem-path unlink option to a separate patch
 - Fds are passed only in the ancillary data
 - Stricter message size checks on receive/send
 - Netdev vhost-user now includes path and poll_time options
 - The connection probing interval is configurable

Changes from v4:
 - Use error_report for errors
 - VhostUserMsg has new field `size` indicating the following payload length.
   Field `flags` now has version and reply bits. The structure is packed.
 - Send data is of variable length (`size` field in message)
 - Receive in 2 steps, header and payload
 - Add new message type VHOST_USER_ECHO, to check connection status
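
The framing described in the v4 changes (a packed header carrying `flags` and `size`, received in two steps) can be sketched roughly as follows. The exact layout is an assumption made for illustration: three little-endian 32-bit fields (request, flags, size), a two-bit version mask and a single reply bit. The request number and payload are invented, and the protocol document in the series remains the reference for the real values.

```python
import socket
import struct

# Assumed layout for illustration: request, flags, size as packed 32-bit fields.
HEADER = struct.Struct("<III")
VERSION_MASK = 0x3    # assumption: low bits of flags carry the version
REPLY_BIT = 0x1 << 2  # assumption: one bit of flags marks a reply


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf += chunk
    return bytes(buf)


def send_msg(sock, request, payload=b"", version=1, reply=False):
    flags = (version & VERSION_MASK) | (REPLY_BIT if reply else 0)
    # The size field announces the length of the payload that follows.
    sock.sendall(HEADER.pack(request, flags, len(payload)) + payload)


def recv_msg(sock, max_payload=4096):
    # Step 1: fixed-size header.
    request, flags, size = HEADER.unpack(recv_exact(sock, HEADER.size))
    # Strict size check before trusting the announced length.
    if size > max_payload:
        raise ValueError("payload size exceeds limit")
    # Step 2: variable-length payload.
    payload = recv_exact(sock, size)
    return request, flags & VERSION_MASK, bool(flags & REPLY_BIT), payload


def demo_framing():
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Hypothetical request number 1 with an 8-byte payload.
        send_msg(a, request=1, payload=struct.pack("<Q", 0x1234))
        return recv_msg(b)
    finally:
        a.close()
        b.close()
```

Reading the header first lets the receiver validate `size` before allocating or reading the payload, which is what makes the stricter size checks possible.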

Changes from v3:
 - Convert -mem-path to QemuOpts with prealloc, share and unlink properties
 - Set 1 sec timeout when read/write to the unix domain socket
 - Fix file descriptor leak

Changes from v2:
 - Reconnect when the backend disappears

Changes from v1:
 - Implementation of vhost-user netdev backend
 - Code improvements


Nikolay Nikolaev (18):
      Add kvm_eventfds_enabled function
      Add chardev API qemu_chr_fe_read_all
      Add chardev API qemu_chr_fe_set_msgfds
      Add chardev API qemu_chr_fe_get_msgfds
      Add G_IO_HUP handler for socket chardev
      vhost: add vhost_get_features and vhost_ack_features
      vhost_net should call the poll callback only when it is set
      Refactor virtio-net to use generic get_vhost_net
      vhost_net_init will use VhostNetOptions to get all its arguments
      Add vhost_ops to vhost_dev struct and replace all relevant ioctls
      Add vhost-backend and VhostBackendType
      Add vhost-user as a vhost backend.
      vhost-net: vhost-user feature bits support
      Add new vhost-user netdev backend
      Add the vhost-user netdev backend to the command line
      Add vhost-user protocol documentation
      libqemustub: add stubs to be able to use qemu-char.c
      Add qtest for vhost-user

 docs/specs/vhost-user.txt         |  261 ++++++++++++++++++++++++++++
 hmp-commands.hx                   |    4
 hw/net/vhost_net.c                |  228 +++++++++++++++++--------
 hw/net/virtio-net.c               |   29 +--
 hw/scsi/vhost-scsi.c              |   45 +++--
 hw/virtio/Makefile.objs           |    2
 hw/virtio/vhost-backend.c         |   71 ++++++++
 hw/virtio/vhost-user.c            |  342 +++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 |   82 ++++++---
 include/hw/virtio/vhost-backend.h |   38 ++++
 include/hw/virtio/vhost.h         |   13 +
 include/net/vhost-user.h          |   17 ++
 include/net/vhost_net.h           |   11 +
 include/sysemu/char.h             |   44 +++++
 include/sysemu/kvm.h              |   11 +
 kvm-all.c                         |    4
 kvm-stub.c                        |    1
 net/Makefile.objs                 |    2
 net/clients.h                     |    3
 net/hub.c                         |    1
 net/net.c                         |   25 ++-
 net/tap.c                         |   18 ++
 net/vhost-user.c                  |  265 +++++++++++++++++++++++++++++
 qapi-schema.json                  |   19 ++
 qemu-char.c                       |  277 +++++++++++++++++++++++++++---
 qemu-options.hx                   |   18 ++
 stubs/Makefile.objs               |    8 +
 stubs/bdrv-commit-all.c           |    7 +
 stubs/chr-msmouse.c               |    7 +
 stubs/get-next-serial.c           |    3
 stubs/is-daemonized.c             |    7 +
 stubs/machine-init-done.c         |    6 +
 stubs/monitor-init.c              |    6 +
 stubs/notify-event.c              |    6 +
 stubs/vc-init.c                   |    7 +
 tests/Makefile                    |    4
 tests/vhost-user-test.c           |  312 ++++++++++++++++++++++++++++++++++
 37 files changed, 2011 insertions(+), 193 deletions(-)
 create mode 100644 docs/specs/vhost-user.txt
 create mode 100644 hw/virtio/vhost-backend.c
 create mode 100644 hw/virtio/vhost-user.c
 create mode 100644 include/hw/virtio/vhost-backend.h
 create mode 100644 include/net/vhost-user.h
 create mode 100644 net/vhost-user.c
 create mode 100644 stubs/bdrv-commit-all.c
 create mode 100644 stubs/chr-msmouse.c
 create mode 100644 stubs/get-next-serial.c
 create mode 100644 stubs/is-daemonized.c
 create mode 100644 stubs/machine-init-done.c
 create mode 100644 stubs/monitor-init.c
 create mode 100644 stubs/notify-event.c
 create mode 100644 stubs/vc-init.c
 create mode 100644 tests/vhost-user-test.c


Nikolay Nikolaev 

