From: Yuanhan Liu
Subject: [Qemu-devel] [PATCH 6/7] vhost-user: add multiple queue support
Date: Tue, 8 Sep 2015 15:38:46 +0800

From: Ouyang Changchun <address@hidden>

This patch is initially based on a patch from Nikolay Nikolaev.

Here is the latest version of vhost-user multiple queue support, which
creates an nc and vhost_net pair for each queue.

What differs from the last version is that this patch addresses two major
concerns from Michael and fixes one hidden bug.

- Concern #1: no feedback when the backend can't support the number of
  requested queues (given with the queues=# option).

  Here we address this issue by querying the VHOST_USER_PROTOCOL_F_MQ
  protocol feature first: if it is not set, the backend does not support
  the mq feature, and max_queues is set to 1. Otherwise, we send another
  message, VHOST_USER_GET_QUEUE_NUM, to get the max_queues the backend
  supports.

  At the vhost-user initialization stage (net_vhost_user_start), we then
  initialize one queue first, which also gets us max_queues. We then do a
  simple comparison: if requested_queues > max_queues, we exit (I guess
  it's safe to exit here, as the VM is not running yet). A short sketch of
  this flow follows the two concern items below.

- Concern #2: some messages are sent more times than necessary.

  We came to an agreement with Michael that we could categorize vhost-user
  messages into two types: non-vring-specific messages, which should be
  sent only once, and vring-specific messages, which should be sent per
  queue.

  Here I introduced a helper function, vhost_user_one_time_request(),
  which lists the following messages as non-vring-specific:

        VHOST_USER_GET_FEATURES
        VHOST_USER_SET_FEATURES
        VHOST_USER_GET_PROTOCOL_FEATURES
        VHOST_USER_SET_PROTOCOL_FEATURES
        VHOST_USER_SET_OWNER
        VHOST_USER_RESET_DEVICE
        VHOST_USER_SET_MEM_TABLE
        VHOST_USER_GET_QUEUE_NUM

  For the above messages, we simply ignore them when they are sent for any
  queue other than the first one.
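
Roughly, the flow for Concern #1 then looks like the sketch below. This is
an illustration only, not the exact patch code: requested_queues stands for
the value given with the queues=# option, and error handling is simplified.

    /* first queue: vhost_net_init() also negotiates protocol features
     * and, when VHOST_USER_PROTOCOL_F_MQ is set, sends
     * VHOST_USER_GET_QUEUE_NUM to learn the backend limit */
    vhost_net = vhost_net_init(&options);
    max_queues = vhost_net_get_max_queues(vhost_net);

    if (requested_queues > max_queues) {
        error_report("you are asking more queues than supported: %d",
                     max_queues);
        exit(1);    /* safe: the VM is not running yet */
    }

    /* remaining queues: one more vhost_net_init() per queue; the
     * non-vring-specific messages listed above are filtered out by
     * vhost_user_one_time_request() for vq_index != 0 */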

I also observed a hidden bug in the last version. We registered the char dev
event handler N times, which is not only unnecessary but also buggy: a later
registration overwrites the former one, as qemu_chr_add_handlers() does not
chain handlers but instead replaces the old one. So, in theory, invoking
qemu_chr_add_handlers() N times will not end up calling the handler N times.

However, the reason the handler is invoked N times is that we start the
backend (the connection server) first, so when net_vhost_user_init() is
executed, the connection is already established, and qemu_chr_add_handlers()
then invokes the handler immediately, which effectively looks like invoking
the handler (net_vhost_user_event) directly from net_vhost_user_init().

The solution I came up with is to make VhostUserState an upper-level
structure that includes N nc and vhost_net pairs:

   struct VhostUserNetPeer {
       NetClientState *nc;
       VHostNetState  *vhost_net;
   };

   typedef struct VhostUserState {
       CharDriverState *chr;
       bool running;
       int queues;
       struct VhostUserNetPeer peers[];
   } VhostUserState;
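
With this layout the state is allocated once, with room for all queue pairs
(peers[] is a flexible array member), and the chardev event handler is
registered only once for the whole set. Roughly, mirroring the patch below:

    s = g_malloc0(sizeof(VhostUserState) +
                  queues * sizeof(struct VhostUserNetPeer));
    s->queues = queues;
    s->chr    = chr;

    /* net_vhost_user_init() creates one nc per queue, setting
     * nc->queue_index = i and nc->opaque = s, then registers the
     * single chardev handler */
    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);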

Signed-off-by: Nikolay Nikolaev <address@hidden>
Signed-off-by: Changchun Ouyang <address@hidden>
Signed-off-by: Yuanhan Liu <address@hidden>
---
 docs/specs/vhost-user.txt |  13 +++++
 hw/virtio/vhost-user.c    |  31 ++++++++++-
 include/net/net.h         |   1 +
 net/vhost-user.c          | 136 ++++++++++++++++++++++++++++++++--------------
 qapi-schema.json          |   6 +-
 qemu-options.hx           |   5 +-
 6 files changed, 146 insertions(+), 46 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 43db9b4..99d79be 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -135,6 +135,19 @@ As older slaves don't support negotiating protocol features,
 a feature bit was dedicated for this purpose:
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 
+Multiple queue support
+----------------------
+Multiple queue is treated as a protocol extension, hence the slave has to
+implement protocol features first. The multiple queues feature is supported
+only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set.
+
+The maximum number of queues the slave supports can be queried with the
+message VHOST_USER_GET_QUEUE_NUM. The master should stop when the number
+of requested queues is bigger than that.
+
+As all queues share one connection, the master uses a unique index for
+each queue in the sent message to identify a specific queue.
+
 Message types
 -------------
 
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8046bc0..11e46b5 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -187,6 +187,23 @@ static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
             0 : -1;
 }
 
+static bool vhost_user_one_time_request(VhostUserRequest request)
+{
+    switch (request) {
+    case VHOST_USER_GET_FEATURES:
+    case VHOST_USER_SET_FEATURES:
+    case VHOST_USER_GET_PROTOCOL_FEATURES:
+    case VHOST_USER_SET_PROTOCOL_FEATURES:
+    case VHOST_USER_SET_OWNER:
+    case VHOST_USER_RESET_DEVICE:
+    case VHOST_USER_SET_MEM_TABLE:
+    case VHOST_USER_GET_QUEUE_NUM:
+        return true;
+    default:
+        return false;
+    }
+}
+
 static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
         void *arg)
 {
@@ -206,6 +223,14 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
     else
         msg_request = request;
 
+    /*
+     * For non-vring-specific requests, like VHOST_USER_GET_FEATURES,
+     * we only need to send it once for the first queue. Later such
+     * requests are simply ignored.
+     */
+    if (vhost_user_one_time_request(msg_request) && dev->vq_index != 0)
+        return 0;
+
     msg.request = msg_request;
     msg.flags = VHOST_USER_VERSION;
     msg.size = 0;
@@ -268,17 +293,20 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
     case VHOST_USER_SET_VRING_NUM:
     case VHOST_USER_SET_VRING_BASE:
         memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
+        msg.state.index += dev->vq_index;
         msg.size = sizeof(m.state);
         break;
 
     case VHOST_USER_GET_VRING_BASE:
         memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
+        msg.state.index += dev->vq_index;
         msg.size = sizeof(m.state);
         need_reply = 1;
         break;
 
     case VHOST_USER_SET_VRING_ADDR:
         memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
+        msg.addr.index += dev->vq_index;
         msg.size = sizeof(m.addr);
         break;
 
@@ -286,7 +314,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
     case VHOST_USER_SET_VRING_CALL:
     case VHOST_USER_SET_VRING_ERR:
         file = arg;
-        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
+        msg.u64 = (file->index + dev->vq_index) & VHOST_USER_VRING_IDX_MASK;
         msg.size = sizeof(m.u64);
         if (ioeventfd_enabled() && file->fd > 0) {
             fds[fd_num++] = file->fd;
@@ -330,6 +358,7 @@ static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
                 error_report("Received bad msg size.");
                 return -1;
             }
+            msg.state.index -= dev->vq_index;
             memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
             break;
         default:
diff --git a/include/net/net.h b/include/net/net.h
index 6a6cbef..6f20656 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -92,6 +92,7 @@ struct NetClientState {
     NetClientDestructor *destructor;
     unsigned int queue_index;
     unsigned rxfilter_notify_enabled:1;
+    void *opaque;
 };
 
 typedef struct NICState {
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 2d6bbe5..7d4ac69 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -15,10 +15,16 @@
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 
+struct VhostUserNetPeer {
+    NetClientState *nc;
+    VHostNetState  *vhost_net;
+};
+
 typedef struct VhostUserState {
-    NetClientState nc;
     CharDriverState *chr;
-    VHostNetState *vhost_net;
+    bool running;
+    int queues;
+    struct VhostUserNetPeer peers[];
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -29,48 +35,75 @@ typedef struct VhostUserChardevProps {
 
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    VhostUserState *s = nc->opaque;
     assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
-    return s->vhost_net;
-}
-
-static int vhost_user_running(VhostUserState *s)
-{
-    return (s->vhost_net) ? 1 : 0;
+    return s->peers[nc->queue_index].vhost_net;
 }
 
 static int vhost_user_start(VhostUserState *s)
 {
     VhostNetOptions options;
+    VHostNetState *vhost_net;
+    int max_queues;
+    int i = 0;
 
-    if (vhost_user_running(s)) {
+    if (s->running)
         return 0;
-    }
 
     options.backend_type = VHOST_BACKEND_TYPE_USER;
-    options.net_backend = &s->nc;
     options.opaque = s->chr;
 
-    s->vhost_net = vhost_net_init(&options);
+    options.net_backend = s->peers[i].nc;
+    vhost_net = s->peers[i++].vhost_net = vhost_net_init(&options);
+
+    max_queues = vhost_net_get_max_queues(vhost_net);
+    if (s->queues > max_queues) {
+        error_report("you are asking for more queues than supported: %d",
+                     max_queues);
+        return -1;
+    }
+
+    for (; i < s->queues; i++) {
+        options.net_backend = s->peers[i].nc;
+
+        s->peers[i].vhost_net = vhost_net_init(&options);
+        if (!s->peers[i].vhost_net)
+            return -1;
+    }
+    s->running = true;
 
-    return vhost_user_running(s) ? 0 : -1;
+    return 0;
 }
 
 static void vhost_user_stop(VhostUserState *s)
 {
-    if (vhost_user_running(s)) {
-        vhost_net_cleanup(s->vhost_net);
+    int i;
+    VHostNetState *vhost_net;
+
+    if (!s->running)
+        return;
+
+    for (i = 0;  i < s->queues; i++) {
+        vhost_net = s->peers[i].vhost_net;
+        if (vhost_net)
+            vhost_net_cleanup(vhost_net);
     }
 
-    s->vhost_net = 0;
+    s->running = false;
 }
 
 static void vhost_user_cleanup(NetClientState *nc)
 {
-    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    VhostUserState *s = nc->opaque;
+    VHostNetState *vhost_net = s->peers[nc->queue_index].vhost_net;
+
+    if (vhost_net)
+        vhost_net_cleanup(vhost_net);
 
-    vhost_user_stop(s);
     qemu_purge_queued_packets(nc);
+
+    if (nc->queue_index == s->queues - 1)
+        g_free(s);
 }
 
 static bool vhost_user_has_vnet_hdr(NetClientState *nc)
@@ -89,7 +122,7 @@ static bool vhost_user_has_ufo(NetClientState *nc)
 
 static NetClientInfo net_vhost_user_info = {
         .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
-        .size = sizeof(VhostUserState),
+        .size = sizeof(NetClientState),
         .cleanup = vhost_user_cleanup,
         .has_vnet_hdr = vhost_user_has_vnet_hdr,
         .has_ufo = vhost_user_has_ufo,
@@ -97,18 +130,25 @@ static NetClientInfo net_vhost_user_info = {
 
 static void net_vhost_link_down(VhostUserState *s, bool link_down)
 {
-    s->nc.link_down = link_down;
+    NetClientState *nc;
+    int i;
 
-    if (s->nc.peer) {
-        s->nc.peer->link_down = link_down;
-    }
+    for (i = 0; i < s->queues; i++) {
+        nc = s->peers[i].nc;
 
-    if (s->nc.info->link_status_changed) {
-        s->nc.info->link_status_changed(&s->nc);
-    }
+        nc->link_down = link_down;
+
+        if (nc->peer) {
+            nc->peer->link_down = link_down;
+        }
 
-    if (s->nc.peer && s->nc.peer->info->link_status_changed) {
-        s->nc.peer->info->link_status_changed(s->nc.peer);
+        if (nc->info->link_status_changed) {
+            nc->info->link_status_changed(nc);
+        }
+
+        if (nc->peer && nc->peer->info->link_status_changed) {
+            nc->peer->info->link_status_changed(nc->peer);
+        }
     }
 }
 
@@ -118,7 +158,8 @@ static void net_vhost_user_event(void *opaque, int event)
 
     switch (event) {
     case CHR_EVENT_OPENED:
-        vhost_user_start(s);
+        if (vhost_user_start(s) < 0)
+            exit(1);
         net_vhost_link_down(s, false);
         error_report("chardev \"%s\" went up", s->chr->label);
         break;
@@ -131,24 +172,28 @@ static void net_vhost_user_event(void *opaque, int event)
 }
 
 static int net_vhost_user_init(NetClientState *peer, const char *device,
-                               const char *name, CharDriverState *chr)
+                               const char *name, VhostUserState *s)
 {
     NetClientState *nc;
-    VhostUserState *s;
+    CharDriverState *chr = s->chr;
+    int i;
 
-    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
+    for (i = 0; i < s->queues; i++) {
+        nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
 
-    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
-             chr->label);
+        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to %s",
+                 i, chr->label);
 
-    s = DO_UPCAST(VhostUserState, nc, nc);
+        /* We don't provide a receive callback */
+        nc->receive_disabled = 1;
 
-    /* We don't provide a receive callback */
-    s->nc.receive_disabled = 1;
-    s->chr = chr;
-    nc->queue_index = 0;
+        nc->queue_index = i;
+        nc->opaque      = s;
 
-    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
+        s->peers[i].nc = nc;
+    }
+
+    qemu_chr_add_handlers(chr, NULL, NULL, net_vhost_user_event, s);
 
     return 0;
 }
@@ -227,8 +272,10 @@ static int net_vhost_check_net(void *opaque, QemuOpts *opts, Error **errp)
 int net_init_vhost_user(const NetClientOptions *opts, const char *name,
                         NetClientState *peer, Error **errp)
 {
+    int queues;
     const NetdevVhostUserOptions *vhost_user_opts;
     CharDriverState *chr;
+    VhostUserState *s;
 
     assert(opts->kind == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
     vhost_user_opts = opts->vhost_user;
@@ -244,6 +291,11 @@ int net_init_vhost_user(const NetClientOptions *opts, const char *name,
         return -1;
     }
 
+    queues = vhost_user_opts->has_queues ? vhost_user_opts->queues : 1;
+    s = g_malloc0(sizeof(VhostUserState) +
+                  queues * sizeof(struct VhostUserNetPeer));
+    s->queues = queues;
+    s->chr    = chr;
 
-    return net_vhost_user_init(peer, "vhost_user", name, chr);
+    return net_vhost_user_init(peer, "vhost_user", name, s);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 67fef37..55c33db 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2480,12 +2480,16 @@
 #
 # @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
 #
+# @queues: #optional number of queues to be created for multiqueue vhost-user
+#          (default: 1) (Since 2.5)
+#
 # Since 2.1
 ##
 { 'struct': 'NetdevVhostUserOptions',
   'data': {
     'chardev':        'str',
-    '*vhostforce':    'bool' } }
+    '*vhostforce':    'bool',
+    '*queues':        'int' } }
 
 ##
 # @NetClientOptions
diff --git a/qemu-options.hx b/qemu-options.hx
index 77f5853..5bfa7a3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1963,13 +1963,14 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
 netdev.  @code{-net} and @code{-device} with parameter @option{vlan} create the
 required hub automatically.
 
-@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
+@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
 
 Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev should
 be a unix domain socket backed one. The vhost-user uses a specifically defined
 protocol to pass vhost ioctl replacement messages to an application on the other
 end of the socket. On non-MSIX guests, the feature can be forced with
-@var{vhostforce}.
+@var{vhostforce}. Use 'queues=@var{n}' to specify the number of queues to
+be created for multiqueue vhost-user.
 
 Example:
 @example
-- 
1.9.0



