[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support
From: |
Longpeng (Mike, Cloud Infrastructure Service Product Dept.) |
Subject: |
RE: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support |
Date: |
Sat, 11 Dec 2021 05:23:03 +0000 |
> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Wednesday, December 8, 2021 2:27 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>
> Cc: mst <mst@redhat.com>; Parav Pandit <parav@nvidia.com>; Yongji Xie
> <xieyongji@bytedance.com>; Stefan Hajnoczi <stefanha@redhat.com>; Stefano
> Garzarella <sgarzare@redhat.com>; Yechuan <yechuan@huawei.com>; Gonglei (Arei)
> <arei.gonglei@huawei.com>; qemu-devel <qemu-devel@nongnu.org>
> Subject: Re: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support
>
> On Wed, Dec 8, 2021 at 1:20 PM Longpeng(Mike) <longpeng2@huawei.com> wrote:
> >
> > From: Longpeng <longpeng2@huawei.com>
> >
> > Hi guys,
> >
> > This patch introduces vhost-vdpa-net device, which is inspired
> > by vhost-user-blk and the proposal of vhost-vdpa-blk device [1].
> >
> > I've tested this patch on Huawei's offload card:
> > ./x86_64-softmmu/qemu-system-x86_64 \
> > -device vhost-vdpa-net-pci,vdpa-dev=/dev/vhost-vdpa-0
> >
> > For virtio hardware offloading, the most important requirement for us
> > is to support live migration between offloading cards from different
> > vendors, the combination of netdev and virtio-net seems too heavy, we
> > prefer a lightweight way.
>
> Could you elaborate more on this? It's mainly the control path when
> using with netdev, and it provides a lot of other benefits:
>
> - decouple the transport specific stuff out of the vhost abstraction,
> mmio device is supported with 0 line of code
> - migration compatibility, reuse the migration stream that is already
> supported by Qemu virtio-net, this will allow migration among
> different vhost backends.
> - software mediation facility, not all the virtqueues are assigned to
> guests directly. One example is the virtio-net cvq, qemu may want to
> intercept and record the device state for migration. Reusing the
> current virtio-net codes simplifies a lot of codes.
> - transparent failover (in the future), the nic model can choose to
> switch between vhost backends etc.
>
We want to use the vdpa framework instead of the vfio-pci framework in
the virtio hardware offloading case, so maybe some of the benefits above
are not needed in our case. But we need to migrate between different
hardware, so I am not sure whether this approach would be harmful to the
requirement.
> >
> > Maybe we could support both in the future ?
>
> For the net, we need to figure out the advantages of this approach
> first. Note that we didn't have vhost-user-net-pci or vhost-pci in the
> past.
>
Why didn't support vhost-user-net-pci in history ? Because its control
path is much more complex than the block ?
> For the block, I will leave Stefan and Stefano to comment.
>
> > Such as:
> >
> > * Lightweight
> > Net: vhost-vdpa-net
> > Storage: vhost-vdpa-blk
> >
> > * Heavy but more powerful
> > Net: netdev + virtio-net + vhost-vdpa
> > Storage: bdrv + virtio-blk + vhost-vdpa
> >
> > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html
> >
> > Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
> > ---
> > hw/net/meson.build | 1 +
> > hw/net/vhost-vdpa-net.c | 338
> +++++++++++++++++++++++++++++++++++++
> > hw/virtio/Kconfig | 5 +
> > hw/virtio/meson.build | 1 +
> > hw/virtio/vhost-vdpa-net-pci.c | 118 +++++++++++++
>
> I'd expect there's no device type specific code in this approach and
> any kind of vDPA devices could be used with a general pci device.
>
> Any reason for having net specific types here?
>
No, just because there already has the proposal of vhost-vdpa-blk, so I
developed the vhost-vdpa-net correspondingly.
I pretty agree with your suggestion. If feasible, likes vfio-pci, we don't
need to maintain the device type specific code in QEMU, what's more, it's
possible to support the live migration of different virtio hardware.
> > include/hw/virtio/vhost-vdpa-net.h | 31 ++++
> > include/net/vhost-vdpa.h | 2 +
> > net/vhost-vdpa.c | 2 +-
> > 8 files changed, 497 insertions(+), 1 deletion(-)
> > create mode 100644 hw/net/vhost-vdpa-net.c
> > create mode 100644 hw/virtio/vhost-vdpa-net-pci.c
> > create mode 100644 include/hw/virtio/vhost-vdpa-net.h
> >
> > diff --git a/hw/net/meson.build b/hw/net/meson.build
> > index bdf71f1..139ebc4 100644
> > --- a/hw/net/meson.build
> > +++ b/hw/net/meson.build
> > @@ -44,6 +44,7 @@ specific_ss.add(when: 'CONFIG_XILINX_ETHLITE', if_true:
> files('xilinx_ethlite.c'
> >
> > softmmu_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('net_rx_pkt.c'))
> > specific_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('virtio-net.c'))
> > +specific_ss.add(when: 'CONFIG_VHOST_VDPA_NET', if_true:
> files('vhost-vdpa-net.c'))
> >
> > softmmu_ss.add(when: ['CONFIG_VIRTIO_NET', 'CONFIG_VHOST_NET'], if_true:
> files('vhost_net.c'), if_false: files('vhost_net-stub.c'))
> > softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost_net-stub.c'))
> > diff --git a/hw/net/vhost-vdpa-net.c b/hw/net/vhost-vdpa-net.c
> > new file mode 100644
> > index 0000000..48b99f9
> > --- /dev/null
> > +++ b/hw/net/vhost-vdpa-net.c
> > @@ -0,0 +1,338 @@
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/cutils.h"
> > +#include "hw/qdev-core.h"
> > +#include "hw/qdev-properties.h"
> > +#include "hw/qdev-properties-system.h"
> > +#include "hw/virtio/vhost.h"
> > +#include "hw/virtio/vhost-vdpa-net.h"
> > +#include "hw/virtio/virtio.h"
> > +#include "hw/virtio/virtio-bus.h"
> > +#include "hw/virtio/virtio-access.h"
> > +#include "sysemu/sysemu.h"
> > +#include "sysemu/runstate.h"
> > +#include "net/vhost-vdpa.h"
> > +
> > +static void vhost_vdpa_net_get_config(VirtIODevice *vdev, uint8_t *config)
> > +{
> > + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> > +
> > + memcpy(config, &s->netcfg, sizeof(struct virtio_net_config));
> > +}
> > +
> > +static void vhost_vdpa_net_set_config(VirtIODevice *vdev, const uint8_t
> *config)
> > +{
> > + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> > + struct virtio_net_config *netcfg = (struct virtio_net_config *)config;
> > + int ret;
> > +
> > + ret = vhost_dev_set_config(&s->dev, (uint8_t *)netcfg, 0,
> sizeof(*netcfg),
> > + VHOST_SET_CONFIG_TYPE_MASTER);
> > + if (ret) {
> > + error_report("set device config space failed");
> > + return;
> > + }
> > +}
> > +
> > +static uint64_t vhost_vdpa_net_get_features(VirtIODevice *vdev,
> > + uint64_t features,
> > + Error **errp)
> > +{
> > + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> > +
> > + virtio_add_feature(&features, VIRTIO_NET_F_CSUM);
> > + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
> > + virtio_add_feature(&features, VIRTIO_NET_F_MAC);
> > + virtio_add_feature(&features, VIRTIO_NET_F_GSO);
> > + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
> > + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
> > + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_ECN);
> > + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_UFO);
> > + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_ANNOUNCE);
> > + virtio_add_feature(&features, VIRTIO_NET_F_HOST_TSO4);
> > + virtio_add_feature(&features, VIRTIO_NET_F_HOST_TSO6);
> > + virtio_add_feature(&features, VIRTIO_NET_F_HOST_ECN);
> > + virtio_add_feature(&features, VIRTIO_NET_F_HOST_UFO);
> > + virtio_add_feature(&features, VIRTIO_NET_F_MRG_RXBUF);
> > + virtio_add_feature(&features, VIRTIO_NET_F_STATUS);
> > + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_VQ);
> > + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_RX);
> > + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_VLAN);
> > + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_RX_EXTRA);
> > + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_MAC_ADDR);
> > + virtio_add_feature(&features, VIRTIO_NET_F_MQ);
>
> Any reason for those hand crafted features?
>
> > +
> > + return vhost_get_features(&s->dev, vdpa_feature_bits, features);
> > +}
> > +
> > +static int vhost_vdpa_net_start(VirtIODevice *vdev, Error **errp)
> > +{
> > + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> > + BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> > + VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> > + int i, ret;
> > +
> > + if (!k->set_guest_notifiers) {
> > + error_setg(errp, "binding does not support guest notifiers");
> > + return -ENOSYS;
> > + }
> > +
> > + ret = vhost_dev_enable_notifiers(&s->dev, vdev);
> > + if (ret < 0) {
> > + error_setg_errno(errp, -ret, "Error enabling host notifiers");
> > + return ret;
> > + }
> > +
> > + ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
> > + if (ret < 0) {
> > + error_setg_errno(errp, -ret, "Error binding guest notifier");
> > + goto err_host_notifiers;
> > + }
> > +
> > + s->dev.acked_features = vdev->guest_features;
> > +
> > + ret = vhost_dev_start(&s->dev, vdev);
> > + if (ret < 0) {
> > + error_setg_errno(errp, -ret, "Error starting vhost");
> > + goto err_guest_notifiers;
> > + }
> > + s->started = true;
> > +
> > + /* guest_notifier_mask/pending not used yet, so just unmask
> > + * everything here. virtio-pci will do the right thing by
> > + * enabling/disabling irqfd.
> > + */
> > + for (i = 0; i < s->dev.nvqs; i++) {
> > + vhost_virtqueue_mask(&s->dev, vdev, i, false);
> > + }
> > +
> > + return ret;
> > +
> > +err_guest_notifiers:
> > + k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> > +err_host_notifiers:
> > + vhost_dev_disable_notifiers(&s->dev, vdev);
> > + return ret;
> > +}
> > +
> > +static void vhost_vdpa_net_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> > + Error *local_err = NULL;
> > + int i, ret;
> > +
> > + if (!vdev->start_on_kick) {
> > + return;
> > + }
> > +
> > + if (s->dev.started) {
> > + return;
> > + }
> > +
> > + /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> > + * vhost here instead of waiting for .set_status().
> > + */
> > + ret = vhost_vdpa_net_start(vdev, &local_err);
> > + if (ret < 0) {
> > + error_reportf_err(local_err, "vhost-vdpa-net: start failed: ");
> > + return;
> > + }
> > +
> > + /* Kick right away to begin processing requests already in vring */
> > + for (i = 0; i < s->dev.nvqs; i++) {
> > + VirtQueue *kick_vq = virtio_get_queue(vdev, i);
> > +
> > + if (!virtio_queue_get_desc_addr(vdev, i)) {
> > + continue;
> > + }
> > + event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
> > + }
> > +}
> > +
> > +static void vhost_vdpa_net_stop(VirtIODevice *vdev)
> > +{
> > + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> > + BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> > + VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> > + int ret;
> > +
> > + if (!s->started) {
> > + return;
> > + }
> > + s->started = false;
> > +
> > + if (!k->set_guest_notifiers) {
> > + return;
> > + }
> > +
> > + vhost_dev_stop(&s->dev, vdev);
> > +
> > + ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> > + if (ret < 0) {
> > + error_report("vhost guest notifier cleanup failed: %d", ret);
> > + return;
> > + }
> > +
> > + vhost_dev_disable_notifiers(&s->dev, vdev);
> > +}
> > +
> > +static void vhost_vdpa_net_set_status(VirtIODevice *vdev, uint8_t status)
> > +{
> > + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> > + bool should_start = virtio_device_started(vdev, status);
> > + Error *local_err = NULL;
> > + int ret;
> > +
> > + if (!vdev->vm_running) {
> > + should_start = false;
> > + }
> > +
> > + if (s->started == should_start) {
> > + return;
> > + }
> > +
> > + if (should_start) {
> > + ret = vhost_vdpa_net_start(vdev, &local_err);
> > + if (ret < 0) {
> > + error_reportf_err(local_err, "vhost-vdpa-net: start failed: ");
> > + }
> > + } else {
> > + vhost_vdpa_net_stop(vdev);
> > + }
> > +}
> > +
> > +static void vhost_vdpa_net_unrealize(VHostVdpaNet *s)
> > +{
> > + VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > + int i;
> > +
> > + for (i = 0; i < s->queue_pairs * 2; i++) {
> > + virtio_delete_queue(s->virtqs[i]);
> > + }
> > + /* ctrl vq */
> > + virtio_delete_queue(s->virtqs[i]);
> > +
> > + g_free(s->virtqs);
> > + virtio_cleanup(vdev);
> > +}
> > +
> > +static void vhost_vdpa_net_device_realize(DeviceState *dev, Error **errp)
> > +{
> > + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> > + int i, ret;
> > +
> > + s->vdpa.device_fd = qemu_open_old(s->vdpa_dev, O_RDWR);
> > + if (s->vdpa.device_fd == -1) {
> > + error_setg(errp, "vhost-vdpa-net: open %s failed: %s",
> > + s->vdpa_dev, strerror(errno));
> > + return;
> > + }
> > +
> > + virtio_init(vdev, "virtio-net", VIRTIO_ID_NET,
> > + sizeof(struct virtio_net_config));
> > +
> > + s->dev.nvqs = s->queue_pairs * 2 + 1;
> > + s->dev.vqs = g_new0(struct vhost_virtqueue, s->dev.nvqs);
> > + s->dev.vq_index = 0;
> > + s->dev.vq_index_end = s->dev.nvqs;
> > + s->dev.backend_features = 0;
> > + s->started = false;
> > +
> > + s->virtqs = g_new0(VirtQueue *, s->dev.nvqs);
> > + for (i = 0; i < s->dev.nvqs; i++) {
> > + s->virtqs[i] = virtio_add_queue(vdev, s->queue_size,
> > + vhost_vdpa_net_handle_output);
>
> We should check whether MQ is negotiated since the index varies
> depending on that.
>
> > + }
> > +
> > + ret = vhost_dev_init(&s->dev, &s->vdpa, VHOST_BACKEND_TYPE_VDPA, 0,
> NULL);
> > + if (ret < 0) {
> > + error_setg(errp, "vhost-vdpa-net: vhost initialization failed: %s",
> > + strerror(-ret));
> > + goto init_err;
> > + }
> > +
> > + ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->netcfg,
> > + sizeof(struct virtio_net_config), NULL);
> > + if (ret < 0) {
> > + error_setg(errp, "vhost-vdpa-net: get network config failed");
> > + goto config_err;
> > + }
> > +
> > + return;
> > +config_err:
> > + vhost_dev_cleanup(&s->dev);
> > +init_err:
> > + vhost_vdpa_net_unrealize(s);
> > + close(s->vdpa.device_fd);
> > +}
> > +
> > +static void vhost_vdpa_net_device_unrealize(DeviceState *dev)
> > +{
> > + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> > +
> > + virtio_set_status(vdev, 0);
> > + vhost_dev_cleanup(&s->dev);
> > + vhost_vdpa_net_unrealize(s);
> > + close(s->vdpa.device_fd);
> > +}
> > +
> > +static const VMStateDescription vmstate_vhost_vdpa_net = {
> > + .name = "vhost-vdpa-net",
> > + .minimum_version_id = 1,
> > + .version_id = 1,
> > + .fields = (VMStateField[]) {
> > + VMSTATE_VIRTIO_DEVICE,
> > + VMSTATE_END_OF_LIST()
> > + },
> > +};
> > +
> > +static void vhost_vdpa_net_instance_init(Object *obj)
> > +{
> > + VHostVdpaNet *s = VHOST_VDPA_NET(obj);
> > +
> > + device_add_bootindex_property(obj, &s->bootindex, "bootindex",
> > + "/ethernet-phy@0,0", DEVICE(obj));
> > +}
> > +
> > +static Property vhost_vdpa_net_properties[] = {
> > + DEFINE_PROP_STRING("vdpa-dev", VHostVdpaNet, vdpa_dev),
> > + DEFINE_PROP_UINT16("queue-pairs", VHostVdpaNet, queue_pairs,
> > + VHOST_VDPA_NET_AUTO_QUEUE_PAIRS),
>
> Any reason that we need the queue pairs parameter? Note that it is
> expected to be provisioned by the netlink for the management device.
>
> Thanks
- [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support, Longpeng(Mike), 2021/12/08
- Re: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support, Michael S. Tsirkin, 2021/12/08
- Re: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support, Stefan Hajnoczi, 2021/12/09
- Re: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support, Stefano Garzarella, 2021/12/09
- RE: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support, Longpeng (Mike, Cloud Infrastructure Service Product Dept.), 2021/12/10
- Re: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support, Stefano Garzarella, 2021/12/13
- Re: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support, Stefan Hajnoczi, 2021/12/13
- RE: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support, Longpeng (Mike, Cloud Infrastructure Service Product Dept.), 2021/12/13