Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Date: Thu, 12 Apr 2018 06:35:25 +0300

On Thu, Apr 12, 2018 at 06:20:55AM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 10:35:05AM +0800, Tiwei Bie wrote:
> > On Thu, Apr 12, 2018 at 04:57:13AM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
> > > > On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
> > > > > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > > > > > > This patch introduces the VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > > > > > protocol feature for vhost-user. By default, the vhost-user
> > > > > > > > backend has to query the IOTLBs from QEMU whenever it meets an
> > > > > > > > unknown IOVA. With this protocol feature negotiated, QEMU will
> > > > > > > > provide all the IOTLBs to the vhost-user backend without waiting
> > > > > > > > for queries from the backend. This is helpful when using a
> > > > > > > > hardware accelerator that is not able to handle unknown IOVAs
> > > > > > > > at the vhost-user backend.
> > > > > > > > 
> > > > > > > > Signed-off-by: Tiwei Bie <address@hidden>
> > > > > > > 
> > > > > > > This is potentially a large amount of data to be sent
> > > > > > > on a socket.
> > > > > > 
> > > > > > If we take the hardware accelerator out of the picture, we
> > > > > > will find that it's actually a question of "pre-loading" vs
> > > > > > "lazy-loading". I think neither of them is perfect.
> > > > > > 
> > > > > > For "pre-loading", as you said, we may have a tough start.
> > > > > > But with "lazy-loading", we can't get predictable performance.
> > > > > > A sudden, unexpected performance drop may happen at any time,
> > > > > > because we may meet an unknown IOVA at any time in this case.
> > > > > 
> > > > > That's how hardware behaves too though. So we can expect guests
> > > > > to try to optimize locality.
> > > > 
> > > > The difference is that the software implementation needs to
> > > > query the mappings via a socket, which is much slower.
> > > 
> > > If you are proposing this new feature as an optimization,
> > > then I'd like to see numbers showing the performance gains.
> > > 
> > > It's definitely possible to optimize things out.  Pre-loading isn't
> > > where I would start optimizing though.  For example, DPDK could have its
> > > own VTD emulation, then it could access guest memory directly.
> > > 
> > > 
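To make the "own VTD emulation" idea concrete: the backend would walk the guest IOMMU page tables itself (they live in guest memory it already maps) to translate IOVA to GPA, then reuse the regions received via VHOST_USER_SET_MEM_TABLE for GPA to HVA. A minimal sketch in C, assuming a simplified 4-level table format and illustrative struct names (the real VT-d page-table formats differ in detail):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One guest memory region as announced via VHOST_USER_SET_MEM_TABLE
 * (simplified, illustrative layout). */
struct mem_region { uint64_t gpa, size; uint8_t *hva; };
static struct mem_region regions[8];
static int nregions;

/* GPA -> backend virtual address, using the SET_MEM_TABLE regions. */
static void *gpa_to_hva(uint64_t gpa)
{
    for (int i = 0; i < nregions; i++) {
        if (gpa - regions[i].gpa < regions[i].size) {
            return regions[i].hva + (gpa - regions[i].gpa);
        }
    }
    return NULL;
}

/* IOVA -> GPA by walking a simplified 4-level, 4K-page table rooted
 * at root_gpa (9 index bits per level). */
static bool iova_to_gpa(uint64_t root_gpa, uint64_t iova, uint64_t *gpa)
{
    uint64_t table = root_gpa;

    for (int level = 3; level >= 0; level--) {
        uint64_t *ptes = gpa_to_hva(table);
        if (ptes == NULL) {
            return false;
        }
        uint64_t pte = ptes[(iova >> (12 + 9 * level)) & 0x1ff];
        if (!(pte & 1)) {       /* present bit clear: unmapped IOVA */
            return false;
        }
        table = pte & 0x000ffffffffff000ULL;
    }
    *gpa = table | (iova & 0xfff);  /* add back the page offset */
    return true;
}

With something like this, an unknown IOVA costs a local table walk instead of a socket round trip to QEMU.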
> > > > > 
> > > > > > Once we meet an unknown IOVA, the backend's data path will need
> > > > > > to stop, query the mapping of the IOVA via the socket, and
> > > > > > wait for the reply. And the latency is not negligible (sometimes
> > > > > > it's even unacceptable), especially in the high performance
> > > > > > networking case. So maybe it's better to make both of them
> > > > > > available to the vhost backend.
> > > > > > 
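
For reference, the stall described above is the IOTLB miss path sketched below. The message layout comes from the Linux vhost UAPI (vhost-user's VHOST_USER_IOTLB_MSG carries the same structure); the transport stub is illustrative only:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* IOTLB message layout from the Linux vhost UAPI; vhost-user's
 * VHOST_USER_IOTLB_MSG carries the same structure. */
struct vhost_iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
#define VHOST_ACCESS_RO 0x1
#define VHOST_ACCESS_WO 0x2
#define VHOST_ACCESS_RW 0x3
    uint8_t perm;
#define VHOST_IOTLB_MISS        1
#define VHOST_IOTLB_UPDATE      2
#define VHOST_IOTLB_INVALIDATE  3
#define VHOST_IOTLB_ACCESS_FAIL 4
    uint8_t type;
};

/* Stub transport: a real backend would write the message to the
 * vhost-user slave channel and then block until the matching
 * VHOST_IOTLB_UPDATE arrives -- that blocking wait is the latency
 * being discussed here. */
static void report_iotlb_miss(uint64_t iova, uint8_t perm)
{
    struct vhost_iotlb_msg msg = {
        .iova = iova,
        .perm = perm,
        .type = VHOST_IOTLB_MISS,
    };
    printf("IOTLB miss: iova=0x%" PRIx64 ", waiting for update...\n",
           msg.iova);
}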
> > > > > > > 
> > > > > > > I had an impression that a hardware accelerator was using
> > > > > > > VFIO anyway. Given this, can't we have QEMU program
> > > > > > > the shadow IOMMU tables into VFIO directly?
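
Concretely, programming the shadow tables into VFIO would mean replaying each vIOMMU map/unmap event as a type1 ioctl on the container. A minimal sketch, with container setup elided and assuming the accelerator sits behind a VFIO type1 IOMMU:

#include <linux/vfio.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Replay one vIOMMU map event into a VFIO type1 container: the IOVA
 * and the QEMU virtual address backing the guest page become a DMA
 * mapping in the host IOMMU, so the device never sees an unknown IOVA. */
static int shadow_map(int container_fd, uint64_t iova, void *hva,
                      uint64_t size)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)hva,
        .iova  = iova,
        .size  = size,
    };
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

/* And unmap on vIOMMU invalidation. */
static int shadow_unmap(int container_fd, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .iova  = iova,
        .size  = size,
    };
    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}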
> > > > > > 
> > > > > > I think it's a good idea! Currently, my concern is that the
> > > > > > hardware device may not use the IOMMU and may have its own
> > > > > > builtin address translation unit. And it would be a pain for
> > > > > > device vendors to teach VFIO to work with such a builtin
> > > > > > address translation unit.
> > > > > 
> > > > > I think such drivers would have to integrate with VFIO somehow.
> > > > > Otherwise, what is the plan for assigning such devices then?
> > > > 
> > > > Such devices are just for vhost data path acceleration.
> > > 
> > > That's not true I think.  E.g. RDMA devices have an on-card MMU.
> > > 
> > > > They have many available queue pairs; the switching logic is
> > > > done among those queue pairs, and different queue pairs serve
> > > > different VMs directly.
> > > > 
> > > > Best regards,
> > > > Tiwei Bie
> > > 
> > > The way I would do it is attach different PASID values to
> > > different queues. This way you can use the standard IOMMU
> > > to enforce protection.
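
A purely hypothetical sketch of the queue/PASID idea -- none of these names are an existing kernel, QEMU, or device API:

#include <stdint.h>

struct accel_dev;   /* opaque handle for the accelerator */

/* Hypothetical stand-in for the device-specific mechanism that tags
 * all DMA issued by one queue pair with a PASID. */
static int accel_set_queue_pasid(struct accel_dev *dev, int qpair,
                                 uint32_t pasid)
{
    (void)dev; (void)qpair; (void)pasid;
    return 0;
}

/* Give each VM's queue pair a distinct PASID; the standard IOMMU then
 * applies that VM's page tables to the queue's DMA, so no builtin
 * translation unit in the device is needed for protection. */
static int assign_queue_to_vm(struct accel_dev *dev, int qpair,
                              uint32_t vm_pasid)
{
    return accel_set_queue_pasid(dev, qpair, vm_pasid);
}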
> > 
> > I'm thinking about the case where the device wants to use its
> > builtin address translation, although I'm not really sure
> > whether a real product will work this way.
> > 
> > Honestly, I like your idea, and I don't object to it. I'll
> > do more investigation on it. And for the feature proposed
> > in this RFC, I just think maybe it's better to provide one
> > more way for the backend to support vIOMMU.
> > 
> > Anyway, the work on adding vIOMMU support to vDPA only
> > started a few days ago. I'll do more investigation on
> > each possibility. Thanks! :)
> > 
> > Best regards,
> > Tiwei Bie
> 
> There are more advantages to using requests with PASID:
> 
> You can use hardware support for nesting, having the guest supply the
> first-level translation and the host supply the second-level
> translation.
> 
> I actually had an idea to do something like this for AMD
> and ARM, which support nesting even for requests with PASID;
> having Intel benefit too would be nice.

Something else to consider is implementing the PRS (Page Request Services) capability.


In theory this could then go like this:

- get the page request from the device
- fetch the translation from the VT-d page tables
- use the result to issue a page response message


This would match the current vhost-user model.
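
A rough sketch of that loop, with every type and helper hypothetical (there is no vhost-user wire format for page requests today):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page request as it might arrive from the device. */
struct page_request {
    uint64_t addr;      /* faulting IOVA */
    uint32_t pasid;
    bool     write;
};

/* Stubs standing in for the three steps above; a real implementation
 * would talk to the device (e.g. through VFIO) and walk the guest's
 * VT-d page tables. */
static bool recv_page_request(struct page_request *req)
{
    (void)req;
    return false;               /* no request pending in this stub */
}

static int vtd_resolve(const struct page_request *req)
{
    (void)req;
    return -1;                  /* lookup failed in this stub */
}

static void send_page_response(const struct page_request *req, bool ok)
{
    (void)req; (void)ok;
}

/* The three-step loop: get a page request, resolve it from the VT-d
 * tables, reply with a page response message. */
static void prs_loop(void)
{
    struct page_request req;

    while (recv_page_request(&req)) {
        send_page_response(&req, vtd_resolve(&req) == 0);
    }
}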





> 
> 
> > > 
> > > 
> > > > > 
> > > > > 
> > > > > > Best regards,
> > > > > > Tiwei Bie
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > ---
> > > > > > > > The idea of this patch is to let QEMU push all the IOTLBs
> > > > > > > > to the vhost-user backend without waiting for queries from
> > > > > > > > the backend, because a hardware accelerator at the vhost-user
> > > > > > > > backend may not be able to handle unknown IOVAs.
> > > > > > > > 
> > > > > > > > This is just an RFC for now. It seems that it doesn't work
> > > > > > > > as expected when the guest is using a kernel driver (to handle
> > > > > > > > this case, it seems that some RAM regions' events also need
> > > > > > > > to be listened to). Any comments would be appreciated! Thanks!
> > > > > > > > 
> > > > > > > >  docs/interop/vhost-user.txt       |  9 ++++++++
> > > > > > > >  hw/virtio/vhost-backend.c         |  7 ++++++
> > > > > > > >  hw/virtio/vhost-user.c            |  8 +++++++
> > > > > > > >  hw/virtio/vhost.c                 | 47 ++++++++++++++++++++++++++++++++++++---
> > > > > > > >  include/hw/virtio/vhost-backend.h |  3 +++
> > > > > > > >  5 files changed, 71 insertions(+), 3 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > > > > > > > index 534caab18a..73e07f9dad 100644
> > > > > > > > --- a/docs/interop/vhost-user.txt
> > > > > > > > +++ b/docs/interop/vhost-user.txt
> > > > > > > > @@ -380,6 +380,7 @@ Protocol features
> > > > > > > >  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
> > > > > > > >  #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
> > > > > > > >  #define VHOST_USER_PROTOCOL_F_CONFIG         9
> > > > > > > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10
> > > > > > > >  
> > > > > > > >  Master message types
> > > > > > > >  --------------------
> > > > > > > > @@ -797,3 +798,11 @@ resilient for selective requests.
> > > > > > > >  For the message types that already solicit a reply from the client, the
> > > > > > > >  presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
> > > > > > > >  no behavioural change. (See the 'Communication' section for details.)
> > > > > > > > +
> > > > > > > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> > > > > > > > +------------------------------------
> > > > > > > > +By default, the vhost-user backend needs to query the IOTLBs from QEMU
> > > > > > > > +when it meets unknown IOVAs. With this protocol feature negotiated, QEMU
> > > > > > > > +will provide all the IOTLBs to the vhost backend without waiting for
> > > > > > > > +queries from the backend. This is helpful when using a hardware accelerator
> > > > > > > > +which is not able to handle unknown IOVAs at the vhost-user backend.
> > > > > > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > > > > > > > index 7f09efab8b..d72994e9a5 100644
> > > > > > > > --- a/hw/virtio/vhost-backend.c
> > > > > > > > +++ b/hw/virtio/vhost-backend.c
> > > > > > > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
> > > > > > > >          qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev)
> > > > > > > > +{
> > > > > > > > +    return false;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  static const VhostOps kernel_ops = {
> > > > > > > >          .backend_type = VHOST_BACKEND_TYPE_KERNEL,
> > > > > > > >          .vhost_backend_init = vhost_kernel_init,
> > > > > > > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = {
> > > > > > > >  #endif /* CONFIG_VHOST_VSOCK */
> > > > > > > >          .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
> > > > > > > >          .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg,
> > > > > > > > +        .vhost_backend_need_all_device_iotlb =
> > > > > > > > +                                vhost_kernel_need_all_device_iotlb,
> > > > > > > >  };
> > > > > > > >  
> > > > > > > >  int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
> > > > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > > > > > index 38da8692bb..30e0b1c13b 100644
> > > > > > > > --- a/hw/virtio/vhost-user.c
> > > > > > > > +++ b/hw/virtio/vhost-user.c
> > > > > > > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature {
> > > > > > > >      VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
> > > > > > > >      VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
> > > > > > > >      VHOST_USER_PROTOCOL_F_CONFIG = 9,
> > > > > > > > +    VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10,
> > > > > > > >      VHOST_USER_PROTOCOL_F_MAX
> > > > > > > >  };
> > > > > > > >  
> > > > > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
> > > > > > > >      return 0;
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev)
> > > > > > > > +{
> > > > > > > > +    return virtio_has_feature(dev->protocol_features,
> > > > > > > > +                              VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  const VhostOps user_ops = {
> > > > > > > >          .backend_type = VHOST_BACKEND_TYPE_USER,
> > > > > > > >          .vhost_backend_init = vhost_user_init,
> > > > > > > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = {
> > > > > > > >          .vhost_set_config = vhost_user_set_config,
> > > > > > > >          .vhost_crypto_create_session = vhost_user_crypto_create_session,
> > > > > > > >          .vhost_crypto_close_session = vhost_user_crypto_close_session,
> > > > > > > > +        .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb,
> > > > > > > >  };
> > > > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > > > > > index f51bf573d5..70922ee998 100644
> > > > > > > > --- a/hw/virtio/vhost.c
> > > > > > > > +++ b/hw/virtio/vhost.c
> > > > > > > > @@ -48,6 +48,9 @@ static unsigned int used_memslots;
> > > > > > > >  static QLIST_HEAD(, vhost_dev) vhost_devices =
> > > > > > > >      QLIST_HEAD_INITIALIZER(vhost_devices);
> > > > > > > >  
> > > > > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa,
> > > > > > > > +                                      uint64_t *uaddr, uint64_t *len);
> > > > > > > > +
> > > > > > > >  bool vhost_has_free_slot(void)
> > > > > > > >  {
> > > > > > > >      unsigned int slots_limit = ~0U;
> > > > > > > > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener,
> > > > > > > >      vhost_region_add_section(dev, section);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > > > > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > > > > > > >  {
> > > > > > > >      struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n);
> > > > > > > >      struct vhost_dev *hdev = iommu->hdev;
> > > > > > > >      hwaddr iova = iotlb->iova + iommu->iommu_offset;
> > > > > > > >  
> > > > > > > > +    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > > > > > > > +        uint64_t uaddr, len;
> > > > > > > > +
> > > > > > > > +        rcu_read_lock();
> > > > > > > > +
> > > > > > > > +        if (iotlb->target_as != NULL) {
> > > > > > > > +            if (vhost_memory_region_lookup(hdev, iotlb->translated_addr,
> > > > > > > > +                        &uaddr, &len)) {
> > > > > > > > +                error_report("Fail to lookup the translated address "
> > > > > > > > +                        "%"PRIx64, iotlb->translated_addr);
> > > > > > > > +                goto out;
> > > > > > > > +            }
> > > > > > > > +
> > > > > > > > +            len = MIN(iotlb->addr_mask + 1, len);
> > > > > > > > +            iova = iova & ~iotlb->addr_mask;
> > > > > > > > +
> > > > > > > > +            if (vhost_backend_update_device_iotlb(hdev, iova, uaddr,
> > > > > > > > +                                                  len, iotlb->perm)) {
> > > > > > > > +                error_report("Fail to update device iotlb");
> > > > > > > > +                goto out;
> > > > > > > > +            }
> > > > > > > > +        }
> > > > > > > > +out:
> > > > > > > > +        rcu_read_unlock();
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > >      if (vhost_backend_invalidate_device_iotlb(hdev, iova,
> > > > > > > >                                                iotlb->addr_mask + 1)) {
> > > > > > > >          error_report("Fail to invalidate device iotlb");
> > > > > > > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener,
> > > > > > > >      struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> > > > > > > >                                           iommu_listener);
> > > > > > > >      struct vhost_iommu *iommu;
> > > > > > > > +    IOMMUNotifierFlag flags;
> > > > > > > >      Int128 end;
> > > > > > > >  
> > > > > > > >      if (!memory_region_is_iommu(section->mr)) {
> > > > > > > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener,
> > > > > > > >      end = int128_add(int128_make64(section->offset_within_region),
> > > > > > > >                       section->size);
> > > > > > > >      end = int128_sub(end, int128_one());
> > > > > > > > -    iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
> > > > > > > > -                        IOMMU_NOTIFIER_UNMAP,
> > > > > > > > +
> > > > > > > > +    if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) {
> > > > > > > > +        flags = IOMMU_NOTIFIER_ALL;
> > > > > > > > +    } else {
> > > > > > > > +        flags = IOMMU_NOTIFIER_UNMAP;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    iommu_notifier_init(&iommu->n, vhost_iommu_map_notify,
> > > > > > > > +                        flags,
> > > > > > > >                          section->offset_within_region,
> > > > > > > >                          int128_get64(end));
> > > > > > > >      iommu->mr = section->mr;
> > > > > > > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener,
> > > > > > > >      memory_region_register_iommu_notifier(section->mr, &iommu->n);
> > > > > > > >      QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
> > > > > > > >      /* TODO: can replay help performance here? */
> > > > > > > > +    if (flags == IOMMU_NOTIFIER_ALL) {
> > > > > > > > +        memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n);
> > > > > > > > +    }
> > > > > > > >  }
> > > > > > > >  
> > > > > > > >  static void vhost_iommu_region_del(MemoryListener *listener,
> > > > > > > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > > > > > > > index 5dac61f9ea..602fd987a4 100644
> > > > > > > > --- a/include/hw/virtio/vhost-backend.h
> > > > > > > > +++ b/include/hw/virtio/vhost-backend.h
> > > > > > > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev,
> > > > > > > >  typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
> > > > > > > >                                               uint64_t session_id);
> > > > > > > >  
> > > > > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev);
> > > > > > > > +
> > > > > > > >  typedef struct VhostOps {
> > > > > > > >      VhostBackendType backend_type;
> > > > > > > >      vhost_backend_init vhost_backend_init;
> > > > > > > > @@ -138,6 +140,7 @@ typedef struct VhostOps {
> > > > > > > >      vhost_set_config_op vhost_set_config;
> > > > > > > >      vhost_crypto_create_session_op vhost_crypto_create_session;
> > > > > > > >      vhost_crypto_close_session_op vhost_crypto_close_session;
> > > > > > > > +    vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb;
> > > > > > > >  } VhostOps;
> > > > > > > >  
> > > > > > > >  extern const VhostOps user_ops;
> > > > > > > > -- 
> > > > > > > > 2.11.0


