Re: [PATCH] vhost: Ignore vrings in dirty log when using a vIOMMU


From: Michael S. Tsirkin
Subject: Re: [PATCH] vhost: Ignore vrings in dirty log when using a vIOMMU
Date: Tue, 6 Oct 2020 06:49:28 -0400

On Tue, Oct 06, 2020 at 11:58:50AM +0200, Greg Kurz wrote:
> On Mon, 5 Oct 2020 10:18:03 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Sep 28, 2020 at 09:37:18AM +0200, Greg Kurz wrote:
> > > On Mon, 28 Sep 2020 16:23:43 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > 
> > > > On Fri, Sep 25, 2020 at 07:29:43PM +0200, Greg Kurz wrote:
> > > > > When a vIOMMU is present, any address coming from the guest is an IO
> > > > > virtual address, including those of the vrings. The backend's accesses
> > > > > to the vrings happen through vIOMMU translation: the backend hence
> > > > > only logs the final guest physical address, not the IO virtual one.
> > > > > It thus doesn't make sense to make room for the vring addresses in the
> > > > > dirty log in this case.
> > > > > 
> > > > > This fixes a crash of the source when migrating a guest using in-kernel
> > > > > vhost-net and iommu_platform=on on POWER, because DMA regions are put
> > > > > at very high addresses and the resulting log size is likely to cause
> > > > > g_malloc0() to abort.
> > > > > 
> > > > > BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1879349
> > > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > 
> > > > I'm a little confused as to what's going on here.  Obviously
> > > > allocating dirty bitmaps in IOVA space doesn't make much sense.
> > > > But... in all cases, isn't the ring ending up in guest memory, whether
> > > > translated or not?  So why do specific addresses of the ring make a
> > > > difference in *any* case?
> > > > 
> > > 
> > > I admit I'm a bit surprised as well... I can't think of a scenario
> > > where the address of the used ring would be higher than the guest
> > > memory... Maybe MST can shed some light here?
> > 
> > So the original idea was that the vring itself is specified in
> > terms of HVA, as opposed to the rest of the addresses, which are
> > specified in terms of GPA.
> 
> The vring itself is indeed described in terms of HVA (vq->used) but
> when it comes to the dirty log, QEMU passes the GPA of the used
> structure to the vhost backend:
> 
> From vhost_virtqueue_set_addr():
> 
>     addr.log_guest_addr = vq->used_phys;
>                               ^^ GPA ^^
>     addr.flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0;
>     r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);

Right. The point is that we either use the log bitmap or the ring-aliasing
trick to track memory changes, not both. If we used the ring-aliasing trick
then presumably VHOST_VRING_F_LOG would be clear and log_guest_addr would be
unused.
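
For reference, a minimal sketch of that gating (not the actual kernel code,
just an illustration of the contract): log_guest_addr only matters to the
backend when userspace set VHOST_VRING_F_LOG in the vring flags, i.e. when
bitmap logging rather than ring aliasing is in use.

    #include <stdbool.h>
    #include <linux/vhost.h>

    /* Illustrative helper: a backend would consult log_guest_addr for
     * used-ring writes only when userspace enabled VHOST_VRING_F_LOG. */
    static bool vring_logs_used_writes(const struct vhost_vring_addr *addr)
    {
        return addr->flags & (1 << VHOST_VRING_F_LOG);
    }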

> And the sizing of the bitmap computed in vhost_get_log_size() is
> also based on this GPA:
> 
>     for (i = 0; i < dev->nvqs; ++i) {
>         struct vhost_virtqueue *vq = dev->vqs + i;
> 
>         if (!vq->used_phys && !vq->used_size) {
>             continue;
>         }
> 
>         uint64_t last = vq->used_phys + vq->used_size - 1;
>                             ^^ GPA ^^
>         log_size = MAX(log_size, last / VHOST_LOG_CHUNK + 1);
>     }
>
> With iommu_platform=off, I couldn't find a case where this second
> loop in vhost_get_log_size() actually increases the log size... and
> when iommu_platform=on, vq->used_phys is a GIOVA that my other patches,
> which you've already merged in the kernel, explicitly ignore when it
> comes to the dirty log... So it really seems that this loop is useless
> in the general case. If it is there to address some corner case, I guess
> it deserves a comment at the very least.
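
To put numbers on this, here is a standalone sketch of the sizing arithmetic
quoted above. It reuses the VHOST_LOG_CHUNK layout from hw/virtio/vhost.c and
assumes a purely hypothetical, very high GIOVA (1ULL << 59) for the used ring;
the exact value is not important, only that it sits far above guest RAM.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Same layout as hw/virtio/vhost.c: one dirty bit per 4 KiB page,
     * 64 bits (i.e. 64 pages) per uint64_t log chunk. */
    #define VHOST_LOG_PAGE   0x1000ULL
    #define VHOST_LOG_CHUNK  (VHOST_LOG_PAGE * 64)

    int main(void)
    {
        uint64_t used_phys = 1ULL << 59;   /* hypothetical GIOVA of the used ring */
        uint64_t used_size = 0x1000;       /* hypothetical used-ring footprint    */
        uint64_t last = used_phys + used_size - 1;
        uint64_t log_size = last / VHOST_LOG_CHUNK + 1;

        /* The log is an array of uint64_t chunks, so this is roughly the
         * allocation vhost_dev_log_resize() ends up asking g_malloc0() for. */
        printf("log_size = %" PRIu64 " chunks, ~%" PRIu64 " GiB of bitmap\n",
               log_size, (log_size * sizeof(uint64_t)) >> 30);
        return 0;
    }

With a GIOVA that high the bitmap comes out in the multi-TiB range, which is
why g_malloc0() aborts on the source.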
> 
> Cheers,
> 
> --
> Greg
> 
> > This way we wanted to support e.g. migration by vhost writing into
> > the qemu address space, with qemu then copying the data to guest memory.
> > 
> > 
> > 
> > 
> > > > > ---
> > > > >  hw/virtio/vhost.c |   38 ++++++++++++++++++++++++--------------
> > > > >  1 file changed, 24 insertions(+), 14 deletions(-)
> > > > > 
> > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > > index 1a1384e7a642..0b83d6b8e65e 100644
> > > > > --- a/hw/virtio/vhost.c
> > > > > +++ b/hw/virtio/vhost.c
> > > > > @@ -106,6 +106,20 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
> > > > >      }
> > > > >  }
> > > > >  
> > > > > +static int vhost_dev_has_iommu(struct vhost_dev *dev)
> > > > > +{
> > > > > +    VirtIODevice *vdev = dev->vdev;
> > > > > +
> > > > > +    /*
> > > > > +     * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> > > > > +     * incremental memory mapping API via IOTLB API. For platform that
> > > > > +     * does not have IOMMU, there's no need to enable this feature
> > > > > +     * which may cause unnecessary IOTLB miss/update trnasactions.
> > > > > +     */
> > > > > +    return vdev->dma_as != &address_space_memory &&
> > > > > +           virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> > > > > +}
> > > > > +
> > > > >  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > > > >                                     MemoryRegionSection *section,
> > > > >                                     hwaddr first,
> > > > > @@ -130,6 +144,11 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > > > >                                range_get_last(reg->guest_phys_addr,
> > > > >                                               reg->memory_size));
> > > > >      }
> > > > > +
> > > > > +    if (vhost_dev_has_iommu(dev)) {
> > > > > +        return 0;
> > > > > +    }
> > > > > +
> > > > >      for (i = 0; i < dev->nvqs; ++i) {
> > > > >          struct vhost_virtqueue *vq = dev->vqs + i;
> > > > >  
> > > > > @@ -172,6 +191,11 @@ static uint64_t vhost_get_log_size(struct vhost_dev *dev)
> > > > >                                         reg->memory_size);
> > > > >          log_size = MAX(log_size, last / VHOST_LOG_CHUNK + 1);
> > > > >      }
> > > > > +
> > > > > +    if (vhost_dev_has_iommu(dev)) {
> > > > > +        return log_size;
> > > > > +    }
> > > > > +
> > > > >      for (i = 0; i < dev->nvqs; ++i) {
> > > > >          struct vhost_virtqueue *vq = dev->vqs + i;
> > > > >  
> > > > > @@ -287,20 +311,6 @@ static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
> > > > >      dev->log_size = size;
> > > > >  }
> > > > >  
> > > > > -static int vhost_dev_has_iommu(struct vhost_dev *dev)
> > > > > -{
> > > > > -    VirtIODevice *vdev = dev->vdev;
> > > > > -
> > > > > -    /*
> > > > > -     * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> > > > > -     * incremental memory mapping API via IOTLB API. For platform that
> > > > > -     * does not have IOMMU, there's no need to enable this feature
> > > > > -     * which may cause unnecessary IOTLB miss/update trnasactions.
> > > > > -     */
> > > > > -    return vdev->dma_as != &address_space_memory &&
> > > > > -           virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> > > > > -}
> > > > > -
> > > > >  static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
> > > > >                                hwaddr *plen, bool is_write)
> > > > >  {
> > > > > 
> > > > > 
> > > > 
> > > 
> > 
> > 



