Re: [Qemu-devel] vhost, iova, and dirty page tracking


From: Tian, Kevin
Subject: Re: [Qemu-devel] vhost, iova, and dirty page tracking
Date: Wed, 18 Sep 2019 02:15:41 +0000

> From: Tian, Kevin
> Sent: Wednesday, September 18, 2019 9:32 AM
> 
> > From: Alex Williamson [mailto:address@hidden]
> > Sent: Tuesday, September 17, 2019 10:54 PM
> >
> > On Tue, 17 Sep 2019 08:48:36 +0000
> > "Tian, Kevin" <address@hidden> wrote:
> >
> > > > From: Jason Wang [mailto:address@hidden]
> > > > Sent: Monday, September 16, 2019 4:33 PM
> > > >
> > > >
> > > > On 2019/9/16 9:51 AM, Tian, Kevin wrote:
> > > > > Hi, Jason
> > > > >
> > > > > We had a discussion about dirty page tracking in VFIO, when
> > > > > vIOMMU is enabled:
> > > > >
> > > > > https://lists.nongnu.org/archive/html/qemu-devel/2019-09/msg02690.html
> > > > >
> > > > > It's actually a similar model to vhost - Qemu cannot interpose
> > > > > the fast-path DMAs, thus it relies on the kernel part to track
> > > > > and report dirty page information. Currently Qemu tracks dirty
> > > > > pages at GFN level, thus demanding a translation from IOVA to
> > > > > GPA. The open question in our discussion is where this
> > > > > translation should happen. Doing the translation in the kernel
> > > > > implies a device-iotlb flavor, which is what vhost implements
> > > > > today. It requires potentially large tracking structures in the
> > > > > host kernel, but leverages the existing log_sync flow in Qemu.
> > > > > On the other hand, Qemu may perform log_sync for every removal
> > > > > of an IOVA mapping and then do the translation itself, avoiding
> > > > > GPA awareness on the kernel side. That needs some change to the
> > > > > current Qemu log_sync flow, and may bring more overhead if
> > > > > IOVAs are frequently unmapped.
> > > > >
> > > > > So we'd like to hear your opinions, especially about how you
> > > > > came down to the current iotlb approach for vhost.
> > > >
> > > >
> > > > We didn't consider this point much when introducing vhost. And
> > > > before IOTLB, vhost already knew the GPA through its mem table
> > > > (GPA->HVA). So it was natural and easier to track dirty pages at
> > > > GPA level, and it required no changes to the existing ABI.
> > >
> > > This is the same situation as VFIO.
> >
> > It is?  VFIO doesn't know GPAs, it only knows HVA, HPA, and IOVA.  In
> > some cases IOVA is GPA, but not all.
> 
> Well, I thought vhost had a similar design, i.e. the index of its mem
> table is GPA when vIOMMU is off and becomes IOVA when vIOMMU is on.
> But I may be wrong here. Jason, can you help clarify? I saw two
> interfaces which poke the mem table: VHOST_SET_MEM_TABLE (for GPA)
> and VHOST_IOTLB_UPDATE (for IOVA). Are they used exclusively or
> together?
> 
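
(For context on the two interfaces above, here is a paraphrased sketch of
the corresponding uapi structures as I read them in
include/uapi/linux/vhost_types.h - the types are renamed with a _sketch
suffix so the snippet compiles standalone, and the exact kernel
definitions may differ by version:)

#include <stdint.h>

/* Entry used by VHOST_SET_MEM_TABLE: indexed by GPA, so the kernel can
 * log dirty pages directly at GPA granularity without any translation. */
struct vhost_memory_region_sketch {
    uint64_t guest_phys_addr;   /* GPA */
    uint64_t memory_size;       /* bytes */
    uint64_t userspace_addr;    /* HVA backing this GPA range */
    uint64_t flags_padding;
};

/* Message used by VHOST_IOTLB_UPDATE: carries only IOVA->HVA, no GPA,
 * which is why dirty logging through this path needs an extra
 * IOVA->GPA step somewhere. */
struct vhost_iotlb_msg_sketch {
    uint64_t iova;              /* IOVA programmed through the vIOMMU */
    uint64_t size;
    uint64_t uaddr;             /* HVA backing this IOVA range */
    uint8_t  perm;              /* RO/WO/RW */
    uint8_t  type;              /* UPDATE/INVALIDATE/MISS/... */
};
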
> >
> > > > For the VFIO case, the only advantage of using GPA is that the
> > > > log can then be shared among all the devices that belong to the
> > > > VM. Otherwise syncing through IOVA is cleaner.
> > >
> > > I still worry about the potential performance impact of this
> > > approach. In the current mdev live migration series, there are
> > > multiple system calls involved when retrieving the dirty bitmap
> > > information for a given memory range. IOVA mappings might be
> > > changed frequently. Though one may argue that frequent IOVA
> > > changes already have bad performance, it's still not good to
> > > introduce further non-negligible overhead in such a situation.
> > >
> > > On the other hand, I realized that adding IOVA awareness to VFIO
> > > is actually easy. Today VFIO already maintains a full list of
> > > IOVAs and their associated HVAs in vfio_dma structures, according
> > > to VFIO_MAP and VFIO_UNMAP. As long as we allow the latter two
> > > operations to accept another parameter (GPA), the IOVA->GPA
> > > mapping can be naturally cached in the existing vfio_dma objects,
> > > which are always kept up-to-date by the MAP and UNMAP ioctls.
> > > Qemu then uniformly retrieves the VFIO dirty bitmap for the entire
> > > GPA range in every pre-copy round, regardless of whether vIOMMU is
> > > enabled. There is no need for another IOTLB implementation; the
> > > main ask is a v2 MAP/UNMAP interface.
> > >
> > > Alex, your thoughts?
> >
> > Same as last time, you're asking VFIO to be aware of an entirely new
> > address space and implement tracking structures of that address space
> > to make life easier for QEMU.  Don't we typically push such complexity
> > to userspace rather than into the kernel?  I'm not convinced.  Thanks,
> >
> 
> Is it really complex? No need for a new tracking structure - just
> allow the MAP interface to carry a new parameter and record it in the
> existing vfio_dma objects (a rough sketch below).
> 
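
To make it concrete, something along these lines - purely a hypothetical
sketch, the v2 struct name, the gpa field and the "GPA valid" flag are
made up and not part of the current uapi (today's
struct vfio_iommu_type1_dma_map carries argsz/flags/vaddr/iova/size):

#include <stdint.h>

/* Hypothetical v2 of the DMA map ioctl argument; the only addition is
 * letting userspace pass the GPA alongside the IOVA and HVA. */
struct vfio_iommu_type1_dma_map_v2_sketch {
    uint32_t argsz;
    uint32_t flags;            /* e.g. a new "GPA valid" flag */
    uint64_t vaddr;            /* HVA, as today */
    uint64_t iova;             /* IOVA, as today */
    uint64_t size;             /* bytes */
    uint64_t gpa;              /* new: GPA backing this IOVA range */
};

/* On the kernel side the existing per-mapping object just caches it: */
struct vfio_dma_sketch {
    uint64_t iova;
    uint64_t vaddr;
    uint64_t size;
    uint64_t gpa;              /* new field; equals iova when vIOMMU is off */
    /* ... existing fields (prot, pinned-page tracking, etc.) ... */
};
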
> Note that the frequency of guest DMA map/unmap can be very high. We
> saw >100K invocations per second with a 40G NIC. To do the translation
> correctly, Qemu would have to perform log_sync for every unmap, before
> the mapping for the logged dirty IOVAs becomes stale. In Kirti's
> current patch series, each log_sync requires several system calls
> through the migration info, e.g. setting start_pfn/page_size/total_pfns
> and then reading data_offset/data_size. That design is fine for doing
> log_sync once per pre-copy round, but too costly if done for every
> IOVA unmap. If a small extension in the kernel can lead to a large
> reduction in overhead, why not?
> 
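
(A back-of-envelope illustration, assuming the unmap rate above and
roughly five system calls per log_sync in that scheme - setting
start_pfn, page_size and total_pfns, then reading data_offset and
data_size: 100K unmaps/s x 5 calls is on the order of 500K extra system
calls per second spent purely on dirty tracking.)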

There is another benefit to recording GPA in VFIO. Vendor drivers (e.g.
GVT-g) may need to selectively write-protect guest memory pages when
interpreting certain workload descriptors. Those pages are referenced
by IOVA when vIOMMU is enabled, but the KVM write-protection API only
knows GPA. So currently vIOMMU must be disabled for Intel vGPUs when
GVT-g is used. To make it work we need a way to translate IOVA into GPA
in the vendor drivers. There are two options. One is to have KVM export
a new API for this translation, but as you explained earlier it's not
good to have vendor drivers depend on KVM. The other is to have VFIO
maintain this knowledge through the extended MAP interface and then
provide a uniform API for all vendor drivers to use (a rough sketch
below).
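
To illustrate the second option, here is a toy userspace model of the
lookup such a uniform API would perform - all names are hypothetical,
and the kernel side would of course walk its existing vfio_dma list
rather than a flat array:

#include <inttypes.h>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for a vfio_dma entry once it also caches the GPA. */
struct dma_entry_sketch {
    uint64_t iova;
    uint64_t size;
    uint64_t gpa;
};

/* Resolve an IOVA to a GPA using the mappings cached at MAP time.
 * Linear scan for clarity only. Returns 0 on success, -1 if unmapped. */
static int iova_to_gpa_sketch(const struct dma_entry_sketch *t, size_t n,
                              uint64_t iova, uint64_t *gpa)
{
    for (size_t i = 0; i < n; i++) {
        if (iova >= t[i].iova && iova - t[i].iova < t[i].size) {
            *gpa = t[i].gpa + (iova - t[i].iova);
            return 0;
        }
    }
    return -1;
}

int main(void)
{
    /* One made-up mapping: IOVA 0xfef00000 -> GPA 0x12340000, 64KB. */
    struct dma_entry_sketch table[] = {
        { 0xfef00000, 0x10000, 0x12340000 },
    };
    uint64_t gpa;

    if (iova_to_gpa_sketch(table, 1, 0xfef02000, &gpa) == 0)
        printf("gpa = 0x%" PRIx64 "\n", gpa);   /* prints 0x12342000 */
    return 0;
}

A vendor driver like GVT-g would then feed the resulting GPA to the KVM
write-protection path, without depending on KVM for the IOVA->GPA
translation itself.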

Thanks
Kevin
 
