Re: [Qemu-devel] [PATCH V2 07/11] virtio-pci: address space translation


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH V2 07/11] virtio-pci: address space translation service (ATS) support
Date: Fri, 11 Nov 2016 05:49:14 +0200

On Fri, Nov 11, 2016 at 11:26:12AM +0800, Jason Wang wrote:
> 
> 
> On 2016-11-11 01:32, Michael S. Tsirkin wrote:
> > On Fri, Nov 04, 2016 at 02:48:20PM +0800, Jason Wang wrote:
> > > 
> > > On 2016-11-04 03:49, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 03, 2016 at 05:27:19PM +0800, Jason Wang wrote:
> > > > > > > This patch enables Address Translation Service (ATS) support for
> > > > > > > virtio PCI devices. This is needed for a guest-visible Device IOTLB
> > > > > > > implementation and will be required by the vhost device IOTLB API
> > > > > > > implementation for the Intel IOMMU.
> > > > > > 
> > > > > > Cc: Michael S. Tsirkin<address@hidden>
> > > > > > Signed-off-by: Jason Wang<address@hidden>
> > > > I'd like to understand why you think this is strictly required.
> > > > Won't setting the CM (caching mode) bit in the IOMMU do the trick?
> > > ATS was chosen for performance, since there are several problems with CM:
> > > 
> > > - CM is slow (10%-20% slower on real hardware for things like netperf)
> > > because each transition between a non-present and a present mapping
> > > needs an explicit invalidation. It may slow down the whole VM.
> > > - Without ATS/Device IOTLB, the IOMMU becomes a bottleneck because of
> > > contention for IOTLB entries. (What we can do in this case is in fact
> > > userspace IOTLB snooping; this could be done even without CM.)
> > > It was natural to think of ATS when designing the interface between the
> > > IOMMU and device/remote IOTLBs. Do you see any drawbacks to ATS here?
> > > 
> > > Thanks
> > In fact, at this point I'm confused. Any mapping needs to be programmed
> > in the IOMMU. We need to implement this correctly.
> > Once we do, why do we need ATS?
> > I think what you need is the map/unmap notifiers that Aviv is working on.
> > No?
> 
> Let me clarify: the device IOTLB API can work without ATS or CM. So there
> are three ways to do it:
> 
> 1) Without ATS or CM support, the function could be implemented through:
> 1.1: asking qemu for help if there's an IOTLB miss in vhost
> 1.2: snooping the userspace IOTLB invalidation (present to non-present
> mapping) and updating the device IOTLB
> 
> 2) With CM enabled, the only thing we can add is snooping the non-present to
> present mapping and updating the device IOTLB. This is not a requirement,
> since we can still get this by asking for qemu's help (1.1).
> 
> 3) With ATS enabled, the guest knows about the existence of the device IOTLB,
> and device IOTLB entries need to be flushed explicitly by the guest. In this
> case there's no need to snoop the ordinary IOTLB invalidation in 1.2. We just
> need to snoop the device-IOTLB-specific invalidation requests from the guest.
> 
> All three methods above work, but let's have a look at the performance
> impact:
> 
> - Method 1 (without CM or ATS): the performance is not the best, since the
> guest does not know about the existence of the remote IOTLB. This means the
> flush of a device IOTLB entry cannot be done on demand. One example is that
> some IOMMU drivers (e.g. Intel) tend to optimize IOTLB invalidations by
> issuing a global invalidation periodically. We need to flush the device IOTLB
> too in this case. Thus we can notice some jitter (because of IOTLB misses).
> 
> - Method 2 (with CM but without ATS) seems to be the worst case. It has not
> only all the problems above but also a new one: each transition needs to
> notify the device explicitly. Even if dpdk uses static mappings, all other
> devices in the VM use dynamic ones, which slows down the whole system.
> According to the test, CM is about 10%-20% slower on real hardware.
> 
> - Method 3 (ATS) can give the best performance. All the problems are gone,
> since the guest can flush device IOTLB entries on demand. It is defined by
> the spec, was designed to solve issues just like the ones we meet here, and
> is supported by modern IOMMUs.
> 
> And what's even better, implementing ATS turns out to take less than 100
> lines of code. And it is much easier to enable on other IOMMUs (the AMD IOMMU
> only needs 20 lines of code). All other ways (I started on, and have code
> for, method 1 for the Intel IOMMU) need lots of work specific to each kind of
> IOMMU.

Method 1 is basically what Aviv implemented, except you don't
need map notifiers, only unmap.
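
To make the "only unmap" point concrete, here is a minimal toy model of
method 1. It is only a sketch: ToyIOMMU and ToyDeviceIOTLB are hypothetical
names invented for illustration, not QEMU or vhost APIs. The device-side
cache is filled lazily on a miss (1.1), so a new mapping needs no
notification; only an unmap has to reach the cache (1.2).

```python
from dataclasses import dataclass, field


@dataclass
class ToyIOMMU:
    """Hypothetical stand-in for the emulated IOMMU's guest-programmed table."""
    table: dict = field(default_factory=dict)            # iova -> address
    unmap_notifiers: list = field(default_factory=list)  # callbacks taking iova

    def map(self, iova, addr):
        # Non-present -> present: nobody is notified in method 1.
        self.table[iova] = addr

    def unmap(self, iova):
        # Present -> non-present: every registered cache must drop the entry.
        self.table.pop(iova, None)
        for notify in self.unmap_notifiers:
            notify(iova)

    def translate(self, iova):
        return self.table.get(iova)


@dataclass
class ToyDeviceIOTLB:
    """Hypothetical stand-in for the cache kept by the device backend."""
    iommu: ToyIOMMU
    cache: dict = field(default_factory=dict)
    misses: int = 0

    def __post_init__(self):
        # Register for unmap events only (step 1.2).
        self.iommu.unmap_notifiers.append(self.invalidate)

    def invalidate(self, iova):
        self.cache.pop(iova, None)

    def translate(self, iova):
        if iova not in self.cache:
            # Step 1.1: on a miss, ask the IOMMU model for the translation.
            self.misses += 1
            addr = self.iommu.translate(iova)
            if addr is None:
                raise LookupError(f"no mapping for iova {iova:#x}")
            self.cache[iova] = addr
        return self.cache[iova]


iommu = ToyIOMMU()
tlb = ToyDeviceIOTLB(iommu)
iommu.map(0x1000, 0xA000)
first = tlb.translate(0x1000)    # miss, filled from the IOMMU model
second = tlb.translate(0x1000)   # hit, served from the device cache
iommu.unmap(0x1000)              # unmap notifier drops the cached entry
```

In this model a stale entry can only survive if an unmap goes unnoticed,
which is why the unmap notifier is the one that matters.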

> 
> Considering so many advantages from adding so few lines of code, I don't
> see why we wouldn't want ATS (for the IOMMUs that support it).
> 
> Thanks

I am concerned that not all IOMMUs and guests support ATS.

> > 
> > 
> > > > Also, could you remind me pls - can guests just disable ATS?
> > > > 
> > > > What happens then?
> > > > 
> > > > 


