From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region
Date: Tue, 20 Dec 2016 14:38:01 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Mon, Dec 19, 2016 at 09:52:52PM -0700, Alex Williamson wrote:

[...]

> > Yes, this patch just tried to move VT-d forward a bit, rather than do
> > it once and for all. I think we can do better than this in the future,
> > for example, one address space per guest IOMMU domain (as you have
> > mentioned before). However I suppose that will need more work (which I
> > still can't estimate on the amount of work). So I am considering to
> > enable the device assignments functionally first, then we can further
> > improve based on a workable version. Same thoughts apply to the IOMMU
> > replay RFC series.
> 
> I'm not arguing against it, I'm just trying to set expectations for
> where this gets us.  An AddressSpace per guest iommu domain seems like
> the right model for QEMU, but it has some fundamental issues with
> vfio.  We currently tie a QEMU AddressSpace to a vfio container, which
> represents the host IOMMU context.  The AddressSpace of a device is
> currently assumed to be fixed in QEMU, guest IOMMU domains clearly
> are not.  vfio only lets us have access to a device while it's
> protected within a container.  Therefore in order to move a device to a
> different AddressSpace based on the guest domain configuration, we'd
> need to tear down the vfio configuration, including releasing the
> device.

I assume this is a VT-d-specific issue, right? Looks like ppc is using
a totally different way to manage the mapping, and devices can share
the same address space.
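
(To double-check that I got the constraint right, below is a purely
illustrative userspace sketch - not real QEMU or vfio code, every name
in it is invented - of what I understood: the container is created
against one fixed AddressSpace, so re-homing a device into another
AddressSpace means tearing the container down and temporarily
releasing the device.)

/* Illustrative model only -- not QEMU or kernel code. */
#include <stdio.h>

struct address_space { const char *name; };

struct vfio_container {             /* one host IOMMU domain */
    struct address_space *as;       /* fixed at creation time */
};

struct device {
    const char *name;
    struct vfio_container *container;   /* NULL == released */
};

/* Attaching gives us the device, but only inside a container that is
 * tied to exactly one AddressSpace. */
static void attach(struct device *dev, struct vfio_container *c)
{
    dev->container = c;
    printf("%s attached, DMA goes through AS '%s'\n",
           dev->name, c->as->name);
}

/* Moving the device to another AddressSpace (i.e. another guest IOMMU
 * domain) cannot be done in place: the old container has to be torn
 * down, which means temporarily losing the device. */
static void move_to(struct device *dev, struct vfio_container *new_c)
{
    printf("%s released from AS '%s'\n",
           dev->name, dev->container->as->name);
    dev->container = NULL;          /* device unprotected here */
    attach(dev, new_c);
}

int main(void)
{
    struct address_space sysmem = { "address_space_memory" };
    struct address_space dom1   = { "guest-iommu-domain-1" };
    struct vfio_container c0 = { &sysmem }, c1 = { &dom1 };
    struct device nic = { "nic", NULL };

    attach(&nic, &c0);
    move_to(&nic, &c1);     /* guest re-programmed its IOMMU domain */
    return 0;
}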

>  
> > Regarding to the locked memory accounting issue: do we have existing
> > way to do the accounting? If so, would you (or anyone) please
> > elaborate a bit? If not, is that an ongoing/planned work?
> 
> As I describe above, there's a vfio container per AddressSpace, each
> container is an IOMMU domain in the host.  In the guest, an IOMMU
> domain can include multiple AddressSpaces, one for each context entry
> that's part of the domain.  When the guest programs a translation for
> an IOMMU domain, that maps a guest IOVA to a guest physical address,
> for each AddressSpace.  Each AddressSpace is backed by a vfio
> container, which needs to pin the pages of that translation in order to
> get a host physical address, which then gets programmed into the host
> IOMMU domain with the guest-IOVA and host physical address.  The
> pinning process is where page accounting is done.  It's done per vfio
> context.  The worst case scenario for accounting is thus when VT-d is
> present but disabled (or in passthrough mode) as each AddressSpace
> duplicates address_space_memory and every page of guest memory is
> pinned and accounted for each vfio container.

IIUC this accounting issue will solve itself if we can solve the
previous one. Since we don't have that yet, though, ...

> 
> That's the existing way we do accounting.  There is no current
> development that I'm aware of to change this.  As above, the simplest
> stop-gap solution is that libvirt would need to be aware when VT-d is
> present for a VM and use a different algorithm to set QEMU locked
> memory limit, but it's not without its downsides.

... here I think it's sensible to consider a specific algorithm for
the VT-d use case. I am just curious how we should define this
algorithm.
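
(To make sure I am reading the current accounting correctly, here is a
tiny userspace model - not real vfio code, all names and sizes are
invented: the charge to locked_vm happens at pin time, once per
container, and no container knows about the others.)

/* Illustrative model of per-container pinning/accounting.
 * Not real vfio code; all sizes are invented. */
#include <stdio.h>

#define GUEST_RAM_MB 4096UL

static unsigned long locked_vm_mb;  /* models current->mm->locked_vm */

struct container { const char *as_name; unsigned long pinned_mb; };

/* Models a DMA map request: pin guest pages to get HPAs, then charge
 * the pinned amount to the task's locked_vm.  The charge is made per
 * container, with no knowledge of other containers. */
static void dma_map(struct container *c, unsigned long size_mb)
{
    c->pinned_mb += size_mb;
    locked_vm_mb += size_mb;
}

int main(void)
{
    /* Worst case: VT-d present but disabled (or iommu=pt), so every
     * per-device AddressSpace aliases address_space_memory and each
     * container maps all of guest RAM. */
    struct container dev0 = { "as-dev0", 0 }, dev1 = { "as-dev1", 0 };

    dma_map(&dev0, GUEST_RAM_MB);
    dma_map(&dev1, GUEST_RAM_MB);

    printf("guest RAM %lu MB, locked_vm accounted %lu MB "
           "(dev0 pinned %lu MB, dev1 pinned %lu MB)\n",
           GUEST_RAM_MB, locked_vm_mb, dev0.pinned_mb, dev1.pinned_mb);
    return 0;
}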

First of all, when the devices are not sharing a domain (or say, one
guest iommu domain per assigned device), everything should be fine and
no special algorithm is needed. IMHO the problem will only happen if
there are assigned devices that share the same address space (either
the system one, or a specific iommu domain). In that case, the
accounted value (i.e. current->mm->locked_vm, iiuc) will be bigger
than the real locked memory size.
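
(To put rough numbers on it: with, say, a 4G guest and two assigned
devices that the guest puts into one iommu domain, the domain's
mappings get replayed into two containers, so locked_vm is charged
around 8G while only about 4G of guest pages are actually pinned. A
limit computed as guest RAM plus a small overhead would then be too
low.)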

However, I think the problem is that whether devices end up in the
same address space depends on guest behavior - the guest can either
use iommu=pt, or manually put devices into the same guest iommu domain
to achieve that. From the hypervisor's POV, how should we estimate
this? Can we really?

> Alternatively, a new
> IOMMU model would need to be developed for vfio.  The type1 model was
> only ever intended to be used for relatively static user mappings and I
> expect it to have horrendous performance when backing a dynamic guest
> IOMMU domain.  Really the only guest IOMMU usage model that makes any
> sort of sense with type1 is to run the guest with passthrough (iommu=pt)
> and only pull devices out of passthrough for relatively static mapping
> cases within the guest userspace (nested assigned devices or dpdk).  If
> the expectation is that we just need this one little bit more code to
> make vfio usable in the guest, that may be true, but it really is just
> barely usable.  It's not going to be fast for any sort of dynamic
> mapping and it's going to have accounting issues that are not
> compatible with how libvirt sets locked memory limits for QEMU as soon
> as you go beyond a single device.  Thanks,

I can totally understand that the performance will suck if dynamic
mapping is used. AFAIU this work will only be used with static DMA
mappings, like running DPDK in the guest (besides other minor goals,
like development purposes).
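
(For the DPDK case I am assuming the usual guest setup, i.e. booting
the guest kernel with something like the line below, so that the
default domains stay in passthrough and only the vfio/DPDK devices get
real, static mappings:

  intel_iommu=on iommu=pt

The QEMU-side intel-iommu options are orthogonal to this point.)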

Regarding "the other" iommu model you mentioned besides type1: are
there any existing discussions out there? Any further learning
material/links would be greatly appreciated.

Thanks!

-- peterx


