
From: Benjamin Herrenschmidt
Subject: Re: [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation infrastructure
Date: Wed, 16 May 2012 07:55:43 +1000

On Tue, 2012-05-15 at 09:02 -0500, Anthony Liguori wrote:

> I think the crux of your argument is that upon a change to the translation 
> table, the operation acts as a barrier such that the exact moment it returns, 
> you're guaranteed that no DMAs are in flight with the old translation mapping.

Not when the translation is changed in memory, but whenever the
translation cache is invalidated, or whatever other mechanism the HW
provides to do that synchronization. On PAPR, this guarantee is provided
by the H_PUT_TCE hypervisor call which we use to manipulate the TCE
table.

[ Note that for performance reasons, it might end up being very
impractical to provide that guarantee, since it prevents us from handling
H_PUT_TCE entirely in kernel real mode like we do today... we'll have to
figure out what we want to do here for the TCE backend implementation,
maybe have qemu mark "in use" translations and cause an exit when those
are modified ... ]
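The "mark in use" idea above could be sketched roughly as follows. This is only a minimal illustration, not anything that exists in QEMU or the kernel: the type and helper names (`TCEEntry`, `tce_put_fast`, `tce_map`, `tce_unmap`) are all hypothetical, and a real implementation would need the hypervisor's table locking around the fast path. The point is just that a per-entry "in use" count lets the real-mode H_PUT_TCE handler update only idle entries, bouncing to qemu for a full synchronization when it hits a live mapping:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: one refcount per TCE entry.  A DMA map takes a
 * reference; the fast-path H_PUT_TCE handler only touches entries with
 * no active mapping and otherwise forces an exit to qemu, which can
 * then drain in-flight DMA before completing the update. */
typedef struct TCEEntry {
    uint64_t tce;          /* translation + permission bits */
    atomic_int in_use;     /* active DMA maps through this entry */
} TCEEntry;

/* Fast path (would run in kernel real mode): update the entry only if
 * nothing is currently mapped through it.  Assumes the caller holds
 * whatever lock serializes updates to this table slot. */
static bool tce_put_fast(TCEEntry *e, uint64_t new_tce)
{
    if (atomic_load(&e->in_use) != 0) {
        return false;      /* live mapping: exit to qemu for a full sync */
    }
    e->tce = new_tce;
    return true;
}

/* Taken/dropped by the emulated DMA map/unmap path. */
static void tce_map(TCEEntry *e)   { atomic_fetch_add(&e->in_use, 1); }
static void tce_unmap(TCEEntry *e) { atomic_fetch_sub(&e->in_use, 1); }
```

The slow path (not shown) is where the barrier semantics discussed in this thread would live: qemu would wait for, or cancel, the outstanding maps before letting the hypercall return.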

> That's not my understanding of at least VT-d and I have a hard time believing
> it's true for other IOMMUs as that kind of synchronization seems like it would
> be very expensive to implement in hardware.

How so? It's perfectly standard stuff ... it's usually part of the TLB
flushing op.

> Rather, when the IOTLB is flushed, I believe the only guarantee that you have is
> that future IOTLB lookups will return the new mapping.  But that doesn't mean
> that there isn't a request in flight that uses the old mapping.

I would be very surprised if that was the case :-)

I don't think any sane HW implementation would fail to provide full
synchronization with invalidations. That's how MMUs operate and I don't
see any reason why an iommu shouldn't be held to the same standards.

If it didn't, you'd have a nice host attack... have a guest doing
pass-through start a very long transaction and immediately commit
suicide. KVM starts reclaiming the pages, they go back to the host, and
might be re-used immediately ... while still being DMAed to.

> I will grant you that PCI transactions are typically much smaller than QEMU
> transactions such that we may continue to use the old mappings for much longer
> than real hardware would.  But I think that still puts us well within the realm
> of correctness.

No, a "random amount of time after invalidation" is not and will never
be correct. On large SMP machines, the time between a page being freed
and that page being re-used can be very small. The memory being re-used
by something like kexec can happen almost immediately while qemu is
blocked on an AIO that takes milliseconds ... etc....

At least, because this is an emulated iommu, qemu only writes to virtual
addresses mapping the guest space, so this isn't a host attack (unlike
with a real HW iommu, where the lack of such synchronization definitely
would be, as I described earlier).

> > Cases where that matter are unloading of a (broken) driver, kexec/kdump
> > from one guest to another etc... all involve potentially clearing all
> > iommu tables while a driver might have left a device DMA'ing. The
> > expectation is that the device will get target aborts from the iommu
> > until the situation gets "cleaned up" in SW.
> Yes, this would be worse in QEMU than on bare metal because we essentially have
> a much larger translation TLB.  But as I said above, I think we're well within
> the specified behavior here.

No :-)

> >> Why does this need to be guaranteed?  How can software depend on this in a
> >> meaningful way?
> >
> > The same as TLB invalidations :-)
> >
> > In real HW, this is a property of the HW itself, ie, whatever MMIO is
> > used to invalidate the HW TLB provides a way to ensure (usually by
> > reading back) that any request pending in the iommu pipeline has either
> > been completed or canned.
> Can you point to a spec that says this?  This doesn't match my understanding.

Apart from common sense? I'd have to dig to get you actual specs, but
it should be plain obvious that you need that sort of sync or you simply
cannot trust your iommu to do virtualization.

> > When we start having page fault capable iommu's this will be even more
> > important as faults will be part of the non-error case.
> We can revisit this discussion after every PCI device is changed to cope with a
> page fault capable IOMMU ;-)

Heh, well, the point is that this is still part of the base iommu model;
page faulting is just going to make the problem worse.

> >>> David's approach may not be the best long term, but provided it's not
> >>> totally broken (I don't know qemu locking well enough to judge how
> >>> dangerous it is) then it might be a "good enough" first step until we
> >>> come up with something better ?
> >>
> >> No, it's definitely not good enough.  Dropping the global mutex in random places
> >> is asking for worlds of hurt.
> >>
> >> If this is really important, then we need some sort of cancellation API to go
> >> along with map/unmap although I doubt that's really possible.
> >>
> >> MMIO/PIO operations cannot block.
> >
> > Well, there's a truckload of cases in real HW where an MMIO/PIO read is
> > used to synchronize some sort of HW operation.... I suppose nothing that
> > involves blocking at this stage in qemu but I would be careful with your
> > expectations here... writes are usually pipelined but blocking on a read
> > response does make a lot of sense.
> Blocking on an MMIO/PIO request effectively freezes a CPU.  All sorts of badness
> results from that.  Best case scenario, you trigger soft lockup warnings.

Well, that's exactly what happens in HW on PIO accesses and MMIO reads
waiting for a reply...

> > In any case, for the problem at hand, I can just drop the wait for now
> > and maybe just print a warning if I see an existing map.
> >
> > We still need some kind of either locking or barrier to simply ensure
> > that the updates to the TCE table are visible to other processors but
> > that can be done in the backend.
> >
> > But I wouldn't just forget about the issue, it's going to come back and
> > bite...
> I think working out the exact semantics of what we need to do is absolutely
> important.  But I think you're taking an overly conservative approach to what we
> need to provide here.

I'm happy to have the patches merged without that for now, it will get
us going with USB emulation etc... which we need for graphics, but we do
need to sort this out eventually.

I'll re-submit without it.
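The minimal "locking or barrier" requirement mentioned above (making TCE table updates visible to other processors, even without the full invalidation-as-barrier semantics) could be sketched with C11 release/acquire ordering. This is an illustrative sketch only, with made-up names (`tce_publish`, `tce_lookup`); it is not the actual TCE backend code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical table slot.  A real table would be an array of these,
 * indexed by the DMA page number. */
static _Atomic uint64_t tce_entry;

/* Writer (H_PUT_TCE backend): release store orders the new entry
 * before anything another vcpu thread is subsequently told to read. */
static void tce_publish(uint64_t tce)
{
    atomic_store_explicit(&tce_entry, tce, memory_order_release);
}

/* Reader (emulated DMA translation path): acquire load pairs with the
 * release store, so a reader that sees the new entry also sees every
 * write the hypercall made before publishing it. */
static uint64_t tce_lookup(void)
{
    return atomic_load_explicit(&tce_entry, memory_order_acquire);
}
```

Note this only gives ordering and visibility; it deliberately does not attempt the stronger guarantee debated in this thread, namely that no DMA using the old entry is still in flight when the invalidation returns.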


> Regards,
> Anthony Liguori
> >
> > Cheers,
> > Ben.
> >
> >> Regards,
> >>
> >> Anthony Liguori
> >>
> >>>
> >>> The normal case will be that no map exist, ie, it will almost always be
> >>> a guest programming error to remove an iommu mapping while a device is
> >>> actively using it, so having this case be slow is probably a non-issue.
> >>>
> >>> Cheers,
> >>> Ben.
> >>>
> >>>
> >
> >
