


From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH 08/13] iommu: Introduce IOMMU emulation infrastructure
Date: Tue, 15 May 2012 09:02:58 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1

On 05/14/2012 10:02 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2012-05-14 at 21:50 -0500, Anthony Liguori wrote:
>> On 05/14/2012 09:32 PM, Benjamin Herrenschmidt wrote:
>>> On Mon, 2012-05-14 at 21:03 -0500, Anthony Liguori wrote:
>>>> So the CPU thread runs in lock-step with the I/O thread.  Dropping the CPU
>>>> thread lock to let the I/O thread run is a dangerous thing to do in a place like

>>>> Also, I think you'd effectively block the CPU until pending DMA operations
>>>> complete?  This could be many, many milliseconds, no?  That's going to make
>>>> guests very upset.

>>> Do you see any other option?

>> Yes, ignore it.

>> I have a hard time believing software depends on changing DMA translation
>> mid-way through a transaction.

> It's a correctness issue. It won't happen in normal circumstances, but it
> can, and thus should be handled gracefully.

I think the crux of your argument is that a change to the translation table acts as a barrier: the moment the operation returns, you're guaranteed that no DMAs are in flight with the old translation mapping.

That's not my understanding of VT-d, at least, and I have a hard time believing it's true for other IOMMUs, as that kind of synchronization seems like it would be very expensive to implement in hardware.

Rather, when the IOTLB is flushed, I believe the only guarantee you have is that future IOTLB lookups will return the new mapping. That doesn't mean there isn't a request in flight that still uses the old mapping.

I will grant you that PCI transactions are typically much smaller than QEMU's, so we may continue to use the old mappings for much longer than real hardware would. But I think that still puts us well within the realm of correctness.

> Cases where that matters are unloading a (broken) driver, kexec/kdump
> from one guest to another, etc. These all involve potentially clearing all
> IOMMU tables while a driver might have left a device DMA'ing. The
> expectation is that the device will get target aborts from the IOMMU
> until the situation gets "cleaned up" in SW.

Yes, this would be worse in QEMU than on bare metal because we essentially have a much larger translation TLB. But as I said above, I think we're well within the specified behavior here.

>> Why does this need to be guaranteed?  How can software depend on this in a
>> meaningful way?

> The same as TLB invalidations :-)

> In real HW, this is a property of the HW itself, i.e., whatever MMIO is
> used to invalidate the HW TLB provides a way to ensure (usually by
> reading back) that any request pending in the IOMMU pipeline has either
> been completed or canned.

Can you point to a spec that says this?  This doesn't match my understanding.

> When we start having page-fault-capable IOMMUs this will be even more
> important, as faults will be part of the non-error case.

We can revisit this discussion after every PCI device is changed to cope with a page fault capable IOMMU ;-)

>>> David's approach may not be the best long term, but provided it's not
>>> totally broken (I don't know QEMU locking well enough to judge how
>>> dangerous it is), it might be a "good enough" first step until we
>>> come up with something better?

>> No, it's definitely not good enough.  Dropping the global mutex in random places
>> is asking for worlds of hurt.

>> If this is really important, then we need some sort of cancellation API to go
>> along with map/unmap, although I doubt that's really possible.

>> MMIO/PIO operations cannot block.

> Well, there's a truckload of cases in real HW where an MMIO/PIO read is
> used to synchronize some sort of HW operation. I suppose nothing that
> involves blocking at this stage in QEMU, but I would be careful with your
> expectations here: writes are usually pipelined, but blocking on a read
> response does make a lot of sense.

Blocking on an MMIO/PIO request effectively freezes a CPU. All sorts of badness results from that. Best case scenario, you trigger soft lockup warnings.

> In any case, for the problem at hand, I can just drop the wait for now
> and maybe just print a warning if I see an existing map.

> We still need either some kind of locking or a barrier to simply ensure
> that the updates to the TCE table are visible to other processors, but
> that can be done in the backend.

> But I wouldn't just forget about the issue; it's going to come back and

I think working out the exact semantics of what we need to do is absolutely important. But I think you're taking an overly conservative approach to what we need to provide here.


Anthony Liguori




> The normal case will be that no map exists, i.e., it will almost always be
> a guest programming error to remove an IOMMU mapping while a device is
> actively using it, so having this case be slow is probably a non-issue.

