
Re: [Qemu-arm] [Qemu-devel] [RFC 0/8] VIRTIO-IOMMU device


From: Jason Wang
Subject: Re: [Qemu-arm] [Qemu-devel] [RFC 0/8] VIRTIO-IOMMU device
Date: Thu, 8 Jun 2017 16:35:55 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1



On 2017-06-07 18:19, Jean-Philippe Brucker wrote:
> Hi Jason,
>
> On 07/06/17 10:17, Jason Wang wrote:
>> On 2017-06-07 16:35, Eric Auger wrote:
>>> This series implements the virtio-iommu device. This is a proof
>>> of concept based on the virtio-iommu specification written by
>>> Jean-Philippe Brucker [1]. It was tested with a guest using
>>> the virtio-iommu driver [2] and exposed with a virtio-net-pci
>>> using DMA ops.
>>>
>>> The device gets instantiated using the "-device virtio-iommu-device"
>>> option. It currently works with the ARM virt machine only, as the
>>> machine must handle the DT binding between the virtio-mmio "iommu"
>>> node and the PCI host bridge node. ACPI booting is not yet supported.
>>>
>>> This should allow starting some benchmarking activities against
>>> purely emulated IOMMUs (especially the ARM SMMU).
>> Yes, it would also be interesting to compare it with the Intel IOMMU.
>> Actually, the core function is similar to a subset of the Intel one with
>> caching mode (CM) enabled. Since each map and unmap requires a command,
>> it would be very slow for dynamic mappings. I wonder whether we can do
>> any optimization on this.
> In general we will have to send the same number of map/unmap requests as
> the number of invalidations needed for an emulated IOMMU such as the
> Intel one (if I understand correctly, with CM there are invalidations
> both on map and unmap, to avoid trapping the page tables). Using virtio
> allows us to reduce the number of round-trips to the host by batching
> map/unmap requests where possible.
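To make the batching point concrete, here is a rough userspace sketch of
the idea. The request layout only loosely follows the virtio-iommu draft,
and vq_submit()/batch_map() are made-up stand-ins, not a real virtqueue
API:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified MAP request; field names loosely follow the draft spec. */
struct viommu_req_map {
    uint8_t  type;          /* request type, e.g. MAP */
    uint32_t domain;        /* address space the mapping belongs to */
    uint64_t virt_start;    /* IOVA range start */
    uint64_t virt_end;      /* IOVA range end (inclusive) */
    uint64_t phys_start;    /* guest-physical start */
    uint32_t flags;         /* read/write permissions */
};

#define BATCH_MAX 16

struct req_batch {
    struct viommu_req_map reqs[BATCH_MAX];
    size_t count;
};

/* Stand-in for descriptor setup plus a single virtqueue kick (one trap
 * to the host for the whole batch). */
static void vq_submit(struct req_batch *b)
{
    printf("submitting %zu map requests in one notification\n", b->count);
    b->count = 0;
}

static void batch_map(struct req_batch *b, uint32_t domain,
                      uint64_t iova, uint64_t gpa, uint64_t size)
{
    if (b->count == BATCH_MAX)
        vq_submit(b);               /* flush when the batch is full */
    struct viommu_req_map *r = &b->reqs[b->count++];
    r->type = 1;                    /* MAP */
    r->domain = domain;
    r->virt_start = iova;
    r->virt_end = iova + size - 1;
    r->phys_start = gpa;
    r->flags = 0x3;                 /* read | write */
}

The point is only that N mappings cost one notification instead of N,
which is where the round-trip saving comes from.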

When QI (queued invalidation) is enabled, the Intel IOMMU can also support batching. It can even allow asynchronous completion of the operation by not using a wait descriptor.
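For reference, queued invalidation has roughly this shape: a batch of
invalidation descriptors is posted to a ring, and an optional wait
descriptor ends it. This is a simplified sketch with made-up descriptor
encodings; see the VT-d spec for the real 128-bit layout:

#include <stdint.h>

struct qi_desc { uint64_t low, high; };   /* simplified descriptor */

#define QI_IOTLB_TYPE 0x2   /* IOTLB invalidate */
#define QI_WAIT_TYPE  0x5   /* invalidation wait */
#define QI_SIZE       256

static struct qi_desc qi_queue[QI_SIZE];
static unsigned qi_tail;

static void qi_post(uint64_t low, uint64_t high)
{
    qi_queue[qi_tail].low  = low;
    qi_queue[qi_tail].high = high;
    qi_tail = (qi_tail + 1) % QI_SIZE;
    /* real hardware: write the queue tail register to publish this */
}

static void invalidate_batch(uint16_t domain, const uint64_t *iovas,
                             int n, int synchronous)
{
    for (int i = 0; i < n; i++)
        qi_post(QI_IOTLB_TYPE | ((uint64_t)domain << 16), iovas[i]);
    /* Skipping the wait descriptor gives asynchronous completion. */
    if (synchronous)
        qi_post(QI_WAIT_TYPE, 0);
}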

> Adding vhost-iommu in the host could further
> reduce the latency of map/unmap requests.

Probably, but then we need a way to cooperate with userspace device emulation code.


> To actually reduce the number of requests, I see two possible
> optimizations (loosely described in [1]), both requiring invasive changes.
>
> * Relaxed (insecure) mode, where the guest batches unmap requests or
> doesn't send them at all. Map will override existing mappings if
> necessary. You end up sending far fewer unmap requests, but there is a
> vulnerability window where devices can access stale mappings, so you have
> to trust your peripherals. I believe the x86 IOMMU drivers in Linux
> already allow this.

Yes, and actually it's the default behavior. What's more, it will try to do a domain invalidation if there are too many pending invalidations.
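A rough sketch of that deferred-unmap behavior, assuming a Linux-style
flush queue (all names here are made up for illustration):

#include <stdint.h>
#include <stddef.h>

#define FQ_SIZE 128

struct flush_queue {
    uint64_t iova[FQ_SIZE];
    uint64_t size[FQ_SIZE];
    size_t   count;
};

static void send_unmap(uint64_t iova, uint64_t size) { /* one UNMAP req */ }
static void send_domain_invalidate(void)             { /* whole domain  */ }

/* Driven periodically (e.g. by a timer) rather than on every unmap. */
static void fq_flush(struct flush_queue *fq)
{
    for (size_t i = 0; i < fq->count; i++)
        send_unmap(fq->iova[i], fq->size[i]);
    fq->count = 0;
}

/* Defer the unmap instead of sending it immediately.  Until fq_flush()
 * runs, the device can still reach the stale mapping: that is the
 * vulnerability window of the relaxed mode. */
static void lazy_unmap(struct flush_queue *fq, uint64_t iova, uint64_t size)
{
    if (fq->count == FQ_SIZE) {
        send_domain_invalidate();   /* too many pending: one big hammer */
        fq->count = 0;              /* everything is now invalidated */
    }
    fq->iova[fq->count] = iova;
    fq->size[fq->count] = size;
    fq->count++;
}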


> * Page table handover, which is a new mode orthogonal to map/unmap. This
> uses nested translation - the guest has one set of page tables for
> gva->gpa and the host has one set for gpa->hpa. After setup, the guest
> populates the page tables and only sends invalidation requests, no map. I
> think that with the Intel IOMMU this would only be possible with PASID
> traffic. But nested translation will inherently be slower than "classic"
> mode, so it might end up being overall slower than map/unmap, if there is
> a lot of TLB invalidation and thrashing. This mode is mostly useful for
> SVM virtualization.
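In handover mode the fast path is just a memory write. A toy sketch
(a single-level table stands in for a real multi-level walk; all names
are made up):

#include <stdint.h>

#define PTE_VALID (1ULL << 0)
#define NR_PTES   512

/* Stage-1 table owned by the guest and walked by the host/hardware. */
static uint64_t stage1_table[NR_PTES];

static void send_invalidate(uint64_t iova) { /* one invalidation request */ }

/* Map: a plain store, invisible to the host until the IOTLB misses. */
static void handover_map(uint64_t iova, uint64_t gpa)
{
    stage1_table[(iova >> 12) % NR_PTES] = gpa | PTE_VALID;
}

/* Unmap: clear the entry, then invalidate so cached copies are dropped.
 * Only this path costs a request, hence "invalidation requests, no map". */
static void handover_unmap(uint64_t iova)
{
    stage1_table[(iova >> 12) % NR_PTES] = 0;
    send_invalidate(iova);
}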

An advantage of page tables is that lookup is faster than the current tree-based algorithm; this can help more or less in the case of dynamic mappings.
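To illustrate the lookup difference: a radix walk costs a fixed number of
loads regardless of how many mappings exist, while a tree (or sorted
range) lookup grows with the number of live mappings. Both functions
below are simplified illustrations, not QEMU code:

#include <stdint.h>
#include <stddef.h>

/* Radix walk: always exactly two loads for a two-level table. */
static uint64_t radix_lookup(uint64_t **l1, uint64_t iova)
{
    uint64_t *l2 = l1[(iova >> 21) & 0x1ff];   /* level-1 index */
    return l2 ? l2[(iova >> 12) & 0x1ff] : 0;  /* level-2 index */
}

struct range { uint64_t start, end, phys; };

/* Tree-style search: O(log N) comparisons over N sorted mappings. */
static uint64_t range_lookup(const struct range *r, size_t n, uint64_t iova)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (iova < r[mid].start)
            hi = mid;
        else if (iova > r[mid].end)
            lo = mid + 1;
        else
            return r[mid].phys + (iova - r[mid].start);
    }
    return 0;   /* no translation */
}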

Thanks


> Thanks,
> Jean





