[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v5 07/11] vfio: Add guest side IOMMU support
From: |
Alexey Kardashevskiy |
Subject: |
Re: [Qemu-devel] [PATCH v5 07/11] vfio: Add guest side IOMMU support |
Date: |
Fri, 28 Mar 2014 15:49:24 +1100 |
User-agent: |
Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 |
On 03/22/2014 01:17 AM, Alex Williamson wrote:
> On Fri, 2014-03-21 at 18:59 +1100, Alexey Kardashevskiy wrote:
>> On 03/20/2014 06:57 AM, Alex Williamson wrote:
>>> On Wed, 2014-03-12 at 16:52 +1100, Alexey Kardashevskiy wrote:
>>>> From: David Gibson <address@hidden>
>>>>
>>>> This patch uses the new IOMMU notifiers to allow VFIO pass through devices
>>>> to work with guest side IOMMUs, as long as the host-side VFIO iommu has
>>>> sufficient capability and granularity to match the guest side. This works
>>>> by tracking all map and unmap operations on the guest IOMMU using the
>>>> notifiers, and mirroring them into VFIO.
>>>>
>>>> There are a number of FIXMEs, and the scheme involves rather more notifier
>>>> structures than I'd like, but it should make for a reasonable proof of
>>>> concept.
>>>>
>>>> Signed-off-by: David Gibson <address@hidden>
>>>> Signed-off-by: Alexey Kardashevskiy <address@hidden>
>>>>
>>>> ---
>>>> Changes:
>>>> v4:
>>>> * fixed list objects naming
>>>> * vfio_listener_region_add() reworked to call memory_region_ref() from one
>>>> place only, it is also easier to review the changes
>>>> * fixes boundary check not to fail on sections == 2^64 bytes,
>>>> the "vfio: Fix debug output for int128 values" patch is required;
>>>> this obsoletes the "[PATCH v3 0/3] vfio: fixes for better support
>>>> for 128 bit memory section sizes" patch proposal
>>>> ---
>>>> hw/misc/vfio.c | 126
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>> 1 file changed, 120 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
>>>> index 038010b..4f6f5da 100644
>>>> --- a/hw/misc/vfio.c
>>>> +++ b/hw/misc/vfio.c
>>>> @@ -159,10 +159,18 @@ typedef struct VFIOContainer {
>>>> };
>>>> void (*release)(struct VFIOContainer *);
>>>> } iommu_data;
>>>> + QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>>>> QLIST_HEAD(, VFIOGroup) group_list;
>>>> QLIST_ENTRY(VFIOContainer) next;
>>>> } VFIOContainer;
>>>>
>>>> +typedef struct VFIOGuestIOMMU {
>>>> + VFIOContainer *container;
>>>> + MemoryRegion *iommu;
>>>> + Notifier n;
>>>> + QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
>>>> +} VFIOGuestIOMMU;
>>>> +
>>>> /* Cache of MSI-X setup plus extra mmap and memory region for split BAR
>>>> map */
>>>> typedef struct VFIOMSIXInfo {
>>>> uint8_t table_bar;
>>>> @@ -2241,8 +2249,9 @@ static int vfio_dma_map(VFIOContainer *container,
>>>> hwaddr iova,
>>>>
>>>> static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>>>> {
>>>> - return !memory_region_is_ram(section->mr) ||
>>>> - /*
>>>> + return (!memory_region_is_ram(section->mr) &&
>>>> + !memory_region_is_iommu(section->mr)) ||
>>>> + /*
>>>
>>> White space damage
>>>
>>>> * Sizing an enabled 64-bit BAR can cause spurious mappings to
>>>> * addresses in the upper part of the 64-bit address space.
>>>> These
>>>> * are never accessed by the CPU and beyond the address width
>>>> of
>>>> @@ -2251,6 +2260,61 @@ static bool
>>>> vfio_listener_skipped_section(MemoryRegionSection *section)
>>>> section->offset_within_address_space & (1ULL << 63);
>>>> }
>>>>
>>>> +static void vfio_iommu_map_notify(Notifier *n, void *data)
>>>> +{
>>>> + VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>>>> + VFIOContainer *container = giommu->container;
>>>> + IOMMUTLBEntry *iotlb = data;
>>>> + MemoryRegion *mr;
>>>> + hwaddr xlat;
>>>> + hwaddr len = iotlb->addr_mask + 1;
>>>> + void *vaddr;
>>>> + int ret;
>>>> +
>>>> + DPRINTF("iommu map @ %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
>>>> + iotlb->iova, iotlb->iova + iotlb->addr_mask);
>>>> +
>>>> + /*
>>>> + * The IOMMU TLB entry we have just covers translation through
>>>> + * this IOMMU to its immediate target. We need to translate
>>>> + * it the rest of the way through to memory.
>>>> + */
>>>> + mr = address_space_translate(&address_space_memory,
>>>> + iotlb->translated_addr,
>>>> + &xlat, &len, iotlb->perm & IOMMU_WO);
>>>
>>> Write-only? Is this supposed to be read-write to mask just 2 bits?
>>
>>
>> The last parameter of address_space_translate() bool is_write. So I do not
>> really understand the problem here.
>
> Oops, my bad, I didn't look at what address_space_translate() used that
> for. Ok.
>
>>>> + if (!memory_region_is_ram(mr)) {
>>>> + DPRINTF("iommu map to non memory area %"HWADDR_PRIx"\n",
>>>> + xlat);
>>>> + return;
>>>> + }
>>>> + if (len & iotlb->addr_mask) {
>>>> + DPRINTF("iommu has granularity incompatible with target AS\n");
>>>
>>> Is this possible? Assuming len is initially a power-of-2, would the
>>> translate function change it? Maybe worth a comment to explain.
>>
>>
>> Oh. address_space_translate() actually changes @len to min(len,
>> TARGET_PAGE_SIZE) and TARGET_PAGE_SIZE is hardcoded to 4K. So far it was ok
>> but lately I have been implementing a huge DMA window (plus one
>> sPAPRTCETable and one VFIOGuestIOMMU objects) which currently operates with
>> 16MB pages (can do 64K pages too) and now this "granularity incompatible"
>> is happening.
>>
>> I disabled that check but I need to think of better fix...
>>
>> Adding Paolo to cc, may be he picks the context and gives good piece of
>> advise :)
>>
>>
>>
>>>
>>>> + return;
>>>> + }
>>>> +
>>>> + vaddr = memory_region_get_ram_ptr(mr) + xlat;
>>>
>>> This lookup isn't free and the unmap path doesn't need it, maybe move
>>> the variable and lookup into the first branch below?
>>>
>>>> +
>>>> + if (iotlb->perm != IOMMU_NONE) {
>>>> + ret = vfio_dma_map(container, iotlb->iova,
>>>> + iotlb->addr_mask + 1, vaddr,
>>>> + !(iotlb->perm & IOMMU_WO) || mr->readonly);
>>>> + if (ret) {
>>>> + error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>>>> + "0x%"HWADDR_PRIx", %p) = %d (%m)",
>>>> + container, iotlb->iova,
>>>> + iotlb->addr_mask + 1, vaddr, ret);
>>>> + }
>>>> + } else {
>>>> + ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask +
>>>> 1);
>>>> + if (ret) {
>>>> + error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>>>> + "0x%"HWADDR_PRIx") = %d (%m)",
>>>> + container, iotlb->iova,
>>>> + iotlb->addr_mask + 1, ret);
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> static void vfio_listener_region_add(MemoryListener *listener,
>>>> MemoryRegionSection *section)
>>>> {
>>>> @@ -2261,8 +2325,6 @@ static void vfio_listener_region_add(MemoryListener
>>>> *listener,
>>>> void *vaddr;
>>>> int ret;
>>>>
>>>> - assert(!memory_region_is_iommu(section->mr));
>>>> -
>>>> if (vfio_listener_skipped_section(section)) {
>>>> DPRINTF("SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
>>>> section->offset_within_address_space,
>>>> @@ -2286,15 +2348,47 @@ static void
>>>> vfio_listener_region_add(MemoryListener *listener,
>>>> return;
>>>> }
>>>>
>>>> + memory_region_ref(section->mr);
>>>> +
>>>> + if (memory_region_is_iommu(section->mr)) {
>>>> + VFIOGuestIOMMU *giommu;
>>>> +
>>>> + DPRINTF("region_add [iommu] %"HWADDR_PRIx" - %"HWADDR_PRIx"\n",
>>>> + iova, int128_get64(int128_sub(llend, int128_one())));
>>>> + /*
>>>> + * FIXME: We should do some checking to see if the
>>>> + * capabilities of the host VFIO IOMMU are adequate to model
>>>> + * the guest IOMMU
>>>> + *
>>>> + * FIXME: This assumes that the guest IOMMU is empty of
>>>> + * mappings at this point - we should either enforce this, or
>>>> + * loop through existing mappings to map them into VFIO.
>>>> + *
>>>> + * FIXME: For VFIO iommu types which have KVM acceleration to
>>>> + * avoid bouncing all map/unmaps through qemu this way, this
>>>> + * would be the right place to wire that up (tell the KVM
>>>> + * device emulation the VFIO iommu handles to use).
>>>> + */
>>>
>>> That's a lot of FIXMEs... The second one in particular looks like it
>>> needs to expand a bit on why this is likely a valid assumption. The
>>> last one is more of a TODO than a FIXME.
>>>
>>>> + giommu = g_malloc0(sizeof(*giommu));
>>>> + giommu->iommu = section->mr;
>>>> + giommu->container = container;
>>>> + giommu->n.notify = vfio_iommu_map_notify;
>>>> + QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>>>> + memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
>>>> +
>>>> + return;
>>>> + }
>>>> +
>>>> + /* Here we assume that memory_region_is_ram(section->mr)==true */
>>>> +
>>>> end = int128_get64(llend);
>>>> vaddr = memory_region_get_ram_ptr(section->mr) +
>>>> section->offset_within_region +
>>>> (iova - section->offset_within_address_space);
>>>>
>>>> - DPRINTF("region_add %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
>>>> + DPRINTF("region_add [ram] %"HWADDR_PRIx" - %"HWADDR_PRIx" [%p]\n",
>>>> iova, end - 1, vaddr);
>>>>
>>>> - memory_region_ref(section->mr);
>>>> ret = vfio_dma_map(container, iova, end - iova, vaddr,
>>>> section->readonly);
>>>> if (ret) {
>>>> error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>>>> @@ -2338,6 +2432,26 @@ static void vfio_listener_region_del(MemoryListener
>>>> *listener,
>>>> return;
>>>> }
>>>>
>>>> + if (memory_region_is_iommu(section->mr)) {
>>>> + VFIOGuestIOMMU *giommu;
>>>> +
>>>> + QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
>>>> + if (giommu->iommu == section->mr) {
>>>> + memory_region_unregister_iommu_notifier(&giommu->n);
>>>> + QLIST_REMOVE(giommu, giommu_next);
>>>> + g_free(giommu);
>>>> + break;
>>>> + }
>>>> + }
>>>> +
>>>> + /*
>>>> + * FIXME: We assume the one big unmap below is adequate to
>>>> + * remove any individual page mappings in the IOMMU which
>>>> + * might have been copied into VFIO. That may not be true for
>>>> + * all IOMMU types
>>>> + */
>>>
>>> We assume this because the IOVA that gets unmapped is the same
>>> regardless of whether a guest IOMMU is present?
>>
>>
>> What exactly is meant by "guest IOMMU is present"? Doing the second DMA
>> window, now I am really confused about terminology :(
>
> The confusion for me is that add_region initializes the giommu and all
> the DMA mapping through VFIO is done in the notifier for the giommu.
> It's therefore asymmetric that add_region doesn't vfio_dma_map anything,
> but region_del does vfio_dma_unmap, which is the basis of my question.
> I thought this comment was trying to address why that is, but apparently
> it's something else entirely, so it would be nice to understand why this
> doesn't return() and decode a bit more clearly what the FIXME is trying
> to say. Thanks,
I do not mind extending that comment but need some assistance.
My understanding is that:
===
region_del is called on memory region removal so this particular window is
not going to be used anymore and this is the only place where such cleanup
could be done.
===
Asymmetry is here but it does not look terrible.
> Alex
>
>>>> + }
>>>> +
>>>> iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
>>>> end = (section->offset_within_address_space +
>>>> int128_get64(section->size)) &
>>>> TARGET_PAGE_MASK;
--
Alexey
- [Qemu-devel] [PATCH v5 04/11] vfio: rework to have error paths, (continued)
- [Qemu-devel] [PATCH v5 04/11] vfio: rework to have error paths, Alexey Kardashevskiy, 2014/03/12
- [Qemu-devel] [PATCH v5 09/11] spapr vfio: add vfio_container_spapr_get_info(), Alexey Kardashevskiy, 2014/03/12
- [Qemu-devel] [PATCH v5 07/11] vfio: Add guest side IOMMU support, Alexey Kardashevskiy, 2014/03/12
- Re: [Qemu-devel] [PATCH v5 07/11] vfio: Add guest side IOMMU support, Alexey Kardashevskiy, 2014/03/21
- Re: [Qemu-devel] [PATCH v5 07/11] vfio: Add guest side IOMMU support, Alex Williamson, 2014/03/21
- Re: [Qemu-devel] [PATCH v5 07/11] vfio: Add guest side IOMMU support, Paolo Bonzini, 2014/03/21
- Re: [Qemu-devel] [PATCH v5 07/11] vfio: Add guest side IOMMU support,
Alexey Kardashevskiy <=
- Re: [Qemu-devel] [PATCH v5 07/11] vfio: Add guest side IOMMU support, Alex Williamson, 2014/03/31
[Qemu-devel] [PATCH v5 02/11] int128: add int128_exts64(), Alexey Kardashevskiy, 2014/03/12
[Qemu-devel] [PATCH v5 08/11] spapr-iommu: add SPAPR VFIO IOMMU device, Alexey Kardashevskiy, 2014/03/12
[Qemu-devel] [PATCH v5 11/11] spapr-vfio: enable for spapr, Alexey Kardashevskiy, 2014/03/12
[Qemu-devel] [PATCH v5 10/11] spapr-vfio: add spapr-pci-vfio-host-bridge to support vfio, Alexey Kardashevskiy, 2014/03/12