From: Peter Xu
Subject: [Qemu-devel] [RFC PATCH 00/13] VT-d replay and misc cleanup
Date: Tue, 6 Dec 2016 18:36:15 +0800

This RFC series is a continuation of Aviv B.D.'s vfio enablement series
with vt-d. Aviv has done a great job there; what we still lack is mostly
the following:

(1) VFIO gets duplicated IOTLB notifications due to the split vt-d IOMMU
    memory region.

(2) vt-d still doesn't provide a correct replay() mechanism (e.g., when
    the IOMMU domain switches, things will break).

Here I'm trying to solve the above two issues.

(1) is solved by patch 7; (2) is solved by patches 11-12.
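
To show the idea behind the fix for (1), here is a minimal standalone
sketch in plain C (all names are invented for the illustration; this is
not the QEMU code): each notifier remembers the IOVA range of the section
it was registered for, and a map event is delivered only to notifiers
whose range covers the IOVA, so one mapping is reported exactly once even
when the IOMMU region is split into several sections.

/* Range-filtered notification sketch: NOT QEMU code, names are invented. */
#include <inttypes.h>
#include <stdio.h>

typedef struct ToyNotifier {
    uint64_t start;         /* first IOVA this notifier cares about */
    uint64_t end;           /* last IOVA (inclusive) */
    const char *owner;      /* label for the demo output */
} ToyNotifier;

/* Deliver one map event, skipping notifiers whose range does not cover it. */
static void toy_notify_one(ToyNotifier *n, uint64_t iova, uint64_t gpa)
{
    if (iova < n->start || iova > n->end) {
        return;             /* outside this notifier's section: no event */
    }
    printf("%s: map iova 0x%" PRIx64 " -> gpa 0x%" PRIx64 "\n",
           n->owner, iova, gpa);
}

int main(void)
{
    /* Two notifiers standing in for two sections of a split region. */
    ToyNotifier low  = { 0x0,           0x7fffffffULL, "section-low"  };
    ToyNotifier high = { 0x80000000ULL, 0xffffffffULL, "section-high" };
    ToyNotifier *all[] = { &low, &high };

    /* One guest mapping now triggers exactly one notification. */
    for (int i = 0; i < 2; i++) {
        toy_notify_one(all[i], 0x1000, 0x366cb000);
    }
    return 0;
}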

Basically it contains the following:

patch 1:    A bugfix picked up from Jason's vhost DMAR series

patch 2-6:  Cleanups/enhancements for the existing vt-d code (please see
            the individual commit messages for details; some of these I
            thought might be suitable for 2.8 as well, but it looks like
            it's too late now)

patch 7:    Solve the issue that vfio is notified more than once for the
            same IOTLB notification with Aviv's patches

patch 8-10: Add some trivial memory APIs for the later patches, and add
            customized replay() support for MemoryRegion (I see Aviv's
            latest v7 contains a similar replay; I can rebase onto that,
            it's nearly the same thing)

patch 11:   Provide a valid vt-d replay() callback using a page walk
            (see the sketch after this list)

patch 12:   Enable domain switch support - we replay() when a context
            entry gets invalidated

patch 13:   Enhance the existing invalidation notification: instead of
            using translate() for each page, we leverage the new
            vtd_page_walk() interface, which should be faster.
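
Below is a rough, self-contained sketch of what replay-by-page-walk does
(again with invented names and a toy table layout, not the actual
vtd_page_walk() code): recursively walk the guest page table and invoke a
hook for every present leaf, which is exactly the set of mappings that
needs to be (re)notified.

/* Replay-by-page-walk sketch: toy table format, NOT the VT-d layout. */
#include <inttypes.h>
#include <stdio.h>

#define TOY_ENTRIES     4       /* tiny tables keep the demo small */
#define TOY_PAGE_SIZE   0x1000  /* 4K pages */

/* A toy table entry: either maps a page (leaf) or points at a lower table. */
typedef struct ToyEntry {
    int present;
    uint64_t gpa;                   /* leaf only: guest physical address */
    const struct ToyEntry *next;    /* non-leaf only: lower-level table */
} ToyEntry;

typedef void (*toy_hook)(uint64_t iova, uint64_t gpa);

/* Recursively walk the table and call hook for every present leaf. */
static void toy_walk(const ToyEntry *table, int level,
                     uint64_t iova_base, toy_hook hook)
{
    uint64_t span = TOY_PAGE_SIZE;  /* IOVA range covered by one entry */
    for (int l = 1; l < level; l++) {
        span *= TOY_ENTRIES;
    }

    for (int i = 0; i < TOY_ENTRIES; i++) {
        uint64_t iova = iova_base + (uint64_t)i * span;

        if (!table[i].present) {
            continue;               /* hole: nothing to replay here */
        }
        if (level == 1) {
            hook(iova, table[i].gpa);   /* leaf: one mapping to notify */
        } else {
            toy_walk(table[i].next, level - 1, iova, hook);
        }
    }
}

static void print_map(uint64_t iova, uint64_t gpa)
{
    printf("map iova 0x%" PRIx64 " -> gpa 0x%" PRIx64 "\n", iova, gpa);
}

int main(void)
{
    ToyEntry leaf[TOY_ENTRIES] = {
        { 1, 0x366cb000, NULL }, { 1, 0x366cc000, NULL },
    };
    ToyEntry root[TOY_ENTRIES] = {
        { 1, 0, leaf },
    };

    /* "Replay" the whole toy address space through the hook. */
    toy_walk(root, 2, 0, print_map);
    return 0;
}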

I would be glad to hear any review comments on the above patches
(especially patches 8-13, which are the main part of this series), and
especially about any issues I may have missed.

=========
Test Done
=========

Build test passed for x86_64/arm/ppc64.

Simple tests were done on x86_64: assign two PCI devices to a single VM,
and boot the VM using:

bin=x86_64-softmmu/qemu-system-x86_64
$bin -M q35,accel=kvm,kernel-irqchip=split -m 1G \
     -device intel-iommu,intremap=on,eim=off,cache-mode=on \
     -netdev user,id=net0,hostfwd=tcp::5555-:22 \
     -device virtio-net-pci,netdev=net0 \
     -device vfio-pci,host=03:00.0 \
     -device vfio-pci,host=02:00.0 \
     -trace events=".trace.vfio" \
     /var/lib/libvirt/images/vm1.qcow2

pxdev:bin [vtd-vfio-enablement]# cat .trace.vfio
vtd_page_walk*
vtd_replay*
vtd_inv_desc*

Then, in the guest, run the following tool:

  https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind-group/vfio-bind-group.c

With parameter:

  ./vfio-bind-group 00:03.0 00:04.0

Checking the host-side trace log, I can see pages being replayed and
mapped into the 00:04.0 device's address space, like below (see the note
after the excerpt on how to read the fields):

...
vtd_replay_ce_valid replay valid context device 00:04.00 hi 0x301 lo 0x3be77001
vtd_page_walk Page walk for ce (0x301, 0x3be77001) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x3be77000, level=3) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x3c88a000, level=2) iova range 0x0 - 0x40000000
vtd_page_walk_level Page walk (base=0x366cb000, level=1) iova range 0x0 - 0x200000
vtd_page_walk_one Page walk detected map level 0x1 iova 0x0 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x1000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x2000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x3000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x4000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x5000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x6000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x7000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x8000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x9000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xa000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xb000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xc000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xd000 -> gpa 0x366cb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xe000 -> gpa 0x366cb000 mask 0xfff perm 3
...
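
A note on how to read those lines: mask 0xfff means a 4K page (the mask
covers the in-page offset bits, so mask + 1 is the page size), and perm 3
means read+write, matching QEMU's IOMMUAccessFlags (IOMMU_RO = 1,
IOMMU_WO = 2, IOMMU_RW = 3). Below is a tiny standalone decoder; the flag
values are copied locally so it builds on its own, and nothing here is
QEMU code.

/* Decoder for the trace fields above: flag values mirrored locally. */
#include <inttypes.h>
#include <stdio.h>

enum {
    TOY_IOMMU_NONE = 0,
    TOY_IOMMU_RO   = 1,
    TOY_IOMMU_WO   = 2,
    TOY_IOMMU_RW   = 3,
};

static void decode(uint64_t iova, uint64_t gpa, uint64_t mask, int perm)
{
    /* The mask covers the offset bits, so mask + 1 is the page size. */
    printf("iova 0x%" PRIx64 " -> gpa 0x%" PRIx64 ", %" PRIu64 "K page, %s%s\n",
           iova, gpa, (mask + 1) / 1024,
           (perm & TOY_IOMMU_RO) ? "r" : "-",
           (perm & TOY_IOMMU_WO) ? "w" : "-");
}

int main(void)
{
    /* One line from the trace: mask 0xfff is a 4K page, perm 3 is RW. */
    decode(0x0, 0x366cb000, 0xfff, TOY_IOMMU_RW);
    return 0;
}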

=========
Todo List
=========

- error reporting for the assigned devices (as Tianyu has mentioned)

- per-domain address space: a better solution in the future may be to
  maintain one address space per IOMMU domain in the guest (so multiple
  devices can share the same address space if they share the same IOMMU
  domain in the guest), rather than one address space per device (which
  is the current vt-d implementation). However, that's a step beyond this
  series; let's first see whether we can provide a workable version of
  device assignment with vt-d protection. (A rough sketch of this idea
  follows after this list.)

- more to come...
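
For the per-domain address-space item above, a rough sketch of the idea
(invented names, no bounds checking, not a proposal for the real QEMU
data structures): keep a small table keyed by guest domain id, create the
shared address space lazily on first use, and let every device in that
domain look it up instead of owning a private one.

/* Per-domain address-space sharing sketch: names invented, not QEMU code. */
#include <stdio.h>
#include <stdlib.h>

#define TOY_MAX_DOMAINS 16      /* sketch only: no overflow handling */

typedef struct ToyAddressSpace {
    unsigned domain_id;         /* guest IOMMU domain this AS serves */
    int refcount;               /* number of devices attached to it */
} ToyAddressSpace;

static ToyAddressSpace *domains[TOY_MAX_DOMAINS];
static int ndomains;

/* Look up (or lazily create) the shared address space for a domain. */
static ToyAddressSpace *toy_as_for_domain(unsigned domain_id)
{
    for (int i = 0; i < ndomains; i++) {
        if (domains[i]->domain_id == domain_id) {
            domains[i]->refcount++;
            return domains[i];  /* reuse the domain's shared AS */
        }
    }

    ToyAddressSpace *as = calloc(1, sizeof(*as));
    as->domain_id = domain_id;
    as->refcount = 1;
    domains[ndomains++] = as;
    return as;
}

int main(void)
{
    /* Two devices in the same guest domain end up with the same AS. */
    ToyAddressSpace *dev1 = toy_as_for_domain(5);
    ToyAddressSpace *dev2 = toy_as_for_domain(5);

    printf("shared: %s (refcount %d)\n",
           dev1 == dev2 ? "yes" : "no", dev1->refcount);
    return 0;
}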

Thanks,

Jason Wang (1):
  intel_iommu: allocate new key when creating new address space

Peter Xu (12):
  intel_iommu: simplify irq region translation
  intel_iommu: renaming gpa to iova where proper
  intel_iommu: fix trace for inv desc handling
  intel_iommu: fix trace for addr translation
  intel_iommu: vtd_slpt_level_shift check level
  memory: add section range info for IOMMU notifier
  memory: provide iommu_replay_all()
  memory: introduce memory_region_notify_one()
  memory: add MemoryRegionIOMMUOps.replay() callback
  intel_iommu: provide its own replay() callback
  intel_iommu: do replay when context invalidate
  intel_iommu: use page_walk for iotlb inv notify

 hw/i386/intel_iommu.c | 521 ++++++++++++++++++++++++++++++++------------------
 hw/i386/trace-events  |  27 +++
 hw/vfio/common.c      |   7 +-
 include/exec/memory.h |  30 +++
 memory.c              |  42 +++-
 5 files changed, 432 insertions(+), 195 deletions(-)

-- 
2.7.4



