qemu-devel

Re: [RFC 00/18] vfio: Adopt iommufd


From: Yi Liu
Subject: Re: [RFC 00/18] vfio: Adopt iommufd
Date: Mon, 18 Apr 2022 20:09:31 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0 Thunderbird/91.7.0

Hi Kevin,

On 2022/4/18 16:49, Tian, Kevin wrote:
From: Liu, Yi L <yi.l.liu@intel.com>
Sent: Thursday, April 14, 2022 6:47 PM

With the introduction of iommufd [1], the Linux kernel provides a generic
interface for userspace drivers to propagate their DMA mappings to the kernel
for assigned devices. This series ports the VFIO devices
onto the /dev/iommu uapi and lets it coexist with the legacy implementation.
Other devices like vdpa, vfio mdev, etc. are not considered yet.

vfio mdev needs no special support in QEMU. It's just not supported
by iommufd yet, so it can only be operated via the legacy container interface
at this point. Later, once the kernel supports it, presumably no additional
enabling work will be required for mdev in QEMU.

yes. will make it more precise in next version.


For vfio devices, the new interface is tied with device fd and iommufd
as the iommufd solution is device-centric. This is different from legacy
vfio which is group-centric. To support both interfaces in QEMU, this
series introduces the iommu backend concept in the form of different
container classes. The existing vfio container is named legacy container
(equivalent to the legacy iommu backend in this series), while the new
iommufd based container is named the iommufd container (may also be mentioned
as the iommufd backend in this series). The two backend types have their own
ways to set up a secure context and a DMA management interface. The diagram
below shows how it looks with both BEs.

                     VFIO                           AddressSpace/Memory
     +-------+  +----------+  +-----+  +-----+
     |  pci  |  | platform |  |  ap |  | ccw |
     +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
         |           |           |        |        |   AddressSpace       |
         |           |           |        |        +------------+---------+
     +---V-----------V-----------V--------V----+               /
     |           VFIOAddressSpace              | <------------+
     |                  |                      |  MemoryListener
     |          VFIOContainer list             |
     +-------+----------------------------+----+
             |                            |
             |                            |
     +-------V------+            +--------V----------+
     |   iommufd    |            |    vfio legacy    |
     |  container   |            |     container     |
     +-------+------+            +--------+----------+
             |                            |
             | /dev/iommu                 | /dev/vfio/vfio
             | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
  Userspace  |                            |

===========+============================+================================
  Kernel     |  device fd                 |
             +---------------+            | group/container fd
             | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
             |  ATTACH_IOAS) |            | device fd
             |               |            |
             |       +-------V------------V-----------------+
     iommufd |       |                vfio                  |
(map/unmap  |       +---------+--------------------+-------+
  ioas_copy) |                 |                    | map/unmap
             |                 |                    |
      +------V------+    +-----V------+      +------V--------+
      | iommfd core |    |  device    |      |  vfio iommu   |
      +-------------+    +------------+      +---------------+

last row: s/iommfd/iommufd/

thanks. a typo.

overall this sounds like a reasonable abstraction. Later, when vdpa starts
supporting iommufd, the iommufd BE will probably become even
smaller, with more logic shareable between vfio and vdpa.

let's see if Jason Wang has some ideas. :-)


[Secure Context setup]
- iommufd BE: uses device fd and iommufd to set up the secure context
               (bind_iommufd, attach_ioas)
- vfio legacy BE: uses group fd and container fd to set up the secure context
                   (set_container, set_iommu)
[Device access]
- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
- vfio legacy BE: device fd is retrieved from group fd ioctl
[DMA Mapping flow]
- VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
- VFIO populates DMA map/unmap via the container BEs
   *) iommufd BE: uses iommufd
   *) vfio legacy BE: uses container fd

This series qomifies the VFIOContainer object which acts as a base class

what does 'qomify' mean? I didn't find this word from dictionary...

for a container. This base class is derived into the legacy VFIO container
and the new iommufd based container. The base class implements generic code
such as code related to memory_listener and address space management,
whereas the derived class implements callbacks that depend on the kernel user space

'the kernel user space'?

aha, I just wanted to express that different BE callbacks will use different user interfaces exposed by the kernel. will refine the wording.


being used.

The selection of the backend is made on a device basis using the new
iommufd option (on/off/auto). By default the iommufd backend is selected
if supported by the host and by QEMU (iommufd KConfig). This option is
currently available only for the vfio-pci device. For other device types,
the option does not exist yet and the legacy BE is chosen by default.
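(Illustrative only, per the option described above: RFC syntax, so the exact spelling may differ in the final series, and the host PCI address is a placeholder. "auto" would pick iommufd when both host and QEMU support it, falling back to legacy otherwise.)

```shell
# Select the iommufd backend explicitly for one assigned device
qemu-system-x86_64 \
    -device vfio-pci,host=0000:02:00.0,iommufd=on \
    ...
```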

Test done:
- PCI and platform devices were tested

In this case PCI uses iommufd while platform device uses legacy?

For PCI, both legacy and iommufd were tested. The exploration kernel branch doesn't have the new device uapi for platform devices, so I didn't test it.
But I believe Eric has tested it with iommufd. Eric?

- ccw and ap were only compile-tested
- limited device hotplug test
- vIOMMU test run for both legacy and iommufd backends (limited tests)

This series was co-developed by Eric Auger and me based on the exploration
iommufd kernel [2]; the complete code of this series is available in [3]. As
the iommufd kernel is at an early stage (only the iommufd generic interface is
on the mailing list), this series hasn't made the iommufd backend fully on par
with the legacy backend w.r.t. features like p2p mappings, coherency tracking,

what does 'coherency tracking' mean here? if related to iommu enforce
snoop it is fully handled by the kernel so far. I didn't find any use of
VFIO_DMA_CC_IOMMU in current Qemu.

It's the kvm_group add/del stuff. Perhaps saying "kvm_group add/del equivalence"
would be better?

live migration, etc. This series doesn't support PCI devices without FLR
either, as the kernel doesn't support VFIO_DEVICE_PCI_HOT_RESET when userspace
is using iommufd. The kernel needs to be updated to accept a device fd list for
reset when userspace is using iommufd. Related work is in progress by
Jason [4].

TODOs:
- Add DMA alias check for iommufd BE (group level)
- Make pci.c BE-agnostic; needs a kernel change as well to fix the
   VFIO_DEVICE_PCI_HOT_RESET gap
- Clean up the VFIODevice fields as they are used in both BEs
- Add locks
- Replace list with g_tree
- More tests

Patch Overview:

- Preparation:
   0001-scripts-update-linux-headers-Add-iommufd.h.patch
   0002-linux-headers-Import-latest-vfio.h-and-iommufd.h.patch
   0003-hw-vfio-pci-fix-vfio_pci_hot_reset_result-trace-poin.patch
   0004-vfio-pci-Use-vbasedev-local-variable-in-vfio_realize.patch
   0005-vfio-common-Rename-VFIOGuestIOMMU-iommu-into-iommu_m.patch

3-5 are pure cleanups which could be sent out separately

yes. may send later after checking with Eric. :-)

   0006-vfio-common-Split-common.c-into-common.c-container.c.patch

- Introduce container object and convert existing vfio to use it:
   0007-vfio-Add-base-object-for-VFIOContainer.patch
   0008-vfio-container-Introduce-vfio_attach-detach_device.patch
   0009-vfio-platform-Use-vfio_-attach-detach-_device.patch
   0010-vfio-ap-Use-vfio_-attach-detach-_device.patch
   0011-vfio-ccw-Use-vfio_-attach-detach-_device.patch
   0012-vfio-container-obj-Introduce-attach-detach-_device-c.patch
   0013-vfio-container-obj-Introduce-VFIOContainer-reset-cal.patch

- Introduce iommufd based container:
   0014-hw-iommufd-Creation.patch
   0015-vfio-iommufd-Implement-iommufd-backend.patch
   0016-vfio-iommufd-Add-IOAS_COPY_DMA-support.patch

- Add backend selection for vfio-pci:
   0017-vfio-as-Allow-the-selection-of-a-given-iommu-backend.patch
   0018-vfio-pci-Add-an-iommufd-option.patch

[1] https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_jgg@nvidia.com/
[2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
[3] https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1
[4] https://lore.kernel.org/kvm/0-v1-a8faf768d202+125dd-vfio_mdev_no_group_jgg@nvidia.com/

The following is probably more relevant to [4]:

https://lore.kernel.org/all/10-v1-33906a626da1+16b0-vfio_kvm_no_group_jgg@nvidia.com/

absolutely. :-) thanks.

Thanks
Kevin

--
Regards,
Yi Liu


