qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining devic


From: Auger Eric
Subject: Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining device IOMMU address space
Date: Wed, 27 Mar 2019 12:46:41 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

Hi Alex,

On 3/26/19 11:55 PM, Alex Williamson wrote:
> Conventional PCI buses pre-date requester IDs.  An IOMMU cannot
> distinguish by devfn & bus between devices in a conventional PCI
> topology and therefore we cannot assign them separate AddressSpaces.
> By taking this requester ID aliasing into account, QEMU better matches
> the bare metal behavior and restrictions, and enables shared
> AddressSpace configurations that are otherwise not possible with
> guest IOMMU support.
> 
> For the latter case, given any example where an IOMMU group on the
> host includes multiple devices:
> 
>   $ ls  /sys/kernel/iommu_groups/1/devices/
>   0000:00:01.0  0000:01:00.0  0000:01:00.1
> 
> If we incorporate a vIOMMU into the VM configuration, we're restricted
> that we can only assign one of the endpoints to the guest because a
> second endpoint will attempt to use a different AddressSpace.  VFIO
> only supports IOMMU group level granularity at the container level,
> preventing this second endpoint from being assigned:
> 
> qemu-system-x86_64 -machine q35... \
>   -device intel-iommu,intremap=on \
>   -device pcie-root-port,addr=1e.0,id=pcie.1 \
>   -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
>   -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1
> 
> qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
> 0000:01:00.1: group 1 used in multiple address spaces
> 
> However, when QEMU incorporates proper aliasing, we can make use of a
> PCIe-to-PCI bridge to mask the requester ID, resulting in a hack that
> provides the downstream devices with the same AddressSpace, ex:
> 
> qemu-system-x86_64 -machine q35... \
>   -device intel-iommu,intremap=on \
>   -device pcie-pci-bridge,addr=1e.0,id=pci.1 \
>   -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
>   -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1
> 
> While the utility of this hack may be limited, this AddressSpace
> aliasing is the correct behavior for QEMU to emulate bare metal.
> 
> Signed-off-by: Alex Williamson <address@hidden>
> ---
>  hw/pci/pci.c |   33 +++++++++++++++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 35451c1e9987..38467e676f1f 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2594,12 +2594,41 @@ AddressSpace 
> *pci_device_iommu_address_space(PCIDevice *dev)
>  {
>      PCIBus *bus = pci_get_bus(dev);
>      PCIBus *iommu_bus = bus;
> +    uint8_t devfn = dev->devfn;
>  
>      while(iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
> -        iommu_bus = pci_get_bus(iommu_bus->parent_dev);
> +        PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);
> +
> +        /*
> +         * Determine which requester ID alias should be used for the device
> +         * based on the PCI topology.  There are no requester IDs on 
> convetional
conventional
> +         * PCI buses, therefore we push the alias up to the parent on each 
> non-
> +         * express bus.  Which alias we use depends on whether this is a 
> legacy
> +         * PCI bridge or PCIe-to-PCI/X bridge as in chapter 2.3 of the 
> PCIe-to-
> +         * PCI bridge spec.  Note that we cannot use pci_requester_id() here
> +         * because the resulting BDF depends on the secondary bridge register
Didn't you mean secondary bus number register?
> +         * programming.  We also cannot lookup the PCIBus from the bus number
> +         * at this point for the iommu_fn.  Also, requester_id_cache is the
> +         * alias to the root bus, which is usually, but not necessarily 
> always
> +         * where we'll find our iommu_fn.
> +         */
> +        if (!pci_bus_is_express(iommu_bus)) {
> +            PCIDevice *parent = iommu_bus->parent_dev;
> +
> +            if (pci_is_express(parent) &&
> +                pcie_cap_get_type(parent) == PCI_EXP_TYPE_PCI_BRIDGE) {
> +                devfn = PCI_DEVFN(0, 0);
> +                bus = iommu_bus;
> +            } else {
> +                devfn = parent->devfn;
> +                bus = parent_bus;
> +            }
> +        }
> +
> +        iommu_bus = parent_bus;
>      }
>      if (iommu_bus && iommu_bus->iommu_fn) {
> -        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
> +        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
I think it would make sense to comment this iommu_fn() callback's role
somewhere.
>      }
>      return &address_space_memory;
>  }
> 
>

Thanks

Eric



reply via email to

[Prev in Thread] Current Thread [Next in Thread]