qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining devic


From: Alex Williamson
Subject: Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining device IOMMU address space
Date: Wed, 27 Mar 2019 13:19:44 -0600

On Wed, 27 Mar 2019 19:07:44 +0100
Auger Eric <address@hidden> wrote:

> Hi Alex,
> 
> On 3/27/19 7:02 PM, Alex Williamson wrote:
> > On Wed, 27 Mar 2019 12:46:41 +0100
> > Auger Eric <address@hidden> wrote:
> >   
> >> Hi Alex,
> >>
> >> On 3/26/19 11:55 PM, Alex Williamson wrote:  
> >>> Conventional PCI buses pre-date requester IDs.  An IOMMU cannot
> >>> distinguish by devfn & bus between devices in a conventional PCI
> >>> topology and therefore we cannot assign them separate AddressSpaces.
> >>> By taking this requester ID aliasing into account, QEMU better matches
> >>> the bare metal behavior and restrictions, and enables shared
> >>> AddressSpace configurations that are otherwise not possible with
> >>> guest IOMMU support.
> >>>
> >>> For the latter case, given any example where an IOMMU group on the
> >>> host includes multiple devices:
> >>>
> >>>   $ ls  /sys/kernel/iommu_groups/1/devices/
> >>>   0000:00:01.0  0000:01:00.0  0000:01:00.1
> >>>
> >>> If we incorporate a vIOMMU into the VM configuration, we're restricted
> >>> that we can only assign one of the endpoints to the guest because a
> >>> second endpoint will attempt to use a different AddressSpace.  VFIO
> >>> only supports IOMMU group level granularity at the container level,
> >>> preventing this second endpoint from being assigned:
> >>>
> >>> qemu-system-x86_64 -machine q35... \
> >>>   -device intel-iommu,intremap=on \
> >>>   -device pcie-root-port,addr=1e.0,id=pcie.1 \
> >>>   -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
> >>>   -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1
> >>>
> >>> qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: 
> >>> vfio \
> >>> 0000:01:00.1: group 1 used in multiple address spaces
> >>>
> >>> However, when QEMU incorporates proper aliasing, we can make use of a
> >>> PCIe-to-PCI bridge to mask the requester ID, resulting in a hack that
> >>> provides the downstream devices with the same AddressSpace, ex:
> >>>
> >>> qemu-system-x86_64 -machine q35... \
> >>>   -device intel-iommu,intremap=on \
> >>>   -device pcie-pci-bridge,addr=1e.0,id=pci.1 \
> >>>   -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
> >>>   -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1
> >>>
> >>> While the utility of this hack may be limited, this AddressSpace
> >>> aliasing is the correct behavior for QEMU to emulate bare metal.
> >>>
> >>> Signed-off-by: Alex Williamson <address@hidden>
> >>> ---
> >>>  hw/pci/pci.c |   33 +++++++++++++++++++++++++++++++--
> >>>  1 file changed, 31 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>> index 35451c1e9987..38467e676f1f 100644
> >>> --- a/hw/pci/pci.c
> >>> +++ b/hw/pci/pci.c
> >>> @@ -2594,12 +2594,41 @@ AddressSpace 
> >>> *pci_device_iommu_address_space(PCIDevice *dev)
> >>>  {
> >>>      PCIBus *bus = pci_get_bus(dev);
> >>>      PCIBus *iommu_bus = bus;
> >>> +    uint8_t devfn = dev->devfn;
> >>>  
> >>>      while(iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
> >>> -        iommu_bus = pci_get_bus(iommu_bus->parent_dev);
> >>> +        PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);
> >>> +
> >>> +        /*
> >>> +         * Determine which requester ID alias should be used for the 
> >>> device
> >>> +         * based on the PCI topology.  There are no requester IDs on 
> >>> convetional    
> >> conventional  
> > 
> > Oops, fixed
> >   
> >>> +         * PCI buses, therefore we push the alias up to the parent on 
> >>> each non-
> >>> +         * express bus.  Which alias we use depends on whether this is a 
> >>> legacy
> >>> +         * PCI bridge or PCIe-to-PCI/X bridge as in chapter 2.3 of the 
> >>> PCIe-to-
> >>> +         * PCI bridge spec.  Note that we cannot use pci_requester_id() 
> >>> here
> >>> +         * because the resulting BDF depends on the secondary bridge 
> >>> register    
> >> Didn't you mean secondary bus number register?  
> > 
> > Yes, fixed
> >   
> >>> +         * programming.  We also cannot lookup the PCIBus from the bus 
> >>> number
> >>> +         * at this point for the iommu_fn.  Also, requester_id_cache is 
> >>> the
> >>> +         * alias to the root bus, which is usually, but not necessarily 
> >>> always
> >>> +         * where we'll find our iommu_fn.
> >>> +         */
> >>> +        if (!pci_bus_is_express(iommu_bus)) {
> >>> +            PCIDevice *parent = iommu_bus->parent_dev;
> >>> +
> >>> +            if (pci_is_express(parent) &&
> >>> +                pcie_cap_get_type(parent) == PCI_EXP_TYPE_PCI_BRIDGE) {
> >>> +                devfn = PCI_DEVFN(0, 0);
> >>> +                bus = iommu_bus;
> >>> +            } else {
> >>> +                devfn = parent->devfn;
> >>> +                bus = parent_bus;
> >>> +            }
> >>> +        }
> >>> +
> >>> +        iommu_bus = parent_bus;
> >>>      }
> >>>      if (iommu_bus && iommu_bus->iommu_fn) {
> >>> -        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, 
> >>> dev->devfn);
> >>> +        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn); 
> >>>    
> >> I think it would make sense to comment this iommu_fn() callback's role
> >> somewhere.  
> > 
> > While I agree, it seems a bit out of scope of this patch.  Or are you
> > suggesting that this patch fundamentally changing the role rather than
> > trying to make it work as intended?  
> 
> The last question is what I tried to figure out when reviewing the patch
> as you change the @bus and @devfn arg values passed to the cb. Given the
> lack of documentation about the role of this function, it is not obvious
> to see if the patch does not break anything without reading the cb
> implementations.

AIUI, the original semantics are simply "return an AddressSpace for this
{PCIBus, devfn}".  I don't think that fundamentally changes, it's just
that we know the bus hosting the IOMMU (ie. the one with iommu_fn) and
we know the common algorithm for determining the effective requester ID
the IOMMU will see for that device from its bus.  We're therefore
simply providing the topology defined alias to the device rather than
the device itself.  We're not actually changing the semantics of the
iommu_fn, we're just limiting the set of {PCIBus, devfn}s that we'll
request.  It seems like more of a semantic change to make the callee
responsible for determining the visibility of the requested device.
Thanks,

Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]