[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v4 1/2] pcie: Add a simple PCIe ACS (Access Cont
From: |
Knut Omang |
Subject: |
Re: [Qemu-devel] [PATCH v4 1/2] pcie: Add a simple PCIe ACS (Access Control Services) helper function |
Date: |
Thu, 14 Feb 2019 17:54:12 +0100 |
User-agent: |
Evolution 3.30.4 (3.30.4-1.fc29) |
On Thu, 2019-02-14 at 08:51 -0700, Alex Williamson wrote:
> On Thu, 14 Feb 2019 08:07:33 +0100
> Knut Omang <address@hidden> wrote:
>
> > On Wed, 2019-02-13 at 12:13 -0700, Alex Williamson wrote:
> > > On Wed, 13 Feb 2019 10:29:58 +0100
> > > Knut Omang <address@hidden> wrote:
> > >
> > > > Add a helper function to add PCIe capability for Access Control Services
> > > > (ACS)
> > >
> > > This is redundant to the commit title.
> > >
> > > > ACS support in the associated root port is needed to pass
> > > > through individual functions of a device to different VMs with VFIO
> > > > without Alex Williamson's pcie_acs_override kernel patch or similar
> > > > in the guest.
> > >
> > > This is overly subtle, to work backwards that individual functions
> > > (plural!) of a device (singular!) must imply a multifunction endpoint
> > > in the same hierarchy split to different L2 VMs. Perhaps I only
> > > finally realized this subtly on v4.
> > >
> > > > Single function devices, or multifunction devices
> > > > that all goes to the same VM works fine even without ACS, as VFIO
> > > > will avoid putting the root port itself into the IOMMU group
> > > > even without ACS support in the port.
> > >
> > > Also confusing and incorrectly states that a) VFIO is responsible for
> > > IOMMU grouping, it's not, and b) that the root port would not be
> > > included in such a group, it would. The latter was I thought the
> > > impetus for this series.
> >
> > that wasn't the intention but I can see that it looks that way now
> >
> > > > Multifunction endpoints may also implement an ACS capability,
> > > > only on function 0, and with more limited features.
> > >
> > > "only on function 0" is incorrect, each function of a multifunction
> > > device should (not must) implement an ACS capability if any of them do.
> > >
> > > Can't we just say something like:
> > >
> > > "Implementing an ACS capability on downstream ports and multifuction
> > > endpoints indicates isolation and IOMMU visibility to a finer
> > > granularity thereby creating smaller IOMMU groups in the guest and thus
> > > more flexibility in assigning endpoints to guest userspace or an L2
> > > guest."
> >
> > sure - will use this - and remove my confusing attempt to
> > credit to your override patch and VFIO :)
> >
> > > (I avoided including SR-IOV with multifunction since that's not
> > > implemented here)
> >
> > I agree
> >
> > > > Signed-off-by: Knut Omang <address@hidden>
> > > > ---
> > > > hw/pci/pcie.c | 39
> > > > +++++++++++++++++++++++++++++++++++++++-
> > > > include/hw/pci/pcie.h | 6 ++++++-
> > > > include/hw/pci/pcie_regs.h | 4 ++++-
> > > > 3 files changed, 49 insertions(+)
> > > >
> > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > index 230478f..6e87994 100644
> > > > --- a/hw/pci/pcie.c
> > > > +++ b/hw/pci/pcie.c
> > > > @@ -906,3 +906,42 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset)
> > > >
> > > > pci_set_word(dev->wmask + dev->exp.ats_cap + PCI_ATS_CTRL, 0x800f);
> > > > }
> > > > +
> > > > +/* ACS (Access Control Services) */
> > > > +void pcie_acs_init(PCIDevice *dev, uint16_t offset)
> > > > +{
> > > > + bool is_pcie_slot = !!object_dynamic_cast(OBJECT(dev),
> > > > TYPE_PCIE_SLOT);
> > >
> > > Perhaps we should be using pci_is_express_downstream_port().
> >
> > oh - yes - I forgot that we need to look in pci.h for those kind of
> > helpers..
> >
> > > > + uint16_t cap_bits = 0;
> > > > +
> > > > + /*
> > > > + * For endpoints, only multifunction devices may have an
> > > > + * ACS capability, and only on function 0:
> > >
> > > Incorrect
> > >
> > > > + */
> > > > + assert(is_pcie_slot ||
> > > > + ((dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) &&
> > > > + PCI_FUNC(dev->devfn)));
> > >
> > > The second test should be:
> > >
> > > ((dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) ||
> > > PCI_FUNC(dev->devfn))
> > >
> > > If the function number is non-zero, then it's clearly a multifunction
> > > device, the multifunction capability is only required on function
> > > zero. Just as in my previous example, an ACS capability can only
> > > describe/control the DMA flow of the function implementing it, nothing
> > > in the spec that I can see imposes function zero's DMA flow on the
> > > other functions.
> >
> > Ah - of course - that makes sense -
> > was thinking too complicated here, and also my comment didn't match
> > the code at all..
> >
> > > There's also a gap here that function zero can set the multifunction
> > > capability, but there may be no secondary devices defined. Not that
> > > we necessarily need to resolve this, but it's a nuance of allowing
> > > arbitrary multifunction configurations as QEMU does.
> >
> > Yes, in the SR/IOV case, at least as I have implemented it in QEMU,
> > with one PF that would be the default - as no VFs are defined at reset,
> > there's only one function, but it still need to be multifunction
> > for QEMU to accept more functions appearing later.
> >
> > > > +
> > > > + pcie_add_capability(dev, PCI_EXT_CAP_ID_ACS, PCI_ACS_VER, offset,
> > > > + PCI_ACS_SIZEOF);
> > > > + dev->exp.acs_cap = offset;
> > > > +
> > > > + if (is_pcie_slot) {
> > > > + /*
> > > > + * Endpoints may also implement ACS, and optionally RR and CR,
> > > > + * if they want to support p2p, but only slots may
> > > > + * implement SV, TB or UF:
> > >
> > > Careful using "may" with spec references.
> >
> > :-D
> >
> > > "Downstream ports must implement SV, TB, RR, CR, and UF (with caveats
> > > on the latter three that we ignore for simplicity). Endpoints may also
> > > implement a subset of ACS capabilities, but these are optional if the
> > > endpoint does not support peer-to-peer between functions and thus
> > > omitted here."
> >
> > Thanks - I'll put that in instead
> >
> > > > + */
> > > > + cap_bits =
> > > > + PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> > > > PCI_ACS_UF;
> > >
> > > PCI_ACS_DT has similar "must be implemented" requirements to RR, RC,
> > > and UF, why is it not included? For all of these "caveat" ones there
> > > are conditions which can make it optional for root ports, but required
> > > for switch downstream ports, so it seems reasonable that we include
> > > both since that's what our is_pci_slot() test covers. Thanks,
> >
> > That was because my "reference" root ports don't not implement it, taking
> > the
> > conservative approach:
> >
> > 00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root
> > Port 7 (rev 22) (prog-if 00 [Normal decode])
> > ...
> > Capabilities: [150 v1] Access Control Services
> > ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+
> > UpstreamFwd+ EgressCtrl- DirectTrans-
> > ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+
> > UpstreamFwd+ EgressCtrl- DirectTrans-
> >
> > In fact, all gens of servers I have access to supports just the same cap
> > bits in
> > their root ports, in order of release date. The latest gen even have some
> > root
> > ports without an ACS capability.
> >
> > 00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root
> > Port 2a (rev 07)
> > 00:01.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon
> > D PCI Express Root Port 1 (rev 01) (prog-if 00 [Normal decode])
> > 00:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI
> > Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
> > 00:02.0 PCI bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI
> > Express Root Port 2a (rev 04) (prog-if 00 [Normal decode])
> > 17:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A
> > (rev 04) (prog-if 00 [Normal decode])
> >
> > All of these have DirectTrans- in their ACSCap.
> >
> > [For reference, the one without ACS cap, in the same server as 17:00.0
> > above:
> >
> > 00:1c.0 PCI bridge: Intel Corporation Lewisburg PCI Express Root Port #1
> > (rev f9) (prog-if 00 [Normal decode])
> > ]
> >
> > Do you prefer that we set DT as default anyway, or should we stay within
> > "known
> > territory" for the OSes, at least for now?
>
> Per the spec:
>
> ACS Direct Translated P2P: must be implemented by Root Ports that
> support Address Translation Services (ATS) and also support
> peer-to-peer traffic with other Root Ports; must be implemented by
> Switch Downstream Ports.
>
> The caveats for root ports here are why your hardware is potentially
> spec compliant without supporting DT. There are no caveats for switch
> downstream ports, therefore it would not be spec compliant to exclude
> it. I think your options are either to exclude switch downstream ports
> from the function or to set DT. Thanks,
Better to set DT then - a future generic switch downstream port would want to
have ACS too.
Knut
> Alex