[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2] vfio/pci: Static Resizable BAR capability
|
From: |
Alex Williamson |
|
Subject: |
Re: [PATCH v2] vfio/pci: Static Resizable BAR capability |
|
Date: |
Tue, 9 May 2023 08:20:19 -0600 |
On Tue, 9 May 2023 11:39:57 +0200
Cédric Le Goater <clg@redhat.com> wrote:
> On 5/6/23 01:23, Alex Williamson wrote:
> > The PCI Resizable BAR (ReBAR) capability is currently hidden from the
> > VM because the protocol for interacting with the capability does not
> > support a mechanism for the device to reject an advertised supported
> > BAR size. However, when assigned to a VM, the act of resizing the
> > BAR requires adjustment of host resources for the device, which
> > absolutely can fail. Linux does not currently allow us to reserve
> > resources for the device independent of the current usage.
> >
> > The only writable field within the ReBAR capability is the BAR Size
> > register. The PCIe spec indicates that when written, the device
> > should immediately begin to operate with the provided BAR size. The
> > spec however also notes that software must only write values
> > corresponding to supported sizes as indicated in the capability and
> > control registers. Writing unsupported sizes produces undefined
> > results. Therefore, if the hypervisor were to virtualize the
> > capability and control registers such that the current size is the
> > only indicated available size, then a write of anything other than
> > the current size falls into the category of undefined behavior,
> > where we can essentially expose the modified ReBAR capability as
> > read-only.
> >
> > This may seem pointless, but users have reported that virtualizing
> > the capability in this way not only allows guest software to expose
> > related features as available (even if only cosmetic), but in some
> > scenarios can resolve guest driver issues. Additionally, no
> > regressions in behavior have been reported for this change.
> >
> > A caveat here is that the PCIe spec requires for compatibility that
> > devices report support for a size in the range of 1MB to 512GB,
> > therefore if the current BAR size falls outside that range we revert
> > to hiding the capability.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
>
> > ---
> > v2:
> > - Add spec reference
> > - Use PCI_REBAR_CAP_SIZES to check sizes in range
> > - Try to clarify capability bit generation
> > - Rename s/bars/nbar/ to match #defines
> > - More complete masking of NBAR value
> >
> > hw/vfio/pci.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 53 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index cf27f28936cb..3ab849767a92 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -2066,6 +2066,54 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev,
> > uint8_t pos, Error **errp)
> > return 0;
> > }
> >
> > +static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
> > +{
> > + uint32_t ctrl;
> > + int i, nbar;
> > +
> > + ctrl = pci_get_long(vdev->pdev.config + pos + PCI_REBAR_CTRL);
> > + nbar = (ctrl & PCI_REBAR_CTRL_NBAR_MASK) >> PCI_REBAR_CTRL_NBAR_SHIFT;
> > +
> > + for (i = 0; i < nbar; i++) {
> > + uint32_t cap;
> > + int size;
> > +
> > + ctrl = pci_get_long(vdev->pdev.config + pos + PCI_REBAR_CTRL + (i
> > * 8));
> > + size = (ctrl & PCI_REBAR_CTRL_BAR_SIZE) >>
> > PCI_REBAR_CTRL_BAR_SHIFT;
> > +
> > + /* The cap register reports sizes 1MB to 127TB, with 4 reserved
> > bits */
>
> s/127/128/
Yes, fixed. Thanks!
Alex
> > + cap = size <= 27 ? 1U << (size + 4) : 0;
> > +
> > + /*
> > + * The PCIe spec (v6.0.1, 7.8.6) requires HW to support at least
> > one
> > + * size in the range 1MB to 512GB. We intend to mask all sizes
> > except
> > + * the one currently enabled in the size field, therefore if it's
> > + * outside the range, hide the whole capability as this
> > virtualization
> > + * trick won't work. If >512GB resizable BARs start to appear, we
> > + * might need an opt-in or reservation scheme in the kernel.
> > + */
> > + if (!(cap & PCI_REBAR_CAP_SIZES)) {
> > + return -EINVAL;
> > + }
> > +
> > + /* Hide all sizes reported in the ctrl reg per above requirement.
> > */
> > + ctrl &= (PCI_REBAR_CTRL_BAR_SIZE |
> > + PCI_REBAR_CTRL_NBAR_MASK |
> > + PCI_REBAR_CTRL_BAR_IDX);
> > +
> > + /*
> > + * The BAR size field is RW, however we've mangled the capability
> > + * register such that we only report a single size, ie. the current
> > + * BAR size. A write of an unsupported value is undefined,
> > therefore
> > + * the register field is essentially RO.
> > + */
> > + vfio_add_emulated_long(vdev, pos + PCI_REBAR_CAP + (i * 8), cap,
> > ~0);
> > + vfio_add_emulated_long(vdev, pos + PCI_REBAR_CTRL + (i * 8), ctrl,
> > ~0);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> > {
> > PCIDevice *pdev = &vdev->pdev;
> > @@ -2139,9 +2187,13 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> > case 0: /* kernel masked capability */
> > case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
> > case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function
> > virtualization */
> > - case PCI_EXT_CAP_ID_REBAR: /* Can't expose read-only */
> > trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id,
> > next);
> > break;
> > + case PCI_EXT_CAP_ID_REBAR:
> > + if (!vfio_setup_rebar_ecap(vdev, next)) {
> > + pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> > + }
> > + break;
> > default:
> > pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> > }
>