[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments,
From: |
Peter Xu |
Subject: |
Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps |
Date: |
Wed, 22 Feb 2017 11:08:51 +0800 |
User-agent: |
Mutt/1.5.24 (2015-08-30) |
[cc Jintack]
On Tue, Feb 21, 2017 at 02:43:03PM -0700, Alex Williamson wrote:
> Since commit 4bb571d857d9 ("pci/pcie: don't assume cap id 0 is
> reserved") removes the internal use of extended capability ID 0, the
> comment here becomes invalid. However, peeling back the onion, the
> code is still correct and we still can't seed the capability chain
> with ID 0, unless we want to muck with using the version number to
> force the header to be non-zero, which is much uglier to deal with.
> The comment also now covers some of the subtleties of using cap ID 0,
> such as transparently indicating absence of capabilities if none are
> added. This doesn't detract from the correctness of the referenced
> commit as vfio in the kernel also uses capability ID zero to mask
> capabilties. In fact, we should skip zero capabilities precisely
> because the kernel might also expose such a capability at the head
> position and re-introduce the problem.
>
> Signed-off-by: Alex Williamson <address@hidden>
> Cc: Peter Xu <address@hidden>
> Cc: Michael S. Tsirkin <address@hidden>
> ---
> hw/vfio/pci.c | 31 +++++++++++++++++++++----------
> 1 file changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index f2ba9b6cfafc..03a3d0154976 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1880,16 +1880,26 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> /*
> * Extended capabilities are chained with each pointing to the next, so
> we
> * can drop anything other than the head of the chain simply by modifying
> - * the previous next pointer. For the head of the chain, we can modify
> the
> - * capability ID to something that cannot match a valid capability. ID
> - * 0 is reserved for this since absence of capabilities is indicated by
> - * 0 for the ID, version, AND next pointer. However,
> pcie_add_capability()
> - * uses ID 0 as reserved for list management and will incorrectly match
> and
> - * assert if we attempt to pre-load the head of the chain with this ID.
> - * Use ID 0xFFFF temporarily since it is also seems to be reserved in
> - * part for identifying absence of capabilities in a root complex
> register
> - * block. If the ID still exists after adding capabilities, switch back
> to
> - * zero. We'll mark this entire first dword as emulated for this
> purpose.
> + * the previous next pointer. Seed the head of the chain here such that
> + * we can simply skip any capabilities we want to drop below, regardless
> + * of their position in the chain. If this stub capability still exists
> + * after we add the capabilities we want to expose, update the capability
> + * ID to zero. Note that we cannot seed with the capability header being
> + * zero as this conflicts with definition of an absent capability chain
> + * and prevents capabilities beyond the head of the list from being
> added.
> + * By replacing the dummy capability ID with zero after walking the
> device
> + * chain, we also transparently mark extended capabilities as absent if
> + * no capabilities were added. Note that the PCIe spec defines an
> absence
> + * of extended capabilities to be determined by a value of zero for the
> + * capability ID, version, AND next pointer. A non-zero next pointer
> + * should be sufficient to indicate additional capabilities are present,
> + * which will occur if we call pcie_add_capability() below. The entire
> + * first dword is emulated to support this.
> + *
> + * NB. The kernel side does similar masking, so be prepared that our
> + * view of the device may also contain a capability ID zero in the head
> + * of the chain. Skip it for the same reason that we cannot seed the
> + * chain with a zero capability.
> */
> pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> PCI_EXT_CAP(0xFFFF, 0, 0));
> @@ -1915,6 +1925,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> PCI_EXT_CAP_NEXT_MASK);
>
> switch (cap_id) {
> + case 0: /* kernel masked capability */
> case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
> case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
> trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id,
> next);
>
Reviewed-by: Peter Xu <address@hidden>
Since this bug is originally reported by Jintack, maybe we can also
add:
Reported-by: Jintack Lim <address@hidden>
Jintack, if you want to test it and provide your tested-by, it would
be nice as well. ;)
Actually I just found that the bug still exist after Michael's fix (I
thought it was fixed). So we definitely need this patch or equivalent.
However, I would still slightly prefer removing the wrapping hack
since after all we need to touch it (and I do feel like that's error
prone...). So, Alex, do you like below one instead?
--------8<----------
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d..6942c1d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
*/
config = g_memdup(pdev->config, vdev->config_size);
- /*
- * Extended capabilities are chained with each pointing to the next, so we
- * can drop anything other than the head of the chain simply by modifying
- * the previous next pointer. For the head of the chain, we can modify the
- * capability ID to something that cannot match a valid capability. ID
- * 0 is reserved for this since absence of capabilities is indicated by
- * 0 for the ID, version, AND next pointer. However, pcie_add_capability()
- * uses ID 0 as reserved for list management and will incorrectly match and
- * assert if we attempt to pre-load the head of the chain with this ID.
- * Use ID 0xFFFF temporarily since it is also seems to be reserved in
- * part for identifying absence of capabilities in a root complex register
- * block. If the ID still exists after adding capabilities, switch back to
- * zero. We'll mark this entire first dword as emulated for this purpose.
- */
- pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
- PCI_EXT_CAP(0xFFFF, 0, 0));
- pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
- pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0);
-
for (next = PCI_CONFIG_SPACE_SIZE; next;
next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
header = pci_get_long(config + next);
@@ -1910,24 +1891,33 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
*/
size = vfio_ext_cap_max_size(config, next);
- /* Use emulated next pointer to allow dropping extended caps */
- pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
- PCI_EXT_CAP_NEXT_MASK);
+ /* Use emulated header to allow dropping extended caps */
+ pci_set_long(vdev->emulated_config_bits + next, 0xffffffffULL);
switch (cap_id) {
case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
+ case PCI_EXT_CAP_ID_VC:
+ /*
+ * For dropped capabilities, we keep their slot but
+ * replace them with a header containing cap_id=0 &&
+ * cap_ver=1. We do this reservation mostly to make sure
+ * the head ecap (at offset 0x100) will always be there.
+ * Anyway it won't hurt if we keep the rest of the dropped
+ * ones as well.
+ *
+ * Here we use non-zero cap_ver because we want to "mark"
+ * this ecap as "available" - from PCIe spec (chap 7.9.1),
+ * it marked out that cap_id=cap_ver=next=0 means empty
+ * ecap, and we really don't want it to be an "empty" slot
+ * especially for the head ecap (we need a head, always!).
+ */
+ pcie_add_capability(pdev, 0, 1, next, size);
trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
break;
default:
pcie_add_capability(pdev, cap_id, cap_ver, next, size);
}
-
- }
-
- /* Cleanup chain head ID if necessary */
- if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
- pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
}
g_free(config);
------->8--------
The new patch will keep all the dropped ecaps (so we may see more than
one cap_id=0x0000 field), which I don't know whether would be a
drawback. OTOH, it provides benefits like:
- we can remove the wrapping hack, so the code is much readable and
less error prone imho
- we can avoid using the assumption that 0xffff cap_id is reserved
I can live with both patches though. :-)
Thanks!
-- peterx