[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] iommu emulation
From: |
Jintack Lim |
Subject: |
Re: [Qemu-devel] iommu emulation |
Date: |
Tue, 21 Feb 2017 05:33:00 -0500 |
On Wed, Feb 15, 2017 at 9:47 PM, Alex Williamson <address@hidden
> wrote:
> On Thu, 16 Feb 2017 10:28:39 +0800
> Peter Xu <address@hidden> wrote:
>
> > On Wed, Feb 15, 2017 at 11:15:52AM -0700, Alex Williamson wrote:
> >
> > [...]
> >
> > > > Alex, do you like something like below to fix above issue that
> Jintack
> > > > has encountered?
> > > >
> > > > (note: this code is not for compile, only trying show what I mean...)
> > > >
> > > > ------8<-------
> > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > index 332f41d..4dca631 100644
> > > > --- a/hw/vfio/pci.c
> > > > +++ b/hw/vfio/pci.c
> > > > @@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice
> *vdev)
> > > > */
> > > > config = g_memdup(pdev->config, vdev->config_size);
> > > >
> > > > - /*
> > > > - * Extended capabilities are chained with each pointing to the
> next, so we
> > > > - * can drop anything other than the head of the chain simply by
> modifying
> > > > - * the previous next pointer. For the head of the chain, we
> can modify the
> > > > - * capability ID to something that cannot match a valid
> capability. ID
> > > > - * 0 is reserved for this since absence of capabilities is
> indicated by
> > > > - * 0 for the ID, version, AND next pointer. However,
> pcie_add_capability()
> > > > - * uses ID 0 as reserved for list management and will
> incorrectly match and
> > > > - * assert if we attempt to pre-load the head of the chain with
> this ID.
> > > > - * Use ID 0xFFFF temporarily since it is also seems to be
> reserved in
> > > > - * part for identifying absence of capabilities in a root
> complex register
> > > > - * block. If the ID still exists after adding capabilities,
> switch back to
> > > > - * zero. We'll mark this entire first dword as emulated for
> this purpose.
> > > > - */
> > > > - pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> > > > - PCI_EXT_CAP(0xFFFF, 0, 0));
> > > > - pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
> > > > - pci_set_long(vdev->emulated_config_bits +
> PCI_CONFIG_SPACE_SIZE, ~0);
> > > > -
> > > > for (next = PCI_CONFIG_SPACE_SIZE; next;
> > > > next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
> > > > header = pci_get_long(config + next);
> > > > @@ -1917,6 +1898,8 @@ static void vfio_add_ext_cap(VFIOPCIDevice
> *vdev)
> > > > switch (cap_id) {
> > > > case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse
> OVMF */
> > > > case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function
> virtualization */
> > > > + /* keep this ecap header (4 bytes), but mask cap_id to
> 0xffff */
> > > > + ...
> > > > trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name,
> cap_id, next);
> > > > break;
> > > > default:
> > > > @@ -1925,11 +1908,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice
> *vdev)
> > > >
> > > > }
> > > >
> > > > - /* Cleanup chain head ID if necessary */
> > > > - if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) ==
> 0xFFFF) {
> > > > - pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
> > > > - }
> > > > -
> > > > g_free(config);
> > > > return;
> > > > }
> > > > ----->8-----
> > > >
> > > > Since after all we need the assumption that 0xffff is reserved for
> > > > cap_id. Then, we can just remove the "first 0xffff then 0x0" hack,
> > > > which is imho error-prone and hacky.
> > >
> > > This doesn't fix the bug, which is that pcie_add_capability() uses a
> > > valid capability ID for it's own internal tracking. It's only doing
> > > this to find the end of the capability chain, which we could do in a
> > > spec complaint way by looking for a zero next pointer. Fix that and
> > > then vfio doesn't need to do this set to 0xffff then back to zero
> > > nonsense at all. Capability ID zero is valid. Thanks,
> >
> > Yeah I see Michael's fix on the capability list stuff. However, imho
> > these are two different issues? Or say, even if with that patch, we
> > should still need this hack (first 0x0, then 0xffff) right? Since
> > looks like that patch didn't solve the problem if the first pcie ecap
> > is masked at 0x100.
>
> I thought the problem was that QEMU in the host exposes a device with a
> capability ID of 0 to the L1 guest. QEMU in the L1 guest balks at a
> capability ID of 0 because that's how it finds the end of the chain.
> Therefore if we make QEMU not use capability ID 0 for internal
> purposes, things work. vfio using 0xffff and swapping back to 0x0
> becomes unnecessary, but doesn't hurt anything. Thanks,
>
I've applied Peter's hack and Michael's patch below, but still can't use
the assigned device in L2.
commit 4bb571d857d973d9308d9fdb1f48d983d6639bd4
Author: Michael S. Tsirkin <address@hidden>
Date: Wed Feb 15 22:37:45 2017 +0200
pci/pcie: don't assume cap id 0 is reserved
I was able to boot L2 with following qemu warnings,
qemu-system-x86_64: vfio: Cannot reset device 0000:00:03.0, no available
reset mechanism.
qemu-system-x86_64: vfio: Cannot reset device 0000:00:03.0, no available
reset mechanism.
but then I don't see the network device, which I was trying to assign to
L2, in L2.
This is from L2 dmesg, and it looks like the device is not initialized.
[ 5.884115] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 5.891563] mlx4_core: Initializing 0000:00:03.0
[ 5.896947] ACPI: PCI Interrupt Link [GSIH] enabled at IRQ 23
[ 6.913559] mlx4_core 0000:00:03.0: Installed FW has unsupported command
interface revision 0
[ 6.920925] mlx4_core 0000:00:03.0: (Installed FW version is 0.0.000)
[ 6.926490] mlx4_core 0000:00:03.0: This driver version supports only
revisions 2 to 3
[ 6.933300] mlx4_core 0000:00:03.0: QUERY_FW command failed, aborting
[ 6.940279] mlx4_core 0000:00:03.0: Failed to init fw, aborting.
This is the full kernel log from L2.
https://paste.ubuntu.com/24039462/
L0, L1 and L2 are using the same kernel, so I think they are using the same
device driver.
This is the L0/L1 kernel log about the network device.
--- From L0 ---
[ 8.175533] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 8.175543] mlx4_core: Initializing 0000:08:00.0
[ 14.524093] mlx4_core 0000:08:00.0: PCIe link speed is 8.0GT/s, device
supports 8.0GT/s
[ 14.533030] mlx4_core 0000:08:00.0: PCIe link width is x8, device
supports x8
[ 14.714296] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb
2014)
[ 14.722295] mlx4_en 0000:08:00.0: Activating port:2
[ 14.735186] mlx4_en: 0000:08:00.0: Port 2: Using 128 TX rings
[ 14.741608] mlx4_en: 0000:08:00.0: Port 2: Using 8 RX rings
[ 14.747826] mlx4_en: 0000:08:00.0: Port 2: frag:0 - size:1522 prefix:0
stride:1536
[ 14.756698] mlx4_en: 0000:08:00.0: Port 2: Initializing port
[ 14.764036] mlx4_en 0000:08:00.0: registered PHC clock
--- From L1 ---
[ 3.790302] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 3.791089] mlx4_core: Initializing 0000:00:03.0
[ 9.053077] mlx4_core 0000:00:03.0: Unable to determine PCIe device BW
capabilities
[ 9.203290] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb
2014)
[ 9.204503] mlx4_en 0000:00:03.0: Activating port:2
[ 9.212853] mlx4_en: 0000:00:03.0: Port 2: Using 32 TX rings
[ 9.213514] mlx4_en: 0000:00:03.0: Port 2: Using 4 RX rings
[ 9.214131] mlx4_en: 0000:00:03.0: Port 2: frag:0 - size:1522 prefix:0
stride:1536
[ 9.215260] mlx4_en: 0000:00:03.0: Port 2: Initializing port
[ 9.216377] mlx4_en 0000:00:03.0: registered PHC clock
[ 9.261518] mlx4_en: eth1: Link Up
[ 9.690730] mlx4_core 0000:00:03.0 eth2: renamed from eth1
Any thoughts?
> Alex
>
>
- Re: [Qemu-devel] iommu emulation, Peter Xu, 2017/02/08
- Re: [Qemu-devel] iommu emulation, Jintack Lim, 2017/02/09
- Re: [Qemu-devel] iommu emulation, Peter Xu, 2017/02/14
- Re: [Qemu-devel] iommu emulation, Jintack Lim, 2017/02/14
- Re: [Qemu-devel] iommu emulation, Peter Xu, 2017/02/14
- Re: [Qemu-devel] iommu emulation, Peter Xu, 2017/02/14
- Re: [Qemu-devel] iommu emulation, Alex Williamson, 2017/02/15
- Re: [Qemu-devel] iommu emulation, Peter Xu, 2017/02/15
- Re: [Qemu-devel] iommu emulation, Alex Williamson, 2017/02/15
- Re: [Qemu-devel] iommu emulation,
Jintack Lim <=
- Re: [Qemu-devel] iommu emulation, Jintack Lim, 2017/02/23
- Re: [Qemu-devel] iommu emulation, Jintack Lim, 2017/02/15
- Re: [Qemu-devel] iommu emulation, Alex Williamson, 2017/02/15
- Re: [Qemu-devel] iommu emulation, Jintack Lim, 2017/02/15
- Re: [Qemu-devel] iommu emulation, Alex Williamson, 2017/02/15