qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC/RFT PATCH 5/5] vfio/pci: Allow relocating MSI-X MM


From: Alex Williamson
Subject: Re: [Qemu-devel] [RFC/RFT PATCH 5/5] vfio/pci: Allow relocating MSI-X MMIO
Date: Mon, 18 Dec 2017 06:28:22 -0700

On Mon, 18 Dec 2017 20:04:23 +1100
Alexey Kardashevskiy <address@hidden> wrote:

> On 18/12/17 16:02, Alex Williamson wrote:
> > With recently proposed kernel side vfio-pci changes, the MSI-X vector
> > table area can be mmap'd from userspace, allowing direct access to
> > non-MSI-X registers within the host page size of this area.  However,
> > we only get that direct access if QEMU isn't also emulating MSI-X
> > within that same page.  For x86/64 host, the system page size is 4K
> > and the PCI spec recommends a minimum of 4K to 8K alignment to
> > separate MSI-X from non-MSI-X registers, therefore only devices which
> > don't honor this recommendation would see any improvement from this
> > option.  The real targets for this feature are hosts where the page
> > size exceeds the PCI spec recommended alignment, such as ARM64 systems
> > with 64K pages.
> > 
> > This new x-msix-relocation option accepts the following options:
> > 
> >   off: Disable MSI-X relocation, use native device config (default)
> >   auto: Automaically relocate MSI-X MMIO to another BAR or offset
> >        based on minimum additional MMIO requirement
> >   bar0..bar5: Specify the target BAR, which will either be extended
> >        if the BAR exists or added if the BAR slot is available.  
> 
> 
> While I am digesting the patchset, here are some test results.

Thanks for testing!

> This is the device:
> 
> 00:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008
> PCI-Express Fusion-MPT SAS-3 (rev 02)

BAR1:

> Memory at 210000000000 (64-bit, non-prefetchable) [size=64K]

BAR3:

> Memory at 210000040000 (64-bit, non-prefetchable) [size=256K]
> 
> Capabilities: [c0] MSI-X: Enable+ Count=96 Masked-
>         Vector table: BAR=1 offset=0000e000
>         PBA: BAR=1 offset=0000f000
> 
> 
> Test #1: x-msix-relocation = "off":
> 
> FlatView #1
>  AS "memory", root: system
>  AS "cpu-memory", root: system
>  Root memory region: system
>   0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
>   0000210000000000-000021000000dfff (prio 0, i/o): 0001:03:00.0 BAR 1
>   000021000000e000-000021000000e5ff (prio 0, i/o): msix-table
>   000021000000e600-000021000000ffff (prio 0, i/o): 0001:03:00.0 BAR 1
> @000000000000e600
>   0000210000040000-000021000007ffff (prio 0, ramd): 0001:03:00.0 BAR 3 
> mmaps[0]
> 
> Ok, works.
> 
> 
> Test #2: x-msix-relocation = "auto":
> 
> FlatView #2
>  AS "memory", root: system
>  AS "cpu-memory", root: system
>  Root memory region: system
>   0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
>   0000200080000000-00002000800005ff (prio 0, i/o): msix-table
>   0000200080000600-000020008000ffff (prio 1, i/o): 0001:03:00.0 base BAR 0
> @0000000000000600
>   0000210000000000-000021000000ffff (prio 0, i/o): 0001:03:00.0 BAR 1
>   0000210000040000-000021000007ffff (prio 0, ramd): 0001:03:00.0 BAR 3 
> mmaps[0]
> 
> 
> The guest fails probing because the first 64bit BAR is broken.
> 
> lspci:
> 
> Region 0: Memory at 200080000000 (32-bit, prefetchable) [size=64K]
> Region 1: Memory at 210000000000 (64-bit, non-prefetchable) [size=64K]
> Region 3: Memory at 210000040000 (64-bit, non-prefetchable) [size=256K]
> 
> Capabilities: [c0] MSI-X: Enable- Count=96 Masked-
>         Vector table: BAR=0 offset=00000000
>         PBA: BAR=0 offset=00000600

Why do you suppose it's broken?  The added BAR0 is 32bit, it cannot be
64bit since BAR1 is implemented.  I don't see anything fundamentally
different between this and the working BAR5 test below.

> Test #3: x-msix-relocation = "bar1"
> 
> 
> FlatView #1
>  AS "memory", root: system
>  AS "cpu-memory", root: system
>  Root memory region: system
>   0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
>   0000210000000000-000021000000ffff (prio 0, i/o): 0001:03:00.0 BAR 1
>   0000210000010000-00002100000105ff (prio 0, i/o): msix-table
>   0000210000010600-000021000001ffff (prio 1, i/o): 0001:03:00.0 base BAR 1
> @0000000000010600
>   0000210000040000-000021000007ffff (prio 0, ramd): 0001:03:00.0 BAR 3 
> mmaps[0]
> 
> Ok, works. BAR1 became 128K. However no part of BAR1 was mapped, i.e.
> appear as "ramd" in flatview, should it have appeared?
> 
> This is "mtree":
> 
> memory-region: address@hidden
>   0000000000000000-ffffffffffffffff (prio 0, i/o): address@hidden
>     0000210000000000-000021000001ffff (prio 1, i/o): 0001:03:00.0 base BAR 1
>       0000210000000000-000021000000ffff (prio 0, i/o): 0001:03:00.0 BAR 1
>       0000210000010000-00002100000105ff (prio 0, i/o): msix-table
>       0000210000010600-000021000001060f (prio 0, i/o): msix-pba [disabled]
>     0000210000040000-000021000007ffff (prio 1, i/o): 0001:03:00.0 base BAR 3
>       0000210000040000-000021000007ffff (prio 0, i/o): 0001:03:00.0 BAR 3
>         0000210000040000-000021000007ffff (prio 0, ramd): 0001:03:00.0 BAR
> 3 mmaps[0]

Did you disable vfio_pci_fixup_msix_region() as noted in 0/5?  This
series doesn't do anything about consuming the new MSI-X mappable flag
that you introduced in the kernel, so vfio_pci_fixup_msix_region() will
continue to exclude mmap'ing the 64K page overlapping the actual BAR.

> Test #4: x-msix-relocation = "bar5"
> 
> The same net result as test #3: it works but BAR1 is not mapped:
> 
> 
> Region 1: Memory at 210000000000 (64-bit, non-prefetchable) [size=64K]
> Region 3: Memory at 210000040000 (64-bit, non-prefetchable) [size=256K]
> Region 5: Memory at 200080000000 (32-bit, prefetchable) [size=64K]
> 
> Capabilities: [c0] MSI-X: Enable+ Count=96 Masked-
>         Vector table: BAR=5 offset=00000000
>         PBA: BAR=5 offset=00000600
> 
> FlatView #0
>  AS "memory", root: system
>  AS "cpu-memory", root: system
>  Root memory region: system
>   0000000000000000-000000007fffffff (prio 0, ram): ppc_spapr.ram
>   0000200080000000-00002000800005ff (prio 0, i/o): msix-table
>   0000200080000600-000020008000ffff (prio 1, i/o): 0001:03:00.0 base BAR 5
> @0000000000000600
>   0000210000000000-000021000000ffff (prio 0, i/o): 0001:03:00.0 BAR 1
>   0000210000040000-000021000007ffff (prio 0, ramd): 0001:03:00.0 BAR 3 
> mmaps[0]
> 
> 
> memory-region: address@hidden
>   0000000000000000-ffffffffffffffff (prio 0, i/o): address@hidden
>     0000000080000000-000000008000ffff (prio 1, i/o): 0001:03:00.0 base BAR 5
>       0000000080000000-00000000800005ff (prio 0, i/o): msix-table
>       0000000080000600-000000008000060f (prio 0, i/o): msix-pba [disabled]
>     0000210000000000-000021000000ffff (prio 1, i/o): 0001:03:00.0 base BAR 1
>       0000210000000000-000021000000ffff (prio 0, i/o): 0001:03:00.0 BAR 1
>     0000210000040000-000021000007ffff (prio 1, i/o): 0001:03:00.0 base BAR 3
>       0000210000040000-000021000007ffff (prio 0, i/o): 0001:03:00.0 BAR 3
>         0000210000040000-000021000007ffff (prio 0, ramd): 0001:03:00.0 BAR
> 3 mmaps[0]

As above, you won't get the mmap without disabling the implicit page
exclusion.  The real question for this case is why does it work while
'auto' came up with a nearly identical layout, swapping BAR5 for BAR0
and it did not work.  The placement of the BARs is even the same.

> and there is also one minor comment below.
> 
> 
> > @@ -2845,13 +2942,14 @@ static void vfio_realize(PCIDevice *pdev, Error 
> > **errp)
> >  
> >      vfio_pci_size_rom(vdev);
> >  
> > +    vfio_bars_prepare(vdev);
> > +
> >      vfio_msix_early_setup(vdev, &err);
> >      if (err) {
> >          error_propagate(errp, err);
> >          goto error;
> >      }
> >  
> > -    vfio_bars_prepare(vdev);  
> 
> 
> This could be in 2/5.

It could, but 2/5 was attempting to add the base BAR MemoryRegion and
split vfio_bars_setup() into vfio_bars_prepare() and
vfio_bars_register() without otherwise changing the ordering.  It's
only when we want to modify BARs between prepare and register that we
need to make this change, thus it's done here.  Thanks,

Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]