Re: [Qemu-devel] vfio failure with intel 760p 128GB nvme


From: Alex Williamson
Subject: Re: [Qemu-devel] vfio failure with intel 760p 128GB nvme
Date: Thu, 27 Dec 2018 07:20:13 -0700

On Thu, 27 Dec 2018 20:30:48 +0800
Dongli Zhang <address@hidden> wrote:

> Hi Alex,
> 
> On 12/02/2018 09:29 AM, Dongli Zhang wrote:
> > Hi Alex,
> > 
> > On 12/02/2018 03:29 AM, Alex Williamson wrote:  
> >> On Sat, 1 Dec 2018 10:52:21 -0800 (PST)
> >> Dongli Zhang <address@hidden> wrote:
> >>  
> >>> Hi,
> >>>
> >>> I got the error below when assigning an intel 760p 128GB nvme to a guest
> >>> via vfio on my desktop:
> >>>
> >>> qemu-system-x86_64: -device vfio-pci,host=0000:01:00.0: vfio
> >>> 0000:01:00.0: failed to add PCI capability address@hidden: table & pba
> >>> overlap, or they don't fit in BARs, or don't align
> >>>
> >>>
> >>> This is because the msix table overlaps the pba. According to the
> >>> 'lspci -vv' output below from the host, the distance between the msix
> >>> table offset and the pba offset is only 0x100, although 22 entries are
> >>> supported (22 entries need 0x160). It looks like qemu supports at most
> >>> 0x800 entries.
> >>>
> >>> # sudo lspci -vv
> >>> ... ...
> >>> 01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a6 
> >>> (rev 03) (prog-if 02 [NVM Express])
> >>>   Subsystem: Intel Corporation Device 390b
> >>> ... ...
> >>>   Capabilities: [b0] MSI-X: Enable- Count=22 Masked-
> >>>           Vector table: BAR=0 offset=00002000
> >>>           PBA: BAR=0 offset=00002100
> >>>
> >>>
> >>>
> >>> The patch below works around the issue, and passthrough of the nvme then
> >>> succeeds.
> >>>
> >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >>> index 5c7bd96..54fc25e 100644
> >>> --- a/hw/vfio/pci.c
> >>> +++ b/hw/vfio/pci.c
> >>> @@ -1510,6 +1510,11 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
> >>>      msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
> >>>      msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
> >>>  
> >>> +    if (msix->table_bar == msix->pba_bar &&
> >>> +        msix->table_offset + msix->entries * PCI_MSIX_ENTRY_SIZE > msix->pba_offset) {
> >>> +        msix->entries = (msix->pba_offset - msix->table_offset) / PCI_MSIX_ENTRY_SIZE;
> >>> +    }
> >>> +
> >>>      /*
> >>>       * Test the size of the pba_offset variable and catch if it extends outside
> >>>       * of the specified BAR. If it is the case, we need to apply a hardware
> >>>
> >>>
> >>> Would you please help confirm whether this should be regarded as a bug in
> >>> qemu or as an issue with the nvme hardware? Should we fix this in qemu, or
> >>> should we never use such buggy hardware with vfio?  
> >>
> >> It's a hardware bug, is there perhaps a firmware update for the device
> >> that resolves it?  It's curious that a vector table size of 0x100 gives
> >> us 16 entries and 22 in hex is 0x16 (table size would be reported as
> >> 0x15 for the N-1 algorithm).  I wonder if there's a hex vs decimal
> >> mismatch going on.  We don't really know if the workaround above is
> >> correct, are there really 16 entries or maybe does the PBA actually
> >> start at a different offset?  We wouldn't want to generically assume
> >> one or the other.  I think we need Intel to tell us in which way their
> >> hardware is broken and whether it can or is already fixed in a firmware
> >> update.  Thanks,  
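
For illustration, here is a minimal standalone sketch of the sizing arithmetic above: it decodes the N-1 encoded Table Size field the same way QEMU does, and shows how encoding the intended decimal 15 (16 entries) as hex 0x15 decodes back to 22 entries, whose table then overlaps the PBA at offset 0x2100. PCI_MSIX_FLAGS_QSIZE and PCI_MSIX_ENTRY_SIZE mirror the Linux/QEMU header values; the offsets come from the lspci output above, and everything else is hypothetical.

/* Standalone sketch of the MSI-X sizing discussed above; compile with any
 * C compiler.  The offsets are the ones reported by this 760p. */
#include <stdint.h>
#include <stdio.h>

#define PCI_MSIX_FLAGS_QSIZE   0x07ff  /* Table Size field, encoded as N-1 */
#define PCI_MSIX_ENTRY_SIZE    16      /* bytes per MSI-X table entry */

int main(void)
{
    unsigned table_offset = 0x2000;    /* Vector table: BAR=0 offset=00002000 */
    unsigned pba_offset   = 0x2100;    /* PBA:          BAR=0 offset=00002100 */

    /* What the gap between table and PBA can actually hold: 0x100 / 16 = 16 */
    unsigned fits = (pba_offset - table_offset) / PCI_MSIX_ENTRY_SIZE;

    /* Correct N-1 encoding for 16 entries is decimal 15; writing it as hex
     * 0x15 (= 21) instead decodes to 22 entries. */
    uint16_t msg_ctrl_ok  = 15;
    uint16_t msg_ctrl_bug = 0x15;

    unsigned entries_ok  = (msg_ctrl_ok  & PCI_MSIX_FLAGS_QSIZE) + 1;
    unsigned entries_bug = (msg_ctrl_bug & PCI_MSIX_FLAGS_QSIZE) + 1;

    printf("gap fits %u entries; correct encoding -> %u, suspect encoding -> %u\n",
           fits, entries_ok, entries_bug);

    /* The overlap QEMU rejects: 22 * 16 = 0x160 > 0x100 */
    if (table_offset + entries_bug * PCI_MSIX_ENTRY_SIZE > pba_offset)
        printf("table (0x%x bytes) runs past the PBA at 0x%x\n",
               entries_bug * PCI_MSIX_ENTRY_SIZE, pba_offset);
    return 0;
}
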
> > 
> > Thank you very much for the confirmation.
> > 
> > I just realized this would also cause trouble on my desktop once 17 or more
> > vectors are used.
> > 
> > I will report this to intel and confirm how this can happen and whether there
> > is any firmware update available for this issue.
> >   
> 
> I found a similar issue reported against kvm:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=202055
> 
> 
> I confirmed this in my env again. By default, the msi-x count is 16.
> 
>       Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
>               Vector table: BAR=0 offset=00002000
>               PBA: BAR=0 offset=00002100
> 
> 
> The count is still 16 after the device is assigned to vfio (Enable- now):
> 
> # echo 0000:01:00.0 > /sys/bus/pci/devices/0000\:01\:00.0/driver/unbind
> # echo "8086 f1a6" > /sys/bus/pci/drivers/vfio-pci/new_id
> 
> Capabilities: [b0] MSI-X: Enable- Count=16 Masked-
>               Vector table: BAR=0 offset=00002000
>               PBA: BAR=0 offset=00002100
> 
> 
> After I boot qemu with "-device vfio-pci,host=0000:01:00.0", the count becomes 22.
> 
> Capabilities: [b0] MSI-X: Enable- Count=22 Masked-
>               Vector table: BAR=0 offset=00002000
>               PBA: BAR=0 offset=00002100
> 
> 
> 
> Another interesting observation is that the vfio-based userspace nvme driver
> also changes the count from 16 to 22.
> 
> I rebooted the host and the count was reset to 16. Then I booted the VM with
> "-drive file=nvme://0000:01:00.0/1,if=none,id=nvmedrive0 -device
> virtio-blk,drive=nvmedrive0,id=nvmevirtio0". As the userspace nvme driver uses
> a different vfio path, it boots successfully without issue.
> 
> However, the count then becomes 22:
> 
> Capabilities: [b0] MSI-X: Enable- Count=22 Masked-
>               Vector table: BAR=0 offset=00002000
>               PBA: BAR=0 offset=00002100
> 
> 
> Both vfio and the userspace nvme driver (based on vfio) change the count from
> 16 to 22.
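
For reference, here is a rough sketch of reading the advertised MSI-X count straight from PCI config space via sysfs, independent of lspci, to watch it flip from 16 to 22 after an FLR. The 0xb0 capability offset and the 0000:01:00.0 BDF come from the output quoted above; the raw 16-bit read assumes a little-endian host and needs root to reach past the standard 64-byte header.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSIX_CAP_OFFSET  0xb0                   /* "Capabilities: [b0] MSI-X" */
#define MSIX_MSG_CTRL    (MSIX_CAP_OFFSET + 2)  /* Message Control word */
#define MSIX_TABLE_SIZE  0x07ff                 /* low 11 bits, encoded as N-1 */

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:01:00.0/config";
    uint16_t ctrl;
    int fd = open(path, O_RDONLY);

    if (fd < 0 || pread(fd, &ctrl, sizeof(ctrl), MSIX_MSG_CTRL) != sizeof(ctrl)) {
        perror("read config space");
        return 1;
    }
    close(fd);

    /* Config space is little-endian; on x86 the raw read is the value. */
    printf("MSI-X message control = 0x%04x, count = %u\n",
           ctrl, (ctrl & MSIX_TABLE_SIZE) + 1);
    return 0;
}
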

Yes, we've found in the bz you mention that it's resetting the device
via FLR that causes the device to report a bogus interrupt count.  The
vfio-pci driver will always perform an FLR on the device before
providing it to the user, so whether it's directly assigned with
vfio-pci in QEMU or exposed as an nvme drive via nvme://, it will go
through the same FLR path.  It looks like we need yet another device
specific reset for nvme.  Ideally we could figure out how to recover
the device after an FLR, but potentially we could reset the nvme
controller rather than the PCI interface.  It's becoming a problem that
so many nvme controllers have broken FLRs.  Thanks,

Alex
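
For context on what a device-specific reset might look like, here is a hypothetical sketch loosely modeled on the reset quirks in drivers/pci/quirks.c: quiesce the NVMe controller by clearing CC.EN and waiting for CSTS.RDY to drop before issuing the FLR. The register offsets come from the NVMe spec; the function name, timeout, and any wiring into the quirk table are illustrative assumptions, not an actual merged fix.

#include <linux/delay.h>
#include <linux/io.h>
#include <linux/pci.h>

#define NVME_REG_CC     0x14    /* Controller Configuration */
#define NVME_REG_CSTS   0x1c    /* Controller Status */
#define NVME_CC_ENABLE  0x1
#define NVME_CSTS_RDY   0x1

/* Hypothetical quirk: disable the NVMe controller before FLR so the device
 * comes back with sane state (signature matches pci_dev_reset_methods). */
static int nvme_disable_then_flr(struct pci_dev *pdev, int probe)
{
    void __iomem *bar;
    u32 cc;
    int i;

    if (!pcie_has_flr(pdev))
        return -ENOTTY;
    if (probe)
        return 0;

    bar = pci_iomap(pdev, 0, NVME_REG_CSTS + 4);
    if (!bar)
        return -ENOMEM;

    /* Clear CC.EN and give CSTS.RDY up to ~1s to drop before resetting. */
    cc = readl(bar + NVME_REG_CC);
    writel(cc & ~NVME_CC_ENABLE, bar + NVME_REG_CC);
    for (i = 0; i < 100; i++) {
        if (!(readl(bar + NVME_REG_CSTS) & NVME_CSTS_RDY))
            break;
        msleep(10);
    }
    pci_iounmap(pdev, bar);

    pcie_flr(pdev);
    return 0;
}
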


