qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] iommu emulation


From: Alex Williamson
Subject: Re: [Qemu-devel] iommu emulation
Date: Wed, 15 Feb 2017 19:47:58 -0700

On Thu, 16 Feb 2017 10:28:39 +0800
Peter Xu <address@hidden> wrote:

> On Wed, Feb 15, 2017 at 11:15:52AM -0700, Alex Williamson wrote:
> 
> [...]
> 
> > > Alex, do you like something like below to fix above issue that Jintack
> > > has encountered?
> > > 
> > > (note: this code is not for compile, only trying show what I mean...)
> > > 
> > > ------8<-------
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index 332f41d..4dca631 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> > >       */
> > >      config = g_memdup(pdev->config, vdev->config_size);
> > > 
> > > -    /*
> > > -     * Extended capabilities are chained with each pointing to the next, 
> > > so we
> > > -     * can drop anything other than the head of the chain simply by 
> > > modifying
> > > -     * the previous next pointer.  For the head of the chain, we can 
> > > modify the
> > > -     * capability ID to something that cannot match a valid capability.  
> > > ID
> > > -     * 0 is reserved for this since absence of capabilities is indicated 
> > > by
> > > -     * 0 for the ID, version, AND next pointer.  However, 
> > > pcie_add_capability()
> > > -     * uses ID 0 as reserved for list management and will incorrectly 
> > > match and
> > > -     * assert if we attempt to pre-load the head of the chain with this 
> > > ID.
> > > -     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
> > > -     * part for identifying absence of capabilities in a root complex 
> > > register
> > > -     * block.  If the ID still exists after adding capabilities, switch 
> > > back to
> > > -     * zero.  We'll mark this entire first dword as emulated for this 
> > > purpose.
> > > -     */
> > > -    pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> > > -                 PCI_EXT_CAP(0xFFFF, 0, 0));
> > > -    pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
> > > -    pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0);
> > > -
> > >      for (next = PCI_CONFIG_SPACE_SIZE; next;
> > >           next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
> > >          header = pci_get_long(config + next);
> > > @@ -1917,6 +1898,8 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> > >          switch (cap_id) {
> > >          case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
> > >          case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function 
> > > virtualization */
> > > +            /* keep this ecap header (4 bytes), but mask cap_id to 
> > > 0xffff */
> > > +            ...
> > >              trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, 
> > > next);
> > >              break;
> > >          default:
> > > @@ -1925,11 +1908,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> > > 
> > >      }
> > > 
> > > -    /* Cleanup chain head ID if necessary */
> > > -    if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
> > > -        pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
> > > -    }
> > > -
> > >      g_free(config);
> > >      return;
> > >  }  
> > > ----->8-----    
> > > 
> > > Since after all we need the assumption that 0xffff is reserved for
> > > cap_id. Then, we can just remove the "first 0xffff then 0x0" hack,
> > > which is imho error-prone and hacky.  
> > 
> > This doesn't fix the bug, which is that pcie_add_capability() uses a
> > valid capability ID for it's own internal tracking.  It's only doing
> > this to find the end of the capability chain, which we could do in a
> > spec complaint way by looking for a zero next pointer.  Fix that and
> > then vfio doesn't need to do this set to 0xffff then back to zero
> > nonsense at all.  Capability ID zero is valid.  Thanks,  
> 
> Yeah I see Michael's fix on the capability list stuff. However, imho
> these are two different issues? Or say, even if with that patch, we
> should still need this hack (first 0x0, then 0xffff) right? Since
> looks like that patch didn't solve the problem if the first pcie ecap
> is masked at 0x100.

I thought the problem was that QEMU in the host exposes a device with a
capability ID of 0 to the L1 guest.  QEMU in the L1 guest balks at a
capability ID of 0 because that's how it finds the end of the chain.
Therefore if we make QEMU not use capability ID 0 for internal
purposes, things work.  vfio using 0xffff and swapping back to 0x0
becomes unnecessary, but doesn't hurt anything.  Thanks,

Alex



reply via email to

[Prev in Thread] Current Thread [Next in Thread]