qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v12 00/15] vfio-pci: pass the aer error to guest


From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCH v12 00/15] vfio-pci: pass the aer error to guest
Date: Tue, 28 Jul 2015 09:35:48 -0600

On Tue, 2015-07-28 at 07:48 +0000, Chen, Hanxiao wrote:
> Hi, Alex
> 
> > -----Original Message-----
> > From: Alex Williamson [mailto:address@hidden
> > To: Chen, Hanxiao
> > Cc: address@hidden; Chen, Fan
> > Subject: Re: [Qemu-devel] [PATCH v12 00/15] vfio-pci: pass the aer error to 
> > guest
> > 
> > On Thu, 2015-07-16 at 12:00 +0800, Chen Hanxiao wrote:
> > > From: Chen Fan <address@hidden>
> > >
> > > For now, when qemu receives an error from host aer report
> > > by vfio pci passthough devices, qemu just terminate the guest.
> > > Usually user want to know what error occurred
> > > rather than stop the guest.
> > >
> > > This patches add aer capability support for vfio device,
> > > then pass the error to guest, and let guest driver to recover
> > > from the error.
> > > Turning on SERR# for error forwording in bridge control register
> > > patch in seabios has been merged as commit 32ec3ee.
> > >
> > > notes:
> > >   this series patches enable aer support single/multi-function,
> > >   for multi-function, require all the function of the slot assigned to
> > >   VM and on the same slot.
> > >
> > > Chen Fan (15):
> > >   vfio: extract vfio_get_hot_reset_info as a single function
> > >   vfio: squeeze out vfio_pci_do_hot_reset for support bus reset
> > >   pcie: modify the capability size assert
> > >   vfio: make the 4 bytes aligned for capability size
> > >   vfio: add pcie extanded capability support
> > >   aer: impove pcie_aer_init to support vfio device
> > >   vfio: add aer support for vfio device
> > >   vfio: add check host bus reset is support or not
> > >   pci: add bus reset_notifiers callbacks for host bus reset
> > >   vfio: add sec_bus_reset notifier to notify physical bus reset is
> > >     needed
> > >   vfio: modify vfio_pci_hot_reset to support bus reset
> > >   vfio: do hot bus reset when do virtual secondary bus reset
> > >   pcie_aer: expose pcie_aer_msg() interface
> > >   vfio-pci: pass the aer error to guest
> > >   vfio: add 'aer' property to expose aercap
> > >
> > >  hw/pci-bridge/ioh3420.c            |   2 +-
> > >  hw/pci-bridge/xio3130_downstream.c |   2 +-
> > >  hw/pci-bridge/xio3130_upstream.c   |   2 +-
> > >  hw/pci/pci.c                       |  16 +
> > >  hw/pci/pci_bridge.c                |   6 +
> > >  hw/pci/pcie.c                      |   2 +-
> > >  hw/pci/pcie_aer.c                  |   6 +-
> > >  hw/vfio/pci.c                      | 636 
> > > +++++++++++++++++++++++++++++++++----
> > >  include/hw/pci/pci.h               |   4 +
> > >  include/hw/pci/pci_bus.h           |   2 +
> > >  include/hw/pci/pcie_aer.h          |   3 +-
> > >  11 files changed, 609 insertions(+), 72 deletions(-)
> > 
> > 
> > This seems to be pretty much the same as v11 where I commented that I
> > didn't think it was acceptable to have a feature dependent on having all
> > the functions assigned without supporting hot-add of multi-function
> > devices.  Can you summarize what's changed here and whether that comment
> > was addressed.  It would be a courtesy to reviewers to provide at least
> > a summary changelog with each new version.  Thanks,
> > 
> 
> We could hot-unplug all passthrough devices by device_del,
> But currently Qemu could not hot-add multi-function pci device.
> 
> See TODO in pcie_cap_slot_hotplug_cb:
> /* TODO: multifunction hot-plug.
>  * Right now, only a device of function = 0 is allowed to be
>  * hot plugged/unplugged.
>  */
> assert(PCI_FUNC(pci_dev->devfn) == 0);
> 
> So we had to limit this as a workaround.
> 
> Why can't  we add functions one by one to the same slot by device_add?
> 
> If we could add functions one by one,
> then we can enable aer for the devices once all dependence functions
> were added by setting aer as a dynamic property(such as using 
> object_property_add, qom-set)

Adding functions individually is not supported by PCI, the specification
indicates that function 0 must always be present and if function 0
reports the multi-function bit set, system software is to scan function
1 through 7.  Therefore any hotplug notification to the slot should do
nothing unless a function zero is present and real hardware doesn't
typically have the ability to spawn new functions, so I wouldn't
necessarily expect additional notifications to a slot to cause system
software to re-scan for new functions.

Also, if we add functions one by one, how do we know we'll ever see the
complete set necessary to support the user requested AER feature?  We
cannot predict that the user will add the needed functions in the
future.

Paolo suggested a possible solution to this in a separate internal
thread, that perhaps we could do multi-function hot-add so long as the
zero function is added last, effectively signaling the closure of the
slot.  Modifications to the PCI-core would be needed such that non-zero
functions could be hot added and config space hidden until the
successful instantiation of the zero function (to prevent unsolicited
discovery and use of the functions prior to exposure).  Perhaps a slot
closure callback would give us the same opportunity we have with
machine_init_done to give a final check of the slot and abort if our aer
requirements are not met.

Multi-function hot-add is not currently supported because nothing
requires it.  If aer is to depend on multi-function, then it's going to
need to bring multi-function hot-add up to a supportable level in order
to be accepted.  Thanks,

Alex




reply via email to

[Prev in Thread] Current Thread [Next in Thread]