Re: [Qemu-devel] [PATCH v5 5/7] vfio-pci: pass the aer error to guest


From: Alex Williamson
Subject: Re: [Qemu-devel] [PATCH v5 5/7] vfio-pci: pass the aer error to guest
Date: Sun, 15 Mar 2015 21:52:07 -0600

On Mon, 2015-03-16 at 11:05 +0800, Chen Fan wrote:
> On 03/14/2015 06:34 AM, Alex Williamson wrote:
> > On Thu, 2015-03-12 at 18:23 +0800, Chen Fan wrote:
> >> When the vfio device encounters an uncorrectable error on the host,
> >> the vfio_pci driver signals the eventfd registered by this vfio
> >> device, which results in the qemu eventfd handler getting
> >> invoked.
> >>
> >> This patch passes the error to the guest so that the guest driver
> >> can recover from the error.
> > What is going to be the typical recovery mechanism for the guest?  I'm
> > concerned that the topology of the device in the guest doesn't
> > necessarily match the topology of the device in the host, so if the
> > guest were to attempt a bus reset to recover a device, for instance,
> > what happens?
> The recovery mechanism is that when the guest gets an AER error from a
> device, it clears the corresponding status bit in the device register. For
> a device that needs a reset, the guest AER driver would reset all devices
> under the bus.

Sorry, I'm still confused. How does the guest AER driver reset all
devices under a bus?  Are we talking about function-level, device-specific
reset mechanisms or secondary bus resets?  If the guest is performing
secondary bus resets, what guarantee does it have that they will
translate to a physical secondary bus reset?  vfio may only do an FLR
when the bus is reset, or it may not be able to do anything, depending
on the available function-level resets and the physical and virtual
topology of the device.  Thanks,

Alex

> >> Signed-off-by: Chen Fan <address@hidden>
> >> ---
> >>   hw/vfio/pci.c | 34 ++++++++++++++++++++++++++++------
> >>   1 file changed, 28 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 0a515b6..8966c49 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -3240,18 +3240,40 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
> >>   static void vfio_err_notifier_handler(void *opaque)
> >>   {
> >>       VFIOPCIDevice *vdev = opaque;
> >> +    PCIDevice *dev = &vdev->pdev;
> >> +    PCIEAERMsg msg = {
> >> +        .severity = 0,
> >> +        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
> >> +    };
> >>   
> >>       if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
> >>           return;
> >>       }
> >>   
> >> +    /* we should read the error details from the real hardware
> >> +     * configuration spaces, here we only need to do is signaling
> >> +     * to guest an uncorrectable error has occurred.
> >> +     */
> > Inconsistent comment style
> >
> >> +     if(dev->exp.aer_cap) {
> >           ^ space
> >
> >> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
> >> +        uint32_t uncor_status;
> >> +        bool isfatal;
> >> +
> >> +        uncor_status = vfio_pci_read_config(dev,
> >> +                           dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
> >> +
> >> +        isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
> >> +
> >> +        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
> >> +                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
> >> +
> >> +        pcie_aer_msg(dev, &msg);
> >> +        return;
> >> +    }
> >> +
> >>       /*
> >> -     * TBD. Retrieve the error details and decide what action
> >> -     * needs to be taken. One of the actions could be to pass
> >> -     * the error to the guest and have the guest driver recover
> >> -     * from the error. This requires that PCIe capabilities be
> >> -     * exposed to the guest. For now, we just terminate the
> >> -     * guest to contain the error.
> >> +     * If the aer capability is not exposed to the guest. we just
> >> +     * terminate the guest to contain the error.
> >>        */
> >>   
> >>       error_report("%s(%04x:%02x:%02x.%x) Unrecoverable error detected.  "
> >
> >
> > .
> >
> 
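
For reference, the two computations the quoted handler relies on can be sketched in isolation. This is only an illustration, not the QEMU code: the sketch_* function names and SKETCH_* macros are made up for this example. The bit positions are those of the AER Uncorrectable Error Status and Severity registers, and the source ID layout (bus number in the upper byte, devfn in the lower byte) mirrors the msg.source_id initializer in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A few bit positions shared by the AER Uncorrectable Error Status and
 * Severity registers. The macro names are local to this sketch. */
#define SKETCH_UNC_DLP        (1u << 4)   /* Data Link Protocol Error */
#define SKETCH_UNC_POISON_TLP (1u << 12)  /* Poisoned TLP */
#define SKETCH_UNC_COMP_TIME  (1u << 14)  /* Completion Timeout */

/* Build a source (requester) ID: bus number in bits 15:8, devfn in
 * bits 7:0, where devfn is (slot << 3) | function. */
static uint16_t sketch_source_id(uint8_t bus, uint8_t slot, uint8_t fn)
{
    assert(slot < 32 && fn < 8);
    uint8_t devfn = (uint8_t)((slot << 3) | fn);
    return (uint16_t)((bus << 8) | devfn);
}

/* An uncorrectable error is treated as fatal when at least one bit set
 * in the status register is also set in the severity register. */
static bool sketch_is_fatal(uint32_t uncor_status, uint32_t uncor_sever)
{
    return (uncor_status & uncor_sever) != 0;
}
```

With these helpers, sketch_source_id(0x01, 0x00, 0) gives 0x0100, and a status of SKETCH_UNC_COMP_TIME checked against a severity mask containing only SKETCH_UNC_DLP is classified as non-fatal. Note that the quoted patch takes the status from the physical device via vfio_pci_read_config() but the severity mask from the emulated config space, so the classification depends on what the guest has programmed there.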