From: Chen Fan
Subject: Re: [Qemu-devel] [PATCH v5 5/7] vfio-pci: pass the aer error to guest
Date: Mon, 16 Mar 2015 15:35:13 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 03/16/2015 11:52 AM, Alex Williamson wrote:
> On Mon, 2015-03-16 at 11:05 +0800, Chen Fan wrote:
>> On 03/14/2015 06:34 AM, Alex Williamson wrote:
>>> On Thu, 2015-03-12 at 18:23 +0800, Chen Fan wrote:
>>>> When the vfio device encounters an uncorrectable error in the host,
>>>> the vfio_pci driver signals the eventfd registered by this vfio
>>>> device, which results in the QEMU eventfd handler being invoked.
>>>>
>>>> This patch passes the error to the guest and lets the guest driver
>>>> recover from it.
>>> What is going to be the typical recovery mechanism for the guest?  I'm
>>> concerned that the topology of the device in the guest doesn't
>>> necessarily match the topology of the device in the host, so if the
>>> guest were to attempt a bus reset to recover a device, for instance,
>>> what happens?
>> The recovery mechanism is that when the guest gets an AER error from a
>> device, it clears the corresponding status bit in the device's
>> registers. For a device that needs a reset, the guest AER driver resets
>> all devices under the bus.
> Sorry, I'm still confused: how does the guest AER driver reset all
> devices under a bus?  Are we talking about function-level,
> device-specific reset mechanisms or secondary bus resets?  If the guest
> is performing secondary bus resets, what guarantee is there that they
> will translate to a physical secondary bus reset?  vfio may only do an
> FLR when the bus is reset, or it may not be able to do anything at all,
> depending on the available function-level resets and on the physical and
> virtual topology of the device.  Thanks,

In general, each function relies on its device driver to do the
recovery, e.g. by implementing the error_detected and slot_reset
callbacks. For a link reset, the guest usually performs a secondary bus
reset. Must a bus reset in the guest really be translated into a
physical secondary bus reset for a vfio device?

Thanks,
Chen
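A simplified model of the guest-side flow described above may help frame the question. This is a sketch only: the callback names loosely follow Linux's `struct pci_error_handlers` (`error_detected`, `slot_reset`, `resume`), but every `sketch_*` and `demo_*` name here is hypothetical, and the model deliberately omits steps such as `mmio_enabled` and the kernel's separate handling of fatal errors. The `bus_reset` hook marks the point Alex raises: under vfio, whatever the guest issues there need not become a physical secondary bus reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, loosely modeled on Linux's pci_ers_result_t. */
typedef enum {
    SKETCH_CAN_RECOVER,  /* driver can continue without a reset */
    SKETCH_NEED_RESET,   /* driver requests a link/bus reset */
    SKETCH_DISCONNECT,   /* driver gives up on the device */
    SKETCH_RECOVERED     /* device is usable again */
} sketch_result;

/* Hypothetical per-driver callbacks (slot_reset and resume may be NULL). */
typedef struct {
    sketch_result (*error_detected)(void *drv);
    sketch_result (*slot_reset)(void *drv);
    void (*resume)(void *drv);
} sketch_err_handlers;

/* Whatever reset the guest can issue; under vfio this is not guaranteed
 * to become a physical secondary bus reset. */
typedef void (*sketch_bus_reset_fn)(void *bus);

/* Simplified recovery flow: ask the driver, reset the bus if it asks for
 * that, let the driver reinitialize, then resume normal operation. */
static sketch_result sketch_recover(const sketch_err_handlers *h, void *drv,
                                    sketch_bus_reset_fn bus_reset, void *bus)
{
    sketch_result r;

    assert(h != NULL && h->error_detected != NULL && bus_reset != NULL);

    r = h->error_detected(drv);
    if (r == SKETCH_DISCONNECT) {
        return SKETCH_DISCONNECT;
    }
    if (r == SKETCH_NEED_RESET) {
        bus_reset(bus);               /* e.g. a secondary bus reset */
        if (h->slot_reset == NULL || h->slot_reset(drv) != SKETCH_RECOVERED) {
            return SKETCH_DISCONNECT; /* driver could not reinitialize */
        }
    }
    if (h->resume != NULL) {
        h->resume(drv);
    }
    return SKETCH_RECOVERED;
}

/* Hypothetical demo driver state used to exercise the model. */
typedef struct {
    sketch_result verdict;  /* what error_detected reports */
    int slot_resets;        /* number of slot_reset calls */
    int resumes;            /* number of resume calls */
} demo_driver;

static sketch_result demo_error_detected(void *drv)
{
    return ((demo_driver *)drv)->verdict;
}

static sketch_result demo_slot_reset(void *drv)
{
    ((demo_driver *)drv)->slot_resets++;
    return SKETCH_RECOVERED;
}

static void demo_resume(void *drv)
{
    ((demo_driver *)drv)->resumes++;
}

/* Counts bus resets instead of touching hardware. */
static void demo_bus_reset(void *bus)
{
    (*(int *)bus)++;
}
```

With a `demo_driver` whose `verdict` is `SKETCH_NEED_RESET`, `sketch_recover()` performs one bus reset, one `slot_reset` and one `resume`; with `SKETCH_CAN_RECOVER` the bus reset is skipped entirely, so only drivers that ask for a reset depend on how the reset is realized underneath.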

> Alex
>
>>>> Signed-off-by: Chen Fan <address@hidden>
>>>> ---
>>>>  hw/vfio/pci.c | 34 ++++++++++++++++++++++++++++------
>>>>  1 file changed, 28 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index 0a515b6..8966c49 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -3240,18 +3240,40 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>>>>  static void vfio_err_notifier_handler(void *opaque)
>>>>  {
>>>>      VFIOPCIDevice *vdev = opaque;
>>>> +    PCIDevice *dev = &vdev->pdev;
>>>> +    PCIEAERMsg msg = {
>>>> +        .severity = 0,
>>>> +        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
>>>> +    };
>>>>
>>>>      if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
>>>>          return;
>>>>      }
>>>>
>>>> +    /* we should read the error details from the real hardware
>>>> +     * configuration spaces, here we only need to do is signaling
>>>> +     * to guest an uncorrectable error has occurred.
>>>> +     */
>>>
>>> Inconsistent comment style
>>>
>>>> +    if(dev->exp.aer_cap) {
>>>         ^ space
>>>> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
>>>> +        uint32_t uncor_status;
>>>> +        bool isfatal;
>>>> +
>>>> +        uncor_status = vfio_pci_read_config(dev,
>>>> +                           dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 4);
>>>> +
>>>> +        isfatal = uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
>>>> +
>>>> +        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>>>> +                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
>>>> +
>>>> +        pcie_aer_msg(dev, &msg);
>>>> +        return;
>>>> +    }
>>>> +
>>>>      /*
>>>> -     * TBD. Retrieve the error details and decide what action
>>>> -     * needs to be taken. One of the actions could be to pass
>>>> -     * the error to the guest and have the guest driver recover
>>>> -     * from the error. This requires that PCIe capabilities be
>>>> -     * exposed to the guest. For now, we just terminate the
>>>> -     * guest to contain the error.
>>>> +     * If the aer capability is not exposed to the guest. we just
>>>> +     * terminate the guest to contain the error.
>>>>       */
>>>>
>>>>      error_report("%s(%04x:%02x:%02x.%x) Unrecoverable error detected. "
>>>> ..
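The patch does two small computations before calling `pcie_aer_msg()`: it packs `msg.source_id` from the bus number and `devfn`, and it decides fatal vs. non-fatal by ANDing the Uncorrectable Error Status value (read through `vfio_pci_read_config()`) with the Uncorrectable Error Severity mask (read from `dev->config`). The standalone sketch below restates both outside QEMU so they can be checked in isolation. The `sketch_*` names and macros are hypothetical, not QEMU or kernel APIs; the bit positions used for Poisoned TLP (bit 12) and Malformed TLP (bit 18) are those of the PCIe AER Uncorrectable Error registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCI devfn packs a 5-bit device (slot) number and a 3-bit function. */
static uint8_t sketch_devfn(uint8_t slot, uint8_t func)
{
    assert(slot < 32 && func < 8);
    return (uint8_t)((slot << 3) | func);
}

/* Requester ID as built for msg.source_id in the patch: the bus number in
 * the high byte, devfn in the low byte. */
static uint16_t sketch_source_id(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/* Bit positions in the AER Uncorrectable Error Status/Severity registers. */
#define SKETCH_UNC_POISON_TLP (1u << 12)
#define SKETCH_UNC_MALF_TLP   (1u << 18)

/* Fatal if any error bit set in the status register is also marked fatal
 * in the severity register; mirrors
 * `uncor_status & pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER)`. */
static bool sketch_uncor_is_fatal(uint32_t uncor_status, uint32_t uncor_sever)
{
    return (uncor_status & uncor_sever) != 0;
}
```

For bus 0x02, device 3, function 1, `sketch_devfn()` gives 0x19 and `sketch_source_id()` gives 0x0219. Because one status read can report several errors at once, the AND treats the whole event as fatal as soon as any reported error is marked fatal in the severity mask.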