qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v13 09/13] add check reset mechanism when hotplu


From: Chen Fan
Subject: Re: [Qemu-devel] [PATCH v13 09/13] add check reset mechanism when hotplug vfio device
Date: Mon, 16 Nov 2015 18:18:07 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

Hi Alex,

  Thanks for your detailed explanation.
during my test, I found that maybe there was another problem in vfio driver,
I use a dual-port NIC which address are: 06:00.0 and 06:00.1 two functions.
then I use aer-inject to inject one error to one function like following:
AER
ID 0000:06:00.0
UNCOR_STATUS DLP
HEADER_LOG 0 1 2 3

here I boot qemu with one enable aer, one disable aer:
./x86_64-softmmu/qemu-system-x86_64 -M q35 -device ioh3420,bus=pcie.0,addr=1c.0,port=1,id=bridge1,chassis=1
 -device vfio-pci,host=06:00.1,bus=bridge1,addr=00.1
-device vfio-pci,host=06:00.0,bus=bridge1,addr=00.0,aer=true,multifunction=on

so we expected that the error only sent to the vfio device with host address is 06:00.0, but I found that all devices (06:00.0 , 06:00.1) receive the signal in qemu, which sent by vfio driver in vfio_pci_aer_err_detected. then qemu stopped by the device with 06:00.1 received the signal.
is that right?

Thanks,
Chen


On 11/14/2015 05:04 AM, Alex Williamson wrote:
On Fri, 2015-11-13 at 11:28 +0800, Cao jin wrote:
On 11/12/2015 07:51 PM, Michael S. Tsirkin wrote:
On Wed, Nov 11, 2015 at 06:34:27PM +0800, Cao jin wrote:
From: Chen Fan <address@hidden>

Since we support multi-function hotplug. the function 0 indicate
the closure of the slot, so we have the chance to do the check.

Signed-off-by: Chen Fan <address@hidden>
---
   hw/pci/pci.c             | 29 +++++++++++++++++++++++++++++
   hw/vfio/pci.c            | 19 +++++++++++++++++++
   hw/vfio/pci.h            |  2 ++
   include/hw/pci/pci_bus.h |  5 +++++
   4 files changed, 55 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 168b9cc..f6ca6ef 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -81,6 +81,7 @@ static void pci_bus_realize(BusState *qbus, Error **errp)
       PCIBus *bus = PCI_BUS(qbus);

       vmstate_register(NULL, -1, &vmstate_pcibus, bus);
+    notifier_with_return_list_init(&bus->hotplug_notifiers);
   }

   static void pci_bus_unrealize(BusState *qbus, Error **errp)
@@ -1835,6 +1836,22 @@ PCIDevice *pci_find_device(PCIBus *bus, int bus_num, 
uint8_t devfn)
       return bus->devices[devfn];
   }

+void pci_bus_add_hotplug_notifier(PCIBus *bus, NotifierWithReturn *notify)
+{
+    notifier_with_return_list_add(&bus->hotplug_notifiers, notify);
+}
+
+void pci_bus_remove_hotplug_notifier(NotifierWithReturn *notifier)
+{
+    notifier_with_return_remove(notifier);
+}
+
+static int pci_bus_hotplug_notifier(PCIBus *bus, void *opaque)
+{
+    return notifier_with_return_list_notify(&bus->hotplug_notifiers,
+                                            opaque);
+}
+
   static void pci_qdev_realize(DeviceState *qdev, Error **errp)
   {
       PCIDevice *pci_dev = (PCIDevice *)qdev;
@@ -1877,6 +1894,18 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
           pci_qdev_unrealize(DEVICE(pci_dev), NULL);
           return;
       }
+
+    /*
+     *  If the function is func 0, indicate the closure of the slot.
+     *  signal the callback.
+     */
+    if (DEVICE(pci_dev)->hotplugged &&
+        pci_get_function_0(pci_dev) == pci_dev &&
+        pci_bus_hotplug_notifier(bus, pci_dev)) {
+        error_setg(errp, "failed to hotplug function 0");
+        pci_qdev_unrealize(DEVICE(pci_dev), NULL);
+        return;
+    }
I don't understand why this is required in pci core.
PCI Device is already constructed anyway.
Just do the checks and call unrealize in vfio.
Because when do multi-function hotplug, the function 0 on the pcie bus
probably is not a vfio device. so we should trigger the check from pci
core.

I also don't see why you are tying this to hotplug.
I would check each function as it's added.
But that's a vfio thing, if both you and Alex think
it's a good idea, fine by me.
The device is  initialized one by one no matter it is cold plugged or
hot plugged, but for the vfio with aer that need to get depended devices
required by bus reset, so need to make sure the reset depended devices
are assigned to qemu, in vfio, there is a machine done callback to check
the bus reset for boot up, so it also should be done in hotplug。

it looks little complicated, Alex, any idea?

So the problem is that to support AER we need to be able to do a bus
reset of the device, both in the virtual and physical spaces.  A
physical bus reset is likely to affect more than a single device since
we're often dealing with multifunction endpoints.  Those functions may
be considered isolated on the host due to ACS, but we cannot reset the
bus without affecting all of the functions.  Therefore, we need to test
whether we have a compatible setup, but it involves more than a single
device.  We cannot test each device as it is initialized because any
time more than one device is affected, and we haven't yet added the
other devices, we'll fail the test.

There are two separate cases where we need to solve this problem,
coldplug and hotplug.  Coldplug can be resolved by using the
machine-init done notifier to verify that our configuration is
compatible.  We have no requirements for the ordering of devices during
cold initialization.  For the hotplug case, we've defined that function
0 closes the slot, which provides an opportunity for us to do the same
verification.  However, function 0 is not necessarily a vfio-pci device.
We can create our own multifunction devices in the VM, where function 0
could be any type of pci device.  Thus vfio-pci cannot notify itself
when a slot is closed and due to the above mentioned problem, we cannot
verify as each device is added.

So, I don't really see a better way to solve the problem than what's
being proposed here.  Thanks,

Alex

.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]