Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add
Date: Mon, 26 Oct 2015 14:14:35 +0200

On Mon, Oct 26, 2015 at 07:57:16PM +0800, Cao jin wrote:
> Hi,
>     warning, there is a long story below O:-)
> 
> On 10/26/2015 04:29 PM, Michael S. Tsirkin wrote:
> >On Mon, Oct 26, 2015 at 11:29:18AM +0800, Cao jin wrote:
> >>Enable PCIe device multifunction hot-add: just ensure function 0 is
> >>added last, so that the driver gets the notification to scan all the
> >>functions in the slot.
> >>
> >>Signed-off-by: Cao jin <address@hidden>


In my opinion, what's confusing you is that you keep
thinking about ARI.
ARI just allows PCI_SLOT(devfn) != 0 for a function
that is still part of the same device.

But with or without ARI, PCIE devices with upstream ports
can only occupy slot 0.

So let's check for that.
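
To make the devfn point concrete, here is a minimal sketch, not QEMU code. The SKETCH_/sketch_ names are made up for illustration; only the bit layout (5-bit slot, 3-bit function) is meant to mirror the PCI_SLOT/PCI_FUNC/PCI_DEVFN macros. It shows that the same 8-bit devfn reads as slot 1, function 2 conventionally, and as function 10 of a single device under ARI.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional split of the 8-bit devfn: 5-bit slot, 3-bit function.
 * Illustrative copies; the names are hypothetical. */
#define SKETCH_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SKETCH_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define SKETCH_PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Build a devfn, rejecting out-of-range inputs instead of masking them. */
static inline uint8_t sketch_make_devfn(unsigned slot, unsigned func)
{
    assert(slot < 32);
    assert(func < 8);
    return (uint8_t)SKETCH_PCI_DEVFN(slot, func);
}

/* Under ARI the whole 8-bit devfn is the function number (0-255) of one
 * device, so SKETCH_PCI_SLOT() can be non-zero for that same device. */
static inline unsigned sketch_ari_function(uint8_t devfn)
{
    return devfn;
}
```

So sketch_make_devfn(1, 2) yields 0x0a, which is slot 1, function 2 in the conventional view and function 10 in the ARI view.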


> >>---
> >>  hw/pci/pci.c         | 31 ++++++++++++++++++++++++++++++-
> >>  hw/pci/pci_host.c    | 13 +++++++++++--
> >>  hw/pci/pcie.c        | 18 +++++++++---------
> >>  include/hw/pci/pci.h |  1 +
> >>  4 files changed, 51 insertions(+), 12 deletions(-)
> >>
> >>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>index b0bf540..6f43b12 100644
> >>--- a/hw/pci/pci.c
> >>+++ b/hw/pci/pci.c
> >>@@ -847,6 +847,9 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
> >>      PCIConfigWriteFunc *config_write = pc->config_write;
> >>      Error *local_err = NULL;
> >>      AddressSpace *dma_as;
> >>+    DeviceState *dev = DEVICE(pci_dev);
> >>+
> >>+    pci_dev->bus = bus;
> >>
> >>      if (devfn < 0) {
> >>          for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
> >>@@ -864,9 +867,17 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
> >>                     PCI_SLOT(devfn), PCI_FUNC(devfn), name,
> >>                     bus->devices[devfn]->name);
> >>          return NULL;
> >>+    } else if (dev->hotplugged &&
> >>+               pci_get_function_0(pci_dev)) {
> >>+        error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
> >>+                   " new func %s cannot be exposed to guest.",
> >>+                   PCI_SLOT(devfn),
> >>+                   bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
> >>+                   name);
> >>+
> >>+        return NULL;
> >>      }
> >>
> >>-    pci_dev->bus = bus;
> >>      pci_dev->devfn = devfn;
> >>      dma_as = pci_device_iommu_address_space(pci_dev);
> >>
> >>@@ -2454,6 +2465,24 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range)
> >>      pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
> >>  }
> >>
> >>+/* An ARI device's function numbers range over 0-255, so it has only one
> >>+ * function 0, while a non-ARI device has one function 0 in each slot. A
> >>+ * non-ARI device could be PCI or PCIe, and PCI has up to 32 slots. */
> >>+PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
> >>+{
> >>+    PCIDevice *parent_dev;
> >>+
> >>+    parent_dev = pci_bridge_get_device(pci_dev->bus);
> >>+    if (pcie_cap_is_arifwd_enabled(parent_dev) &&
> >>+        pci_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)) {
> >>+        /* ARI enabled */
> >>+        return pci_dev->bus->devices[0];
> >
> >That's wrong I think since software might enable ARI after hotplug.
> 
> According to the spec, ARI feature is enabled only based on the following 2
> conditions:
> 1. For an ARI Downstream Port, the capability is communicated through the
> Device Capabilities 2 register.
> 2. For an ARI Device, the capability is communicated through the ARI
> Capability structure.
> 
> Also, going by the driver code in pci_configure_ari(), I think my
> implementation does follow the spec?
> 
> And as you know, we return pci_dev->bus->devices[0] only when the ARI
> feature is enabled.


Yes but that's not the point.
The point is whether a device can occupy slot != 0.

> >And I'm not sure all functions must have the ARI capability.
> >
> 
> Do you mean the following condition might occur: the ARI forwarding bit is
> enabled in the downstream port, but of the functions below it, some have the
> ARI Capability structure while others don't? Shouldn't we forbid that
> condition? If it happens, I am not sure whether the device could work
> normally.
> 
> Judging by the IMPLEMENTATION NOTE "ARI Forwarding Enable Being Set
> Inappropriately", the spec doesn't seem to want that complicated condition
> either. Meanwhile, qemu actually calls pcie_cap_arifwd_init() first, during
> root port/downstream port initialization.
> 
> >I don't see why don't you just go by spec:
> >
> 
> I did read the spec, and also referred to the PCIe driver code (see
> pci_configure_ari()). I think my implementation results from my inaccurate
> understanding of "upstream port" (I also consulted the PCISIG support team
> about this question, see attachment). The concept of "upstream port" in the
> ARI device definition confused me a lot.
> 
> Regarding the definition of an ARI device, I always thought the "upstream
> port" had to be on the ARI device itself (like the figure I drew before);
> otherwise, why would the definition add the words "with an upstream port"?
> It seemed to me that the emphasis was on "with an upstream port", implying
> that a non-ARI device doesn't have an upstream port. The replies in the
> attachment say I am wrong on this point: both kinds of device have an
> upstream port. (Their answer actually confused me more at the time, but I
> think the concept is now clear to me after reading your implementation
> below.)
> 
> After reading your implementation and re-reading the PCISIG support team's
> replies, I finally figured out that the "upstream port" in the ARI device
> definition is just a port positioned closer to the root complex, and that
> it doesn't need to exist on the ARI device itself. Taking Figure 6-13 in
> PCIe spec 3.1 as an example, Root Port A is the "upstream port" for ARI
> Device X, and Downstream Port D in the Switch is the "upstream port" for ARI
> Device Y. Do I understand it right now?
> 
> I hope the long description above makes my understanding clear. If I still
> don't have it right, please point it out; after all, it confused me for a
> long time.
> 
> >static
> >bool pcie_has_upstream_port(PCIDevice *dev)
> >{
> >     PCIDevice *parent_dev = pci_bridge_get_device(dev->bus);
> >
> >     /*
> >      * Device associated with an upstream port.
> >      * As there are several types of these, it's easier to check the
> >      * parent device: upstream ports are always connected to
> >      * root or downstream ports.
> >      */
> >     return parent_dev &&
> >         pci_is_express(parent_dev) &&
> >             parent_dev->exp.exp_cap &&
> >             (pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >             pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_DOWNSTREAM);
> >}
> >
> 
> Assuming my understanding is right, both ARI and non-ARI devices have an
> upstream port (root port or downstream port). Can the existence of an
> upstream port then be the deciding condition?

This tells us whether we are behind a port that
can address devices in slot != 0.
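
As a rough, self-contained illustration of how that answer could feed a function-0 lookup: the sketch below uses made-up types (SketchBus, SketchDev) and a plain flag in place of the real parent-port check, so it is not the QEMU API. Behind a PCIe root or downstream port, function 0 is always devices[0]; on other buses it is function 0 of the same slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SKETCH_SLOT(devfn)       (((devfn) >> 3) & 0x1f)

/* Hypothetical stand-ins for PCIDevice and PCIBus. */
typedef struct SketchDev {
    uint8_t devfn;
    const char *name;
} SketchDev;

typedef struct SketchBus {
    int behind_pcie_port;     /* assumed result of the parent-port check */
    SketchDev *devices[256];  /* indexed by devfn, NULL when empty */
} SketchBus;

/* Return the function 0 that governs visibility of the function at devfn,
 * or NULL if that function 0 is not present yet. */
static SketchDev *sketch_function_0(const SketchBus *bus, uint8_t devfn)
{
    assert(bus != NULL);
    if (bus->behind_pcie_port) {
        /* Only one device, at slot 0, can sit behind such a port. */
        return bus->devices[0];
    }
    /* Other bus types: one function 0 per slot. */
    return bus->devices[SKETCH_DEVFN(SKETCH_SLOT(devfn), 0)];
}
```

A hot-add path could then hold off notifying the guest until this lookup returns non-NULL, which is the "function 0 added last" ordering discussed in this thread.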


> >
> >PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
> >{
> >     if (pcie_has_upstream_port(pci_dev)) {
> >         /* With an upstream PCIE port, we only support 1 device at slot 0 */
> >         return pci_dev->bus->devices[0];
> >     } else {
> >         /* Other bus types might support multiple devices at slots 0-31 */
> >         return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
> >     }
> >}
> >
> 
> >>+    } else {
> >>+        /* no ARI */
> >>+        return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
> >>+    }
> >>+}
> >>+
> >>  static const TypeInfo pci_device_type_info = {
> >>      .name = TYPE_PCI_DEVICE,
> >>      .parent = TYPE_DEVICE,
> >>diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
> >>index 3e26f92..63d7d2f 100644
> >>--- a/hw/pci/pci_host.c
> >>+++ b/hw/pci/pci_host.c
> >>@@ -20,6 +20,7 @@
> >>
> >>  #include "hw/pci/pci.h"
> >>  #include "hw/pci/pci_host.h"
> >>+#include "hw/pci/pci_bus.h"
> >>  #include "trace.h"
> >>
> >>  /* debug PCI */
> >>@@ -75,7 +76,11 @@ void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len)
> >>      PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
> >>      uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
> >>
> >>-    if (!pci_dev) {
> >>+    /* non-zero functions are only exposed when function 0 is present,
> >>+     * allowing direct removal of unexposed functions.
> >>+     */
> >>+    if (!pci_dev ||
> >>+        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
> >>          return;
> >>      }
> >>
> >>@@ -91,7 +96,11 @@ uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
> >>      uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
> >>      uint32_t val;
> >>
> >>-    if (!pci_dev) {
> >>+    /* non-zero functions are only exposed when function 0 is present,
> >>+     * allowing direct removal of unexposed functions.
> >>+     */
> >>+    if (!pci_dev ||
> >>+        (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
> >>          return ~0x0;
> >>      }
> >>
> >>diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >>index b1adeaf..4ba9501 100644
> >>--- a/hw/pci/pcie.c
> >>+++ b/hw/pci/pcie.c
> >>@@ -249,16 +249,16 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>          return;
> >>      }
> >>
> >>-    /* TODO: multifunction hot-plug.
> >>-     * Right now, only a device of function = 0 is allowed to be
> >>-     * hot plugged/unplugged.
> >>+    /* To enable multifunction hot-plug, we just ensure the function
> >>+     * 0 added last. When function 0 is added, we set the sltsta and
> >>+     * inform OS via event notification.
> >>       */
> >>-    assert(PCI_FUNC(pci_dev->devfn) == 0);
> >>-
> >>-    pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> >>-                               PCI_EXP_SLTSTA_PDS);
> >>-    pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> >>-                        PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> >>+    if (pci_get_function_0(pci_dev)) {
> >>+        pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> >>+                                   PCI_EXP_SLTSTA_PDS);
> >>+        pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> >>+                            PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> >>+    }
> >>  }
> >>
> >>  static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
> >>diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >>index f5e7fd8..379b6e1 100644
> >>--- a/include/hw/pci/pci.h
> >>+++ b/include/hw/pci/pci.h
> >>@@ -397,6 +397,7 @@ void pci_for_each_bus_depth_first(PCIBus *bus,
> >>                                    void *(*begin)(PCIBus *bus, void *parent_state),
> >>                                    void (*end)(PCIBus *bus, void *state),
> >>                                    void *parent_state);
> >>+PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
> >>
> >>  /* Use this wrapper when specific scan order is not required. */
> >>  static inline
> >>--
> >>2.1.0
> >.
> >
> 
> -- 
> Yours Sincerely,
> 
> Cao Jin



