[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add
From: |
Michael S. Tsirkin |
Subject: |
Re: [Qemu-devel] [PATCH v5 2/2] enable multi-function hot-add |
Date: |
Mon, 26 Oct 2015 14:14:35 +0200 |
On Mon, Oct 26, 2015 at 07:57:16PM +0800, Cao jin wrote:
> Hi,
> warning, there is a long story below O:-)
>
> On 10/26/2015 04:29 PM, Michael S. Tsirkin wrote:
> >On Mon, Oct 26, 2015 at 11:29:18AM +0800, Cao jin wrote:
> >>Enable pcie device multifunction hot, just ensure the function 0
> >>added last, then driver will got the notification to scan all the
> >>function in the slot.
> >>
> >>Signed-off-by: Cao jin <address@hidden>
In my opinion, what's confusing you is that you keep
thinking about ARI.
ARI is just a way where PCI_SLOT(devfn) can be != 0
when you are actually just part of the same device.
But with or without ARI, PCIE devices with upstream ports
can only occupy slot 0.
So let's check for that.
> >>---
> >> hw/pci/pci.c | 31 ++++++++++++++++++++++++++++++-
> >> hw/pci/pci_host.c | 13 +++++++++++--
> >> hw/pci/pcie.c | 18 +++++++++---------
> >> include/hw/pci/pci.h | 1 +
> >> 4 files changed, 51 insertions(+), 12 deletions(-)
> >>
> >>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >>index b0bf540..6f43b12 100644
> >>--- a/hw/pci/pci.c
> >>+++ b/hw/pci/pci.c
> >>@@ -847,6 +847,9 @@ static PCIDevice *do_pci_register_device(PCIDevice
> >>*pci_dev, PCIBus *bus,
> >> PCIConfigWriteFunc *config_write = pc->config_write;
> >> Error *local_err = NULL;
> >> AddressSpace *dma_as;
> >>+ DeviceState *dev = DEVICE(pci_dev);
> >>+
> >>+ pci_dev->bus = bus;
> >>
> >> if (devfn < 0) {
> >> for(devfn = bus->devfn_min ; devfn < ARRAY_SIZE(bus->devices);
> >>@@ -864,9 +867,17 @@ static PCIDevice *do_pci_register_device(PCIDevice
> >>*pci_dev, PCIBus *bus,
> >> PCI_SLOT(devfn), PCI_FUNC(devfn), name,
> >> bus->devices[devfn]->name);
> >> return NULL;
> >>+ } else if (dev->hotplugged &&
> >>+ pci_get_function_0(pci_dev)) {
> >>+ error_setg(errp, "PCI: slot %d function 0 already ocuppied by %s,"
> >>+ " new func %s cannot be exposed to guest.",
> >>+ PCI_SLOT(devfn),
> >>+ bus->devices[PCI_DEVFN(PCI_SLOT(devfn), 0)]->name,
> >>+ name);
> >>+
> >>+ return NULL;
> >> }
> >>
> >>- pci_dev->bus = bus;
> >> pci_dev->devfn = devfn;
> >> dma_as = pci_device_iommu_address_space(pci_dev);
> >>
> >>@@ -2454,6 +2465,24 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range)
> >> pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
> >> }
> >>
> >>+/* ARI device function number range is 0-255, means has only 1 function0;
> >>+ * while non-ARI device has 1 function0 in each slot. non-ARI device could
> >>+ * be PCI or PCIe, and there is up to 32 slots for PCI */
> >>+PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
> >>+{
> >>+ PCIDevice *parent_dev;
> >>+
> >>+ parent_dev = pci_bridge_get_device(pci_dev->bus);
> >>+ if (pcie_cap_is_arifwd_enabled(parent_dev) &&
> >>+ pci_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI)) {
> >>+ /* ARI enabled */
> >>+ return pci_dev->bus->devices[0];
> >
> >That's wrong I think since software might enable ARI after hotplug.
>
> According to the spec, ARI feature is enabled only based on the following 2
> conditions:
> 1. For an ARI Downstream Port, the capability is communicated through the
> Device Capabilities 2 register.
> 2. For an ARI Device, the capability is communicated through the ARI
> Capability structure.
>
> also according to the driver code pci_configure_ari(), I think my
> implementation does follows spec?
>
> And as you know, only ARI feature is enabled, we return
> pci_dev->bus->devices[0]
Yes but that's not the point.
The point is whether a device can occupy slot != 0.
> >And I'm not sure all functions must have the ARI capability.
> >
>
> do you means there maybe the following condition: ARI forwarding bit is
> enabled in downstream port, but the functions below, some have ARI
> Capability structure while the others don`t. Shouldn`t we forbid the
> condition? Because If this condition happens, I am not sure whether the
> device could work normally.
>
> In the IMPLEMENTATION NOTE: ARI Forwarding Enable Being Set Inappropriately,
> It seems the spec don`t want that complicated condition? While actually,
> qemu calls pcie_cap_arifwd_init() in root port/downstream port
> initialization first.
>
> >I don't see why don't you just go by spec:
> >
>
> I do read the spec, and also referred the pcie driver code(see
> pci_configure_ari()). I think it is my inaccurate understanding about
> "upstream port" results in my implementation(I also consult PCISIG support
> team for this question, see attachment). The concept of "upstream port" in
> ARI device definition confuse me a lot.
>
> Talking about the definition of ARI device, I always thought the "upstream
> port" should on the ARI device itself(like the figure I drew before), or
> else why the definition add the words "with an upstream port"? It seems to
> me that the emphasis is on "with an upstream port", and implies to me that
> the non-ARI device doesn`t have an upstream port. Seeing their replies in
> attachment, says that I am wrong at this point, both of them have an
> upstream port.(their saying actually make me more confused at that time, but
> now I think I am clear about this concept after reading your implementation
> below)
>
> After reading your implementation, and the PCISIG support team replies
> again, finally I figure out that, "upstream port" in ARI device definition
> is just a port whose position is closer to root complex, and the point is,
> the "upstream port" doesn`t need to exist on the ARI device itself, which
> also means, take Figure 6-13 in PCIe spec 3.1, the Root Port A is the
> "upstream port" for ARI Device X, and the Downstream Port D in Switch is the
> "upstream port" for ARI Device Y. Now, do I understand it right?
>
> Hope I can get you understood from the long description above. If I still
> don`t understand it right, please point it out, after all, it confused me
> for a long time.
>
> >static
> >bool pcie_has_upstream_port(PCIDevice *dev)
> >{
> > PCIDevice *parent_dev = pci_bridge_get_device(pci_dev->bus);
> >
> > /*
> > * Device associated with an upstream port.
> > * As there are several types of these, it's easier to check the
> > * parent device: upstream ports are always connected to
> > * root or downstream ports.
> > */
> > return parent_dev &&
> > pci_is_express(parent_dev) &&
> > parent_dev->exp.exp_cap &&
> > (pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > pcie_cap_get_type(parent_dev) == PCI_EXP_TYPE_DOWNSTREAM);
> >}
> >
>
> Assume my understanding is right, which means both ARI and non-ARI device
> have the upstream port(root port or downstream port), could the existence of
> upstream port be the judgment condition?
This tells us whether we are behind a port that
can address devices in slot != 0.
> >
> >PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
> >{
> > if (pcie_has_upstream_port(dev)) {
> > /* With an upstream PCIE port, we only support 1 device at slot 0 */
> > return pci_dev->bus->devices[0];
> > } else {
> > /* Other bus types might support multiple devices at slots 0-31 */
> > return
> > pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 0)];
> > }
> >}
> >
>
> >>+ } else {
> >>+ /* no ARI */
> >>+ return pci_dev->bus->devices[PCI_DEVFN(PCI_SLOT(pci_dev->devfn),
> >>0)];
> >>+ }
> >>+}
> >>+
> >> static const TypeInfo pci_device_type_info = {
> >> .name = TYPE_PCI_DEVICE,
> >> .parent = TYPE_DEVICE,
> >>diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
> >>index 3e26f92..63d7d2f 100644
> >>--- a/hw/pci/pci_host.c
> >>+++ b/hw/pci/pci_host.c
> >>@@ -20,6 +20,7 @@
> >>
> >> #include "hw/pci/pci.h"
> >> #include "hw/pci/pci_host.h"
> >>+#include "hw/pci/pci_bus.h"
> >> #include "trace.h"
> >>
> >> /* debug PCI */
> >>@@ -75,7 +76,11 @@ void pci_data_write(PCIBus *s, uint32_t addr, uint32_t
> >>val, int len)
> >> PCIDevice *pci_dev = pci_dev_find_by_addr(s, addr);
> >> uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
> >>
> >>- if (!pci_dev) {
> >>+ /* non-zero functions are only exposed when function 0 is present,
> >>+ * allowing direct removal of unexposed functions.
> >>+ */
> >>+ if (!pci_dev ||
> >>+ (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
> >> return;
> >> }
> >>
> >>@@ -91,7 +96,11 @@ uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len)
> >> uint32_t config_addr = addr & (PCI_CONFIG_SPACE_SIZE - 1);
> >> uint32_t val;
> >>
> >>- if (!pci_dev) {
> >>+ /* non-zero functions are only exposed when function 0 is present,
> >>+ * allowing direct removal of unexposed functions.
> >>+ */
> >>+ if (!pci_dev ||
> >>+ (pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev))) {
> >> return ~0x0;
> >> }
> >>
> >>diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >>index b1adeaf..4ba9501 100644
> >>--- a/hw/pci/pcie.c
> >>+++ b/hw/pci/pcie.c
> >>@@ -249,16 +249,16 @@ void pcie_cap_slot_hotplug_cb(HotplugHandler
> >>*hotplug_dev, DeviceState *dev,
> >> return;
> >> }
> >>
> >>- /* TODO: multifunction hot-plug.
> >>- * Right now, only a device of function = 0 is allowed to be
> >>- * hot plugged/unplugged.
> >>+ /* To enable multifunction hot-plug, we just ensure the function
> >>+ * 0 added last. When function 0 is added, we set the sltsta and
> >>+ * inform OS via event notification.
> >> */
> >>- assert(PCI_FUNC(pci_dev->devfn) == 0);
> >>-
> >>- pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> >>- PCI_EXP_SLTSTA_PDS);
> >>- pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> >>- PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> >>+ if (pci_get_function_0(pci_dev)) {
> >>+ pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTSTA,
> >>+ PCI_EXP_SLTSTA_PDS);
> >>+ pcie_cap_slot_event(PCI_DEVICE(hotplug_dev),
> >>+ PCI_EXP_HP_EV_PDC | PCI_EXP_HP_EV_ABP);
> >>+ }
> >> }
> >>
> >> static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
> >>diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >>index f5e7fd8..379b6e1 100644
> >>--- a/include/hw/pci/pci.h
> >>+++ b/include/hw/pci/pci.h
> >>@@ -397,6 +397,7 @@ void pci_for_each_bus_depth_first(PCIBus *bus,
> >> void *(*begin)(PCIBus *bus, void
> >> *parent_state),
> >> void (*end)(PCIBus *bus, void *state),
> >> void *parent_state);
> >>+PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
> >>
> >> /* Use this wrapper when specific scan order is not required. */
> >> static inline
> >>--
> >>2.1.0
> >.
> >
>
> --
> Yours Sincerely,
>
> Cao Jin