[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of pc
From: |
Knut Omang |
Subject: |
Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of pcie devices |
Date: |
Fri, 03 Oct 2014 16:30:49 +0200 |
On Fri, 2014-10-03 at 13:22 +0200, Knut Omang wrote:
> On Wed, 2014-10-01 at 17:08 +0300, Marcel Apfelbaum wrote:
> > On Wed, 2014-10-01 at 07:26 +0200, Knut Omang wrote:
> > > On Tue, 2014-09-30 at 21:38 +0800, Gonglei wrote:
> > > > > Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari
> > > > > capability of
> > > > > pcie devices
> > > > >
> > > > > On Tue, Sep 30, 2014 at 06:11:25PM +0800, address@hidden wrote:
> > > > > > From: Gonglei <address@hidden>
> > > > > >
> > > > > > In QEMU, ARI Forwarding is enabled default at emulation of PCIe
> > > > > > ports. ARI Forwarding enable setting at firmware/OS Control handoff.
> > > > > > If the bit is Set when a non-ARI Device is present, the non-ARI
> > > > > > Device can respond to Configuration Space accesses under what it
> > > > > > interprets as being different Device Numbers, and its Functions can
> > > > > > be aliased under multiple Device Numbers, generally leading to
> > > > > > undesired behavior.
> > > > >
> > > > > So what is the undesired behaviour?
> > > > > Did you observe such?
> > > >
> > > > I just observe the PCI device don't work, and the dmesg show me:
> > > >
> > > > [ 159.035250] pciehp 0000:05:00.0:pcie24: Button pressed on Slot (0 - 4)
> > > > [ 159.035274] pciehp 0000:05:00.0:pcie24: Card present on Slot (0 - 4)
> > > > [ 159.036517] pciehp 0000:05:00.0:pcie24: PCI slot #0 - 4 - powering on
> > > > due to button press.
> > > > [ 159.188049] pciehp 0000:05:00.0:pcie24: Failed to check link status
> > > > [ 159.201968] pciehp 0000:05:00.0:pcie24: Card not present on Slot (0 -
> > > > 4)
> > > > [ 159.202529] pciehp 0000:05:00.0:pcie24: Already disabled on Slot (0 -
> > > > 4)
> > >
> > > This seems very much like the symptoms I see when I use hotplug after
> > > the latest changes to the hotplug code - do you have something that
> > > confirms this has something to do with wrong interpretation of device
> > > IDs? My suspicion is that something has gone wrong or is missing in the
> > > hotplug logic but I havent had time to dig deeper into it yet.
> > Can you please describe me the steps to reproduce the issue?
>
> Hmm, while trying to reproduce again I realize my hotplug issues are not
> the same as the one Gonglei reports, let me come back to that in a
> separate mail later,
>
> My main point here is that I don't see how this particular fix would
> alleviate Gonglei's issue, as it does not seem to get triggered unless
> there's a bug in the emulated port/switch?
>
> Gonglei, I assume you are use the TI x3130 in your example:
> IMHO any PCIe x 1 device should fit into any of the (3) PCIe slots
> provided by the TI x3130, and according to the spec:
>
> http://www.ti.com/lit/gpn/xio3130
>
> these slots appear as separate buses, which means that all devices will
> have devfn 0.0 but on different buses, eg. if you have all three switch
> slots (not to be confused with the slot number given by the PCI_SLOT()
> macro) populated and no other secondary buses before it, you should see
> the downstream switch ports as 01:xx:x and devices in 02:00.0, 03:00.0
> and 04:00.0 for slots 0, 1 and 2. This seems to correspond well with the
> current Qemu model.
>
> So unless your device itself exposes function# > 8 and is not ARI
> capable (which would be a non-compliant device as far as I read the
> standard) you should never be able to see any device in any of the
> downstream ports have PCI_SLOT(devfn) != 0 ?
>
> With qemu master + my SR/IOV patch set + the igb patch (just to have an
> ARI capable device to play with) here:
>
> https://github.com/knuto/qemu, branch sriov_patches_v3
>
> and a guest running F20 I am able to boot with a device (such as for
> instance an e1000) inserted in a root port, I am also able to hot plug
> it from the qemu monitor, eg.
>
> qemu-kvm ... \
> -device ioh3420,slot=0,id=pcie_port.0
>
> ...
>
> (qemu) device_add e1000,vlan=1,bus=pcie_port.0,id=ne
> (qemu) device_del ne
>
> In both cases the ARIfwd bit is disabled by default and not enabled by
> the port driver (just as I would expect) so at least your comment
> is wrong as far as I can see.
>
> Booting with a two-port downstream switch with no devices plugged, I see
> the same, ARIfwd is not enabled on the downstream ports as long as no
> device is in any slots, which is as expected, isn't it?
>
> qemu-kvm ... \
> -device x3130-upstream,id=upsw \
> -device xio3130-downstream,bus=upsw,addr=0.0,chassis=5,id=ds_port.0 \
> -device xio3130-downstream,bus=upsw,addr=1.0,chassis=6,id=ds_port.1
> ...
>
> The upstream port does not support ARIfwd in the QEMU model, which I
> suppose is correct as it will only contain individual downstream
> devices, which expose new buses but never have more than one function.
>
> However, either booting or hotplugging an e1000 into the downstream
> switch, eg.
>
> (qemu) device_add e1000,vlan=1,bus=ds_port.0,id=ne
>
> with my current guest image kernel 3.13.10-200.fc20.x86_64+debug I get a
> NULL pointer Oops in the guest,
>
> [ 0.520009] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000038
> [ 0.521000] IP: [<ffffffff813d942d>] pcie_aspm_init_link_state+0x6ed/0x7c0
> [ 0.521000] PGD 0
> [ 0.521000] Oops: 0000 [#1] SMP
here is the gdb version of the stack trace:
0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at
drivers/pci/pcie/aspm.c:530
530 parent = pdev->bus->parent->self->link_state;
(gdb) where
#0 0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at
drivers/pci/pcie/aspm.c:530
#1 pcie_aspm_init_link_state (pdev=0xffff880178cec520) at
drivers/pci/pcie/aspm.c:578
#2 0xffffffff813c6bbd in pci_scan_slot (address@hidden, address@hidden)
at drivers/pci/probe.c:1486
#3 0xffffffff813e0d54 in pciehp_configure_device (address@hidden)
at drivers/pci/hotplug/pciehp_pci.c:54
#4 0xffffffff813e05cb in board_added (p_slot=0xffff880173982760) at
drivers/pci/hotplug/pciehp_ctrl.c:223
#5 pciehp_enable_slot (address@hidden) at drivers/pci/hotplug/pciehp_ctrl.c:510
#6 0xffffffff813e0ac0 in pciehp_power_thread (work=<optimized out>) at
drivers/pci/hotplug/pciehp_ctrl.c:308
#7 0xffffffff8109bc31 in process_one_work (address@hidden,
work=0xffff8801731bbc30)
at kernel/workqueue.c:2214
#8 0xffffffff8109c1eb in worker_thread (__worker=0xffff8801730fd728) at
kernel/workqueue.c:2340
#9 0xffffffff810a43ef in kthread (_create=0xffff88017336edc8) at
kernel/kthread.c:207
#10 <signal handler called>
#11 0x0000000000000000 in irq_stack_union ()
#12 0x0000000000000000 in ?? ()
(gdb) p pdev->bus->parent->self
$2 = (struct pci_dev *) 0x0 <irq_stack_union>
(gdb)
I observe the same behaviour with guest kernel 3.16.3-200.fc20.x86_64.
Knut
> The exact same happens if I use the ARI capable igb device instead.
>
> I will upgrade the guest to a newer image (what kernel are you running with,
> Gonglei?)
>
> lspci reports this on the guest (before I try to add any device in the
> downstream port(s)):
>
> 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM
> Controller
> 00:01.0 VGA compatible controller: Cirrus Logic GD 5446
> 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
> 00:03.0 Unclassified device [0002]: Red Hat, Inc Virtio filesystem
> 00:04.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express
> Root Port 0 (rev 02)
> 00:05.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream)
> (rev 02)
> 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller
> (rev 02)
> 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port
> SATA Controller [AHCI mode] (rev 02)
> 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev
> 02)
> 02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream)
> (rev 01)
> 02:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream)
> (rev 01)
>
> # lspci -t
> -[0000:00]-+-00.0
> +-01.0
> +-02.0
> +-03.0
> +-04.0-[01]--
> +-05.0-[02-04]--+-00.0-[03]--
> | \-01.0-[04]--
> +-1f.0
> +-1f.2
> \-1f.3
>
> eg. the devices plugged into the downstream ports will end up at 03:00.0 and
> 04:00.0,
> the info qtree command verifies that the e1000 insert ends up in 03:00.0 as
> expected.
>
> Similar if I instead use
>
> (qemu) device_add e1000,vlan=1,bus=ds_port.1,id=ne2
>
> I see this with info qtree:
>
> class Ethernet controller, addr 04:00.0, pci id 8086:100e (sub 1af4:1100)
>
> Anyway I suppose this indicates that something is not right/is missing
> with the downstream switch emulation, and not with the (generic) way the
> ARIfwd cap/ctrl bit works
>
> > It should work correctly by now, maybe a "surprise removal/addition"
> > warning and nothing more.
> >
> > Thank you,
> > Marcel
> >
> > >
> > > I am just concerned that this might alleviate the symptoms you see but
> > > not fix the problem itself,
> > >
> > > Knut
> > >
> > > > > Please make this explicit in the commit log.
> > > > >
> > > > Sorry, I copied the description from PCIe spec. :(
> > > >
> > > > IMPLEMENTATION NOTE at Page 19:
> > > >
> > > > https://www.pcisig.com/specifications/pciexpress/specifications/ECN-alt-rid-interpretation-070604.pdf
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > So, for PCI devices attached in PCIe root ports or downstream pots,
> > > > > > we should assure that its slot is not non-zero.
>
>
>
> > > > > > For PCIe devices, which
> > > > > > ARP capability is not enabled, we also should assure that its slot
> > > > > > is not non-zero.
> > > > >
> > > > > not non zero => zero
> > > > >
> > > > OK.
> > > >
> > > > > >
> > > > > > Signed-off-by: Gonglei <address@hidden>
> > > > > > ---
> > > > > > hw/pci/pci.c | 4 ++++
> > > > > > hw/pci/pcie.c | 51
> > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > include/hw/pci/pcie.h | 1 +
> > > > > > 3 files changed, 56 insertions(+)
> > > > > >
> > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > > > index 6ce75aa..22b7ca0 100644
> > > > > > --- a/hw/pci/pci.c
> > > > > > +++ b/hw/pci/pci.c
> > > > > > @@ -1770,6 +1770,10 @@ static int pci_qdev_init(DeviceState *qdev)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > + if (pcie_cap_slot_check(bus, pci_dev)) {
> > > > > > + return -1;
> > > > > > + }
> > > > > > +
> > > > > > /* rom loading */
> > > > > > is_default_rom = false;
> > > > > > if (pci_dev->romfile == NULL && pc->romfile != NULL) {
> > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > index 1babddf..b82211a 100644
> > > > > > --- a/hw/pci/pcie.c
> > > > > > +++ b/hw/pci/pcie.c
> > > > > > @@ -633,3 +633,54 @@ void pcie_ari_init(PCIDevice *dev, uint16_t
> > > > > > offset,
> > > > > uint16_t nextfn)
> > > > > > offset, PCI_ARI_SIZEOF);
> > > > > > pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn &
> > > > > > 0xff) << 8);
> > > > > > }
> > > > > > +
> > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev)
> > > > > > +{
> > > > > > + Object *obj = OBJECT(bus);
> > > > > > +
> > > > > > + if (pci_bus_is_root(bus)) {
> > > > > > + return 0;
> > > > > > + }
> > > > > > +
> > > > > > + if (object_dynamic_cast(obj, TYPE_PCIE_BUS)) {
> > > > > > + DeviceState *parent = qbus_get_parent(BUS(obj));
> > > > > > + PCIDevice *pci_dev = PCI_DEVICE(parent);
> > > > > > + uint8_t port_type;
> > > > > > + /*
> > > > > > + * Root ports and downstream ports of switches are the hot
> > > > > > + * pluggable ports in a PCI Express hierarchy.
> > > > > > + * PCI Express supports chip-to-chip interconnect, a PCIe
> > > > > > link can
> > > > > > + * only connect one PCI device/Switch/EndPoint or
> > > > > > PCI-bridge.
> > > > > > + *
> > > > > > + * 7.3. Configuration Transaction Rules (PCI Express
> > > > > > specification
> > > > > 3.0)
> > > > > > + * 7.3.1. Device Number
> > > > > > + *
> > > > > > + * Downstream Ports that do not have ARI Forwarding enabled
> > > > > must
> > > > > > + * associate only Device 0 with the device attached to the
> > > > > > Logical
> > > > > Bus
> > > > > > + * representing the Link from the Port.
> > > > > > + *
> > > > > > + * In QEMU, ARI Forwarding is enabled default at emulation
> > > > > > of
> > > > > PCIe
> > > > >
> > > > > s/enabled default/enabled by default/
>
> It is not when I try this (as discussed above)
> The capability is there but it is disabled by default.
>
> > > > >
> > > > OK.
> > > >
> > > > > > + * ports. ARI Forwarding enable setting at firmware/OS
> > > > > > Control
> > > > > handoff.
> > > > >
> > > > >
> > > > > can parse last sentence. drop it?
> > > > >
> > > > OK.
> > > >
> > > > > > + * If the bit is Set when a non-ARI Device is present, the
> > > > > > non-ARI
> > > > > > + Device can respond to Configuration Space accesses under
> > > > > what it
> > > > > > + * interprets as being different Device Numbers, and its
> > > > > > Functions
> > > > > can
> > > > > > + * be aliased under multiple Device Numbers, generally
> > > > > > leading to
> > > > > > + * undesired behavior.
>
> I don't understand how this is possible to achieve.
>
> Knut
>
> > > > > I don't think any badness really happens.
> > > > > Did you observe any?
> > > > >
> > > > Same with above explanation.
> > > >
> > > > >
> > > > >
> > > > > > + */
> > > > > > + port_type = pcie_cap_get_type(pci_dev);
> > > > > > + if (port_type == PCI_EXP_TYPE_DOWNSTREAM ||
> > > > > > + port_type == PCI_EXP_TYPE_ROOT_PORT) {
> > > > > > + if (!pci_is_express(dev) ||
> > > > > > + (pci_is_express(dev) &&
> > > > > > + !pcie_find_capability(dev, PCI_EXT_CAP_ID_ARI))) {
> > > > >
> > > > > I would just skip the test for a non express function.
> > > > >
> > > > Sorry?
> > > >
> > > > Best regards,
> > > > -Gonglei
> > > >
> > > > > > + if (PCI_SLOT(dev->devfn) != 0) {
> > > > > > + error_report("PCIe: non-ARI device can't be
> > > > > populated"
> > > > > > + " in slot %d",
> > > > > PCI_SLOT(dev->devfn));
> > > > > > + return -1;
> > > > > > + }
> > > > > > + }
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > + return 0;
> > > > > > +}
> > > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > > > > index d139d58..1d8f3f4 100644
> > > > > > --- a/include/hw/pci/pcie.h
> > > > > > +++ b/include/hw/pci/pcie.h
> > > > > > @@ -115,6 +115,7 @@ void pcie_add_capability(PCIDevice *dev,
> > > > > > uint16_t offset, uint16_t size);
> > > > > >
> > > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t
> > > > > > nextfn);
> > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev);
> > > > > >
> > > > > > extern const VMStateDescription vmstate_pcie_device;
> > > > > >
> > > > > > --
> > > > > > 1.7.12.4
> > > > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
>