qemu-devel

Re: qemu-system-ppc64 abort()s with pcie bridges


From: Greg Kurz
Subject: Re: qemu-system-ppc64 abort()s with pcie bridges
Date: Thu, 9 Jul 2020 11:28:01 +0200

On Wed, 8 Jul 2020 11:57:03 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Wed, 8 Jul 2020 10:03:47 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
> > 
> >  Hi,
> > 
> > qemu-system-ppc64 currently abort()s when it is started with a pcie
> > bridge device:
> > 
> > $ qemu-system-ppc64 -M pseries-5.1 -device pcie-pci-bridge
> > Unexpected error in object_property_find() at qom/object.c:1240:
> > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> > Aborted (core dumped)
> > 
> > or:
> > 
> > $ qemu-system-ppc64 -M pseries -device dec-21154-p2p-bridge
> > Unexpected error in object_property_find() at qom/object.c:1240:
> > qemu-system-ppc64: -device dec-21154-p2p-bridge: Property '.chassis_nr' not found
> > Aborted (core dumped)
> > 
> > That's kind of ugly, and it shows up as error when running
> > scripts/device-crash-test. Is there an easy way to avoid the abort() and
> > fail more gracefully here?
> > 
> 
> And even worse, this can tear down a running guest with hotplug :\
> 
> (qemu) device_add pcie-pci-bridge 
> Unexpected error in object_property_find() at /home/greg/Work/qemu/qemu-ppc/qom/object.c:1240:
> Property '.chassis_nr' not found
> Aborted (core dumped)
> 
> This is caused by recent commit:
> 
> commit 7ef1553dac8ef8dbe547b58d7420461a16be0eeb
> Author: Markus Armbruster <armbru@redhat.com>
> Date:   Tue May 5 17:29:25 2020 +0200
> 
>     spapr_pci: Drop some dead error handling
>     
>     chassis_from_bus() uses object_property_get_uint() to get property
>     "chassis_nr" of the bridge device.  Failure would be a programming
>     error.  Pass &error_abort, and simplify its callers.
>     
>     Cc: David Gibson <david@gibson.dropbear.id.au>
>     Cc: qemu-ppc@nongnu.org
>     Signed-off-by: Markus Armbruster <armbru@redhat.com>
>     Acked-by: David Gibson <david@gibson.dropbear.id.au>
>     Reviewed-by: Greg Kurz <groug@kaod.org>
>     Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>     Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>     Message-Id: <20200505152926.18877-18-armbru@redhat.com>
> 
> Before that, we would simply print the "chassis_nr not found" error
> and, in the case of a cold-plugged device, exit.
> 
> The root cause is that the sPAPR PCI code assumes that a PCI bridge
> has a "chassis_nr" property, i.e. that it is a standard PCI bridge.
> Other PCI bridge types don't have that property. Not sure yet why this
> information is required, I'll check LoPAPR.
> 

More on this: each slot of a PCI bridge is associated with a DRC (a
PAPR thingy to handle hot plug/unplug). Each DRC must have a unique
identifier system-wide. We used to use the bus number to compute
the DRC id, but that was broken, so we now _hijack_ "chassis_nr" as an
alternative since this commit:

commit 05929a6c5dfe1028ef66250b7bbf11939f8e77cd
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Wed Apr 10 11:49:28 2019 +1000

    spapr: Don't use bus number for building DRC ids

This means that we only support the standard pci-bridge device,
and that this relies on the availability of "chassis_nr". Failure
to find this property is then not a programming error, but
an expected case where we want to fail gracefully (i.e. revert
Markus's commit mentioned above).
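
To illustrate the difference between aborting and failing gracefully, here is a minimal, self-contained sketch. It does not use QEMU's QOM or Error APIs: `struct mock_bridge` and `mock_chassis_from_bridge()` are hypothetical stand-ins invented for this example. The point is only that a missing "chassis_nr" is reported to the caller, while assert() is kept for real programming errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a bridge device; not a QEMU type. */
struct mock_bridge {
    const char *type_name;   /* e.g. "pci-bridge" or "pcie-pci-bridge" */
    bool has_chassis_nr;     /* only the standard pci-bridge has it */
    unsigned chassis_nr;
};

/*
 * Fetch the chassis number of a bridge.
 * Returns true and fills *chassis on success. Returns false and writes
 * a human-readable message into err when the property is missing, so
 * the caller can refuse the device instead of killing the process.
 */
static bool mock_chassis_from_bridge(const struct mock_bridge *br,
                                     unsigned *chassis,
                                     char *err, size_t errlen)
{
    /* NULL arguments would be a genuine programming error. */
    assert(br && chassis && err && errlen > 0);

    if (!br->has_chassis_nr) {
        /* Expected, user-triggerable case: report it, don't abort. */
        snprintf(err, errlen,
                 "%s: no 'chassis_nr' property, "
                 "only the standard pci-bridge is supported",
                 br->type_name);
        return false;
    }

    *chassis = br->chassis_nr;
    err[0] = '\0';
    return true;
}
```

With this shape, a caller handling device_add can print the message and keep the guest running when the function returns false.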

While reading the code I realized that we have another problem: the
realization of the pci-bridge device does fail if "chassis_nr" is
zero, but I failed to find a uniqueness check. And we get:

$ qemu-system-ppc64 -device pci-bridge,chassis_nr=1 -device pci-bridge,chassis_nr=1
Unexpected error in object_property_try_add() at qom/object.c:1167:
qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
Aborted (core dumped)
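
As a rough illustration of why two bridges with the same "chassis_nr" collide, the sketch below builds an id from a connector type, a PHB index, the chassis number and the slot. The bit layout and the constants are assumptions chosen to be consistent with the '40000100' shown in the error above; they are not taken from this thread, and `sketch_drc_index()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not confirmed here): type in the top 4 bits. */
#define SKETCH_DRC_TYPE_SHIFT 28
#define SKETCH_DRC_TYPE_PCI   4u   /* assumed value for PCI connectors */

/*
 * Compose an id as: type << 28 | phb_index << 16 | chassis << 8 | slot.
 * The bus number plays no part, so the chassis number is the only thing
 * that distinguishes slots of different bridges under the same PHB.
 */
static uint32_t sketch_drc_index(uint32_t phb_index, uint32_t chassis,
                                 uint32_t slot)
{
    /* Each field must fit in the bits reserved for it. */
    assert(phb_index < 0x1000 && chassis < 0x100 && slot < 0x100);

    return (SKETCH_DRC_TYPE_PCI << SKETCH_DRC_TYPE_SHIFT)
         | (phb_index << 16)
         | (chassis << 8)
         | slot;
}
```

Under these assumptions, slot 0 of any bridge with chassis_nr=1 on PHB 0 maps to 0x40000100, so the second bridge tries to register an id that already exists.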

It is very confusing: slotid_cap_init() states several times that
"chassis_nr" is unique, but this is never enforced anywhere.

    if (!chassis) {
        error_setg(errp, "Bridge chassis not specified. Each bridge is required"
                   " to be assigned a unique chassis id > 0.");
        return -EINVAL;
    }

or

    /* We make each chassis unique, this way each bridge is First in Chassis */


Michael, Marcel or anyone with PCI knowledge,

Can you shed some light on the semantics of "chassis_nr" ?

> In the meantime, since we're in soft freeze, I guess we should
> revert Markus's patch and add a big fat comment to explain
> what's going on and maybe change the error message to something
> more informative, eg. "PCIE-to-PCI bridges are not supported".
> 
> Thoughts ?
> 
> >  Thomas
> > 
> 



