qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] spapr_pci: Robustify support of PCI bridges


From: Markus Armbruster
Subject: Re: [PATCH] spapr_pci: Robustify support of PCI bridges
Date: Thu, 16 Jul 2020 16:01:18 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

David Gibson <david@gibson.dropbear.id.au> writes:

> On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
>> Some recent error handling cleanups unveiled issues with our support of
>> PCI bridges:
>> 
>> 1) QEMU aborts when using non-standard PCI bridge types,
>>    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
>> 
>> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
>> Unexpected error in object_property_find() at qom/object.c:1240:
>> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
>> Aborted (core dumped)
>
> Oops, I thought we had a check that we actually had a "pci-bridge"
> device before continuing with the hotplug, but I guess not.
>
>> This happens because we assume all PCI bridge types to have a "chassis_nr"
>> property. This property only exists with the standard PCI bridge type
>> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
>> much simpler to check the presence of "chassis_nr" earlier.
>
> Hrm, right, 7ef1553dac was not really correct since add_drcs() really
> can fail.

Right.  I failed to see that we can run into a bridge without a
"chassis_nr" here.

>> 2) QEMU abort if same "chassis_nr" value is used several times,
>>    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
>>    object_property_add() & friends"
>> 
>> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
>>                         -device pci-bridge,chassis_nr=1
>> Unexpected error in object_property_try_add() at qom/object.c:1167:
>> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate 
>> property '40000100' to object (type 'container')
>> Aborted (core dumped)

Before d2623129a7de, the error got *ignored* in
spapr_dr_connector_new():

    SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
                                             uint32_t id)
    {
        SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
        char *prop_name;

        drc->id = id;
        drc->owner = owner;
        prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
                                    spapr_drc_index(drc));
        object_property_add_child(owner, prop_name, OBJECT(drc), &error_abort);
        object_unref(OBJECT(drc));
--->    object_property_set_bool(OBJECT(drc), true, "realized", NULL);
        g_free(prop_name);

        return drc;
    }

I doubt that's healthy.

>> This happens because we assume that "chassis_nr" values are unique, but
>> nobody enforces that and we end up generating duplicate DRC ids. The PCI
>> code doesn't really care for duplicate "chassis_nr" properties since it
>> is only used to initialize the "Chassis Number Register" of the bridge,
>> with no functional impact on QEMU. So, even if passing the same value
>> several times might look weird, it never broke anything before, so
>> I guess we don't necessarily want to enforce strict checking in the PCI
>> code now.
>
> Yeah, I guess.  I'm pretty sure that the chassis number of bridges is
> supposed to be system-unique (well, unique within the PCI domain at
> least, I guess) as part of the hardware spec.  So specifying multiple
> chassis ids the same is a user error, but we need a better failure
> mode.
>
>> Workaround both issues in the PAPR code: check that the bridge has a
>> unique and non null "chassis_nr" when plugging it into its parent bus.
>>
>> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
>
> Arguably, it's really fixing 7ef1553dac.

I agree 7ef1553dac broke the "use a bridge that doesn't have property
'chassis_nr' case.

I suspect the "duplicate chassis_nr" case has always been broken, and
d2623129a7de merely uncovered it.

If we can trigger the abort with hot-plug, then d2623129a7de made things
materially worse (new way to accidentally kill your guest and maybe lose
data), and I'd add a Fixes: blaming it.

>> Reported-by: Thomas Huth <thuth@redhat.com>
>> Signed-off-by: Greg Kurz <groug@kaod.org>
>
> I had a few misgivings about the details of this, but I think I've
> convinced myself they're fine.  There's a couple of things I'd like to
> polish, but I'll do that as a follow up.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]