[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Crashes with qemu-system-ppc64

From: Mark Cave-Ayland
Subject: Re: Crashes with qemu-system-ppc64
Date: Thu, 25 Mar 2021 12:57:48 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0

On 23/03/2021 22:57, Peter Maydell wrote:

On Tue, 23 Mar 2021 at 21:21, Mark Cave-Ayland
<mark.cave-ayland@ilande.co.uk> wrote:
I'm not sure what the right solution is here. In my mind there hasn't really 
been any
difference between TYPE_DEVICE and TYPE_SYS_BUS_DEVICE other than the APIs that
expose the memory regions and IRQs are different, but clearly platform bus
expects/defines a different behaviour here.

Probably the quickest solution for now would be to change the DBDMA device so 
that it
is derived from TYPE_DEVICE rather than TYPE_SYS_BUS_DEVICE and make the 
changes if everyone agrees?

You want to be at least cautious about doing that. TYPE_DEVICE objects
don't get reset usually, because they are not part of the qbus hierarchy
(all TYPE_SYS_BUS_DEVICE devices live on the sysbus and get reset when
the sysbus is reset). So you would need the (currently nonexistent)
reset function of the containing macio device to manually reset any
TYPE_DEVICE children it has. (This is one of the areas where reset is
a mess, incidentally: logically speaking if you have a PCI device then
you want its children to all get reset when the PCI device itself is
reset; as it stands that doesn't happen, because its sysbus children
are on the sysbus, and a pci-bus-reset won't touch them.)

I forget how the platform bus stuff is supposed to work, but something
should be arranging that it only happens for a pretty restrictive subset
of devices -- in general it should certainly not be firing for random
sysbus devices that are part of something else.

It's a pretty old commit (from 2018, one of Igor's), but I wonder if
this was broken by commit a3fc8396352e945f9. The original design of
the platform bus was that the "grab unassigned IRQs and MMIO regions"
hook got run as a machine-init-done hook. That meant that by definition
the board code had finished setting up all its sysbus devices, and
anything still unconnected must be (assuming not a bug) something the
user requested via -device to be put on the platform bus. But in
commit a3fc8396352e945f9 we changed this to use the hotplug-handler
callbacks, which happen when the device is realized. So it stopped
being true that we would only find loose MMIOs and IRQs on the user's
sysbus devices and now we're also incorrectly grabbing parts of
devices that are supposed to be being wired up by QEMU code before
that code has a chance to do that wiring.

There must still be something causing this not to happen literally for
every sysbus device, or we'd have noticed a lot earlier. I'm not sure
what's specifically different here, but I think it is that:
  (1) we only create the platform bus itself as pretty much the
      last thing we do in machine init. This (accidentally?)
      means it doesn't get to see most of the sysbus devices in
      the board itself
  (2) macio-oldworld is a pluggable PCI device which happens to
      use a sysbus device as part of its implementation, which
      is probably not very common

I think the long term fix is that we either need to undo
a3fc8396352e945f9 so that we only run after all other device
creation code has run and the unassigned IRQs and MMIOs really
are just the loose ends, or alternatively we need to make the
hooks much more restrictive about identifying what devices are
supposed to go into the platform bus.

Thanks for the analysis: I can certainly see how the above commit would have changed the behaviour. Looking at hw/ppc/e590plat.c in e500plat_machine_class_init() I see that line 101 reads "machine_class_allow_dynamic_sysbus_dev(mc, TYPE_ETSEC_COMMON);" which looks like it is intended to add a class restriction to this functionality.

In machine_initfn() a callback for machine_init_notify() is added to perform the check but the macio-oldworld device is realized first, because qmp_x_exit_preconfig() calls qemu_create_cli_devices() to realize the devices just before qemu_machine_creation_done() where the notifier is invoked.

Second note: does it actually make sense for a user to create
a macio-oldworld device and plug it into anything? It's a PCI
device, but is it really a generally usable device rather than
a specific built-into-the-board part of the g3beige machine ?
If it isn't actually useful for a user to create it on the command
line with -device, we could sidestep the whole thing for 6.0 by
marking it dc->user_creatable = false ...

I'd prefer not to do that if possible since the macio device is a good example to have as something that passes device-introspect-test whilst containing several different child devices.

Given that the DBDMA device hasn't changed for some time and the above commit dates back to 2018 then I'd be inclined to leave it for now unless it is causing gitlab CI to fail.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]