[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Crashes with qemu-system-ppc64
Re: Crashes with qemu-system-ppc64
Tue, 23 Mar 2021 22:57:56 +0000
On Tue, 23 Mar 2021 at 21:21, Mark Cave-Ayland
> I'm not sure what the right solution is here. In my mind there hasn't really
> been any
> difference between TYPE_DEVICE and TYPE_SYS_BUS_DEVICE other than the APIs
> expose the memory regions and IRQs are different, but clearly platform bus
> expects/defines a different behaviour here.
> Probably the quickest solution for now would be to change the DBDMA device so
> that it
> is derived from TYPE_DEVICE rather than TYPE_SYS_BUS_DEVICE and make the
> changes if everyone agrees?
You want to be at least cautious about doing that. TYPE_DEVICE objects
don't get reset usually, because they are not part of the qbus hierarchy
(all TYPE_SYS_BUS_DEVICE devices live on the sysbus and get reset when
the sysbus is reset). So you would need the (currently nonexistent)
reset function of the containing macio device to manually reset any
TYPE_DEVICE children it has. (This is one of the areas where reset is
a mess, incidentally: logically speaking if you have a PCI device then
you want its children to all get reset when the PCI device itself is
reset; as it stands that doesn't happen, because its sysbus children
are on the sysbus, and a pci-bus-reset won't touch them.)
I forget how the platform bus stuff is supposed to work, but something
should be arranging that it only happens for a pretty restrictive subset
of devices -- in general it should certainly not be firing for random
sysbus devices that are part of something else.
It's a pretty old commit (from 2018, one of Igor's), but I wonder if
this was broken by commit a3fc8396352e945f9. The original design of
the platform bus was that the "grab unassigned IRQs and MMIO regions"
hook got run as a machine-init-done hook. That meant that by definition
the board code had finished setting up all its sysbus devices, and
anything still unconnected must be (assuming not a bug) something the
user requested via -device to be put on the platform bus. But in
commit a3fc8396352e945f9 we changed this to use the hotplug-handler
callbacks, which happen when the device is realized. So it stopped
being true that we would only find loose MMIOs and IRQs on the user's
sysbus devices and now we're also incorrectly grabbing parts of
devices that are supposed to be being wired up by QEMU code before
that code has a chance to do that wiring.
There must still be something causing this not to happen literally for
every sysbus device, or we'd have noticed a lot earlier. I'm not sure
what's specifically different here, but I think it is that:
(1) we only create the platform bus itself as pretty much the
last thing we do in machine init. This (accidentally?)
means it doesn't get to see most of the sysbus devices in
the board itself
(2) macio-oldworld is a pluggable PCI device which happens to
use a sysbus device as part of its implementation, which
is probably not very common
I think the long term fix is that we either need to undo
a3fc8396352e945f9 so that we only run after all other device
creation code has run and the unassigned IRQs and MMIOs really
are just the loose ends, or alternatively we need to make the
hooks much more restrictive about identifying what devices are
supposed to go into the platform bus.
Second note: does it actually make sense for a user to create
a macio-oldworld device and plug it into anything? It's a PCI
device, but is it really a generally usable device rather than
a specific built-into-the-board part of the g3beige machine ?
If it isn't actually useful for a user to create it on the command
line with -device, we could sidestep the whole thing for 6.0 by
marking it dc->user_creatable = false ...