On 06/07/15 11:23, Michael S. Tsirkin wrote:
On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
Bus driver globally signals the firmware that PCI enumeration and resource
allocation have completed. At this point QEMU regenerates the ACPI payload
in an fw_cfg read callback, and this is when the PXB's _CRS gets
populated.
Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
the root bus's command register, *unlike* under SeaBIOS. The consequences
unfold as follows:
- When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
because pci_update_mappings() --> pci_bar_address() calculated it as
PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
- Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
the _CRS, *despite* having been programmed in PCI config space.
- Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
root bus's DWordMemory descriptor.
- Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
within the PXB's config space, and notice that it conflicts with the
main root bus's memory resource descriptors. Linux reports
pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
0x88200000-0x882000ff 64bit]
pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
with PCI Bus 0000:00 [mem
0x88200000-0xfebfffff]
While Windows Server 2012 R2 reports
https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
This device cannot find enough free resources that it can use. If you
want to use this device, you will need to disable one of the other
devices on this system. (Code 12)
(This issue was apparently encountered earlier, see:
https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
and the current hole-punching logic in build_crs() and build_ssdt() is
probably supposed to remedy exactly that problem -- however, for OVMF they
don't work, because at the end of the PCI enumeration and resource
allocation, which cues the ACPI linker/loader client, the command register
is clear.)
The solution is to fetch the BAR addresses from PCI config space directly,
for the purposes of build_crs(), regardless of the PCI command register
and any MemoryRegion state.
Example MMIO maps for a 2GB guest:
SeaBIOS:
main: 0x80000000..0xFC000000 / 0x7C000000
pxb: 0xFC000000..0xFC200000 / 0x00200000
main: 0xFC200000..0xFC213000 / 0x00013000
pxb: 0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
main: 0xFC213100..0xFEA00000 / 0x027ECF00
pxb: 0xFEA00000..0xFEC00000 / 0x00200000
OVMF, without the fix:
main: 0x80000000..0x88100000 / 0x08100000
pxb: 0x88100000..0x88200000 / 0x00100000
main: 0x88200000..0xFEC00000 / 0x76A00000
OVMF, with the fix:
main: 0x80000000..0x88100000 / 0x08100000
pxb: 0x88100000..0x88200000 / 0x00100000
pxb: 0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
main: 0x88200100..0xFEC00000 / 0x769FFF00
(Note that under OVMF the PCI enumerator includes requests for
prefetchable memory in the nonprefetchable memory pool -- separate windows
for nonprefetchable and prefetchable memory are not supported (yet) --
that's why there's one fewer pxb range in the corrected OVMF case than in
the SeaBIOS case.)
Cc: Marcel Apfelbaum <address@hidden>
Cc: Michael S. Tsirkin <address@hidden>
Signed-off-by: Laszlo Ersek <address@hidden>
This is problematic - disabled BAR values have no meaning according to
the PCI spec.
It might be best to add a property to just disable shpc in the bridge so
no devices reside directly behind the pxb?
I started looking into this.
The property itself doesn't seem terribly hard (there's already an "msi"
property which I can take as an example). Making stuff conditional on
this new "shpc" property looked feasible in the beginning, however I
qucikly ran into two issues:
- Migration.
One idea would be to keep the SHPC setup around at all times, and
just not expose the PCI bar to the guest (same as in Marcel's hack
[1]). I didn't like this, although it would keep the migration stream
intact.
The other idea is to omit even the shpc_init() call when SHPC is
disabled. I like this, but it would require breaking migration
compatibility. Both "minimum_version_id" and "version_id" would have
to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE()
field should be replaced with a subsection (dependent on the new
"shpc" flag).
Seems sweaty but doable.
- Hotplug handling.
This is the deal breaker. The new "shpc" flag can affect *instances*
of the bridge, but SHPC is a class-level trait.
pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to
SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself
as TYPE_HOTPLUG_HANDLER.
This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class
into a base class and a derived class. Only the derived class would
support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be
diverted to the new base class (without SHPC), in pxb_dev_initfn(),
from "pci-bridge". (The derived class would preserve the name
"pci-bridge".)
Consequences for migration are unclear to me. Maybe the new derived
class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV)
would be migration-compatible with the current class.
If not, I could create the "basic" bridge class as a standalone one,
without interrupt / MSI / SHPC / hotplug support, and PXB would use
that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so
this would be quite easy; it woduln't duplicate a lot of code, and
would not affect preexistent migration at all. On the other hand,
people might not like that the base class functionality were
duplicated, instead of inherited.