From: Igor Mammedov
Subject: Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base
Date: Thu, 27 Sep 2018 09:03:15 +0200

On Thu, 27 Sep 2018 00:31:07 +0200
Laszlo Ersek <address@hidden> wrote:

> On 09/26/18 18:26, Alex Williamson wrote:
> > On Wed, 26 Sep 2018 13:12:47 +0200
> > Laszlo Ersek <address@hidden> wrote:
> >   
> >> On 09/25/18 22:36, Alex Williamson wrote:  
> >>> On Tue, 25 Sep 2018 00:13:46 +0200
> >>> Laszlo Ersek <address@hidden> wrote:
> >>>     
> >>>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> >>>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
> >>>> the ACPI DSDT that would be at least as large as the new 
> >>>> "pci-hole64-size"
> >>>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
> >>>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
> >>>>
> >>>> In that commit, we added or modified five functions:
> >>>>
> >>>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a
> >>>>   default 64-bit base, which starts beyond the cold-plugged 64-bit RAM,
> >>>>   and skips the DIMM hotplug area too (if any).
> >>>>
> >>>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
> >>>>   board-specific 64-bit base property getters called abstractly by the
> >>>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
> >>>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
> >>>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
> >>>>   honor the firmware's BAR assignments (i.e., they treat the lowest
> >>>>   64-bit GPA programmed by the firmware as the base address for the
> >>>>   aperture).
> >>>>
> >>>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
> >>>>   these intended to extend the aperture to our size recommendation,
> >>>>   calculated relative to the base of the aperture.
> >>>>
> >>>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
> >>>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
> >>>> to the default base (pc_pci_hole64_start()), ignoring any programming
> >>>> done by the firmware. This means that our size recommendation may not be
> >>>> met. Fix it by honoring the firmware's address assignments.
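
(For reference, the arithmetic at stake, as a standalone C sketch with
hypothetical names -- this is not the QEMU code itself, just the intended
calculation:)

    #include <stdint.h>
    #include <stdio.h>

    #define GiB (1ULL << 30)
    #define ROUND_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

    /* default_base stands for pc_pci_hole64_start(); fw_base stands for the
     * lowest 64-bit BAR address programmed by the firmware, or 0 if none. */
    static uint64_t hole64_end(uint64_t fw_base, uint64_t default_base,
                               uint64_t hole64_size)
    {
        /* buggy: the size recommendation was applied to the default base:
         *   return ROUND_UP(default_base + hole64_size, 1 * GiB);
         * fixed: apply it to the firmware-assigned base, when one exists: */
        uint64_t base = fw_base ? fw_base : default_base;
        return ROUND_UP(base + hole64_size, 1 * GiB);
    }

    int main(void)
    {
        /* OVMF-style example: lowest fw-assigned 64-bit BAR at 32GiB, default
         * base (end of RAM) at 4GiB, "pci-hole64-size" of 2GiB. */
        printf("end = 0x%llx\n",
               (unsigned long long)hole64_end(32 * GiB, 4 * GiB, 2 * GiB));
        return 0;
    }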
> >>>>
> >>>> The strange extension sizes were spotted by Alex, in the log of a guest
> >>>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
> >>>>
> >>>> This change only affects DSDT generation, therefore no new compat
> >>>> property is being introduced. Also, because SeaBIOS never assigns 64-bit
> >>>> GPAs to 64-bit BARs, the patch makes no difference to SeaBIOS guests.
> >>>
> >>> This is not exactly true: SeaBIOS will make use of 64-bit MMIO, but
> >>> only if it cannot satisfy all the BARs from 32-bit MMIO, see
> >>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
> >>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
> >>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> >>> devices could do the same.  Thanks,    
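
(A rough sketch of that all-or-nothing policy, in illustrative C -- not the
actual SeaBIOS source, and the names here are made up:)

    #include <stdbool.h>
    #include <stdint.h>

    /* The real logic lives in SeaBIOS's pci_bios_map_devices(): first try to
     * satisfy every BAR from the 32-bit window; only if that fails, relocate
     * the relocatable 64-bit BARs -- and then move all of them above 4G. */
    bool relocate_above_4g(uint64_t total_bar_bytes,
                           uint64_t mmio32_window_bytes)
    {
        if (total_bar_bytes <= mmio32_window_bytes) {
            return false;   /* everything stays below 4G */
        }
        return true;        /* all relocatable 64-bit BARs move above 4G */
    }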
> >>
> >> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
> >> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
> >>
> >> However, using SeaBIOS+i440fx, I can't show the difference. I've been
> >> experimenting with various ivshmem devices (even multiple at the same
> >> time, with different sizes). The "all or nothing" nature of SeaBIOS's
> >> high allocation of the 64-bit BARs, combined with hugepage alignment
> >> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
> >> for i440fx, seems to make it surprisingly difficult to trigger the issue.
> >>
> >> I figure I should:
> >>
> >> (1) remove the sentence "the patch makes no difference to SeaBIOS
> >> guests" from the commit message,
> >>
> >> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
> >>
> >> (3) remain silent on SeaBIOS/i440fx, in the commit message,
> >>
> >> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
> >> change is validated as part of the test suite, on SeaBIOS/q35.
> >>
> >> Regarding (4):
> >>
> >> - is it OK if I add the test only for Q35?
> >>
> >> - what guest RAM size am I allowed to use in the test suite? In my own
> >> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
> >> acceptable for the test suite.  
> > 
> > Seems like you've done due diligence, the plan looks ok to me.
> > Regarding the test memory allocation, is it possible and reasonable to
> > perhaps create a 256MB shared memory area and re-use it for multiple
> > ivshmem devices?  I.e., rather than one 4GB ivshmem device, use sixteen
> > 256MB devices, all with the same backing.  Thanks,
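
(Sketched concretely, with hypothetical IDs and paths -- whether the test
suite can share one backing file across several memory backends this way is
an assumption to verify:)

    qemu-system-x86_64 -machine q35 -m 128M \
      -object memory-backend-file,id=shm0,mem-path=/dev/shm/ivshmem,size=256M,share=on \
      -device ivshmem-plain,memdev=shm0 \
      -object memory-backend-file,id=shm1,mem-path=/dev/shm/ivshmem,size=256M,share=on \
      -device ivshmem-plain,memdev=shm1 \
      ...                               # and so on, up to shm15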
> 
> In order to show that the patch makes a difference, on SeaBIOS+Q35, I
> must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM
> / end-of-DIMM-hotplug-area (i.e., strictly past the return value of
> pc_pci_hole64_start()). That's not easy, assuming test suite constraints:
> 
> - With a small guest RAM size (and no DIMM hotplug area),
> pc_pci_hole64_start() returns 4GB. Unfortunately, 4GB is well aligned
> for 256MB BARs, so that's where SeaBIOS will place the lowest such BAR
> too. Therefore the patch makes no difference.
> 
> - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size, by
> adding a DIMM hotplug area, it's not helpful, because the end of the
> DIMM area ("etc/reserved-memory-end") is rounded up to 1GB alignment
> automatically. Similarly, pc_pci_hole64_start() is rounded up to 1GB
> alignment automatically. Thus, the BARs remain aligned.
> 
> - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size by
> using more guest RAM (e.g. 2049 MB --> the low RAM split is at 2GB on
> Q35, so RAM would end at 4GB+1MB), then it's the same as the previous
> bullet, because pc_pci_hole64_start() is rounded up to 1GB alignment,
> and the BARs remain aligned.
> 
> - If I try to mis-align pc_pci_hole64_start() (again, 4GB, with small
> guest RAM) for the ivshmem BAR size by changing the BAR size, then I
> must pick an 8GB BAR size at least. And then I can't use the 256MB
> "fragmentation".
> 
> - I guess I could ascertain the mis-alignment by using small guest RAM
> (128MB), a single DIMM hotplug slot so that reserved-memory-end is
> rounded up to 5GB (through the 1GB alignment), and a single ivshmem
> device with a 2GB BAR (mis-aligned with the reserved-memory-end at 5GB).
> Is this the best compromise?
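
(A standalone sanity check of that layout, with illustrative numbers and the
usual assumption that firmware aligns a BAR to its own size:)

    #include <stdint.h>
    #include <stdio.h>

    #define GiB (1ULL << 30)
    #define ROUND_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

    int main(void)
    {
        uint64_t hole64_start = 5 * GiB;  /* reserved-memory-end, 1GB-aligned */
        uint64_t bar_size = 2 * GiB;
        /* a 2GiB BAR cannot sit at a 5GiB base; it gets size-aligned up */
        uint64_t bar_base = ROUND_UP(hole64_start, bar_size);

        /* prints 5 and 6: the firmware base (6GiB) diverges from the default
         * base (5GiB), which is exactly what the patch must account for */
        printf("hole64_start=%lluGiB bar_base=%lluGiB\n",
               (unsigned long long)(hole64_start / GiB),
               (unsigned long long)(bar_base / GiB));
        return 0;
    }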
> 
> (The last option is basically an optimization of my previous reproducer,
> namely 5GB guest RAM and 4GB BAR. The 5GB guest RAM made sure that RAM
> would end at an *odd* GB number -- we can optimize that by using an
> empty DIMM hotplug area rather than guest RAM. And the 4GB BAR can be
> optimized to the smallest *even* (positive) number of GBs, namely 2GB.
That probably would do the job.

> But we can't do anything about the pervasive GB-alignment.)
I haven't read the code to be sure, but at first glance, thanks to the
earlier brokenness, it might be possible to mis-align the pci_hole64 start
by using the combination
 broken_reserved_end=true && enforce_aligned_dimm=false

That's assuming an old machine type with this combo is otherwise still
suitable for the test case.

> 
> Thanks
> Laszlo



