qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base
Date: Thu, 27 Sep 2018 00:31:07 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 09/26/18 18:26, Alex Williamson wrote:
> On Wed, 26 Sep 2018 13:12:47 +0200
> Laszlo Ersek <address@hidden> wrote:
> 
>> On 09/25/18 22:36, Alex Williamson wrote:
>>> On Tue, 25 Sep 2018 00:13:46 +0200
>>> Laszlo Ersek <address@hidden> wrote:
>>>   
>>>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
>>>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
>>>> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
>>>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
>>>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
>>>>
>>>> In that commit, we added or modified five functions:
>>>>
>>>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
>>>>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
>>>>   the DIMM hotplug area too (if any).
>>>>
>>>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>>>>   board-specific 64-bit base property getters called abstractly by the
>>>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>>>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>>>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>>>>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
>>>>   GPA programmed by the firmware as the base address for the aperture).
>>>>
>>>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>>>>   these intended to extend the aperture to our size recommendation,
>>>>   calculated relative to the base of the aperture.
>>>>
>>>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
>>>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
>>>> to the default base (pc_pci_hole64_start()), ignoring any programming done
>>>> by the firmware. This means that our size recommendation may not be met.
>>>> Fix it by honoring the firmware's address assignments.
>>>>
>>>> The strange extension sizes were spotted by Alex, in the log of a guest
>>>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
>>>>
>>>> This change only affects DSDT generation, therefore no new compat property
>>>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>>>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.  
>>>
>>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
>>> only if it cannot satisfy all the BARs from 32-bit MMMIO, see
>>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
>>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
>>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
>>> devices could do the same.  Thanks,  
>>
>> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
>> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
>>
>> However, using SeaBIOS+i440fx, I can't show the difference. I've been
>> experimenting with various ivshmem devices (even multiple at the same
>> time, with different sizes). The "all or nothing" nature of SeaBIOS's
>> high allocation of the 64-bit BARs, combined with hugepage alignment
>> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
>> for i440fx, seem to make it surprisingly difficult to trigger the issue.
>>
>> I figure I should:
>>
>> (1) remove the sentence "the patch makes no difference to SeaBIOS
>> guests" from the commit message,
>>
>> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
>>
>> (3) remain silent on SeaBIOS/i440fx, in the commit message,
>>
>> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
>> change is validated as part of the test suite, on SeaBIOS/q35.
>>
>> Regarding (4):
>>
>> - is it OK if I add the test only for Q35?
>>
>> - what guest RAM size am I allowed to use in the test suite? In my own
>> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
>> acceptable for the test suite.
> 
> Seems like you've done due diligence, the plan looks ok to me.
> Regarding the test memory allocation, is it possible and reasonable to
> perhaps create a 256MB shared memory area and re-use it for multiple
> ivshmem devices?  ie. rather than 1, 4GB ivshmem device, use 16, 256MB
> devices, all with the same backing.  Thanks,

In order to show that the patch makes a difference, on SeaBIOS+Q35, I
must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM
/ end-of-DIMM-hotplug-area (i.e., strictly past the return value of
pc_pci_hole64_start()). That's not easy, assuming test suite constraints:

- With a small guest RAM size (and no DIMM hotplug area),
pc_pci_hole64_start() returns 4GB. Unfortunately, 4GB is well aligned
for 256MB BARs, so that's where SeaBIOS will place the lowest such BAR
too. Therefore the patch makes no difference.

- If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size, by
adding a DIMM hotplug area, it's not helpful. Because the end of the
DIMM area ("etc/reserved-memory-end") is rounded up to 1GB alignment
automatically. Similarly, pc_pci_hole64_start() is rounded up to 1GB
alignment automatically. Thus, the BARs remain aligned.

- If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size by
using more guest RAM (e.g. 2049 MB --> the low RAM split is at 2GB on
Q35, so RAM would end at 4GB+1MB), then it's the same as the previous
bullet, because pc_pci_hole64_start() is rounded up to 1GB alignment,
and the BARs remain aligned.

- If I try to mis-align pc_pci_hole64_start() (again, 4GB, with small
guest RAM) for the ivshmem BAR size by changing the BAR size, then I
must pick a 8GB BAR size at least. And then I can't use the 256MB
"fragmentation".

- I guess I could ascertain the mis-alignment by using small guest RAM
(128MB), a single DIMM hotplug slot so that reserved-memory-end is
rounded up to 5GB (through the 1GB alignment), and a single ivshmem
device with a 2GB BAR (mis-aligned with the reserved-memory-end at 5GB).
Is this the best compromise?

(The last option is basically an optimization of my previous reproducer,
namely 5GB guest RAM and 4GB BAR. The 5GB guest RAM made sure that RAM
would end at an *odd* GB number -- we can optimize that by using an
empty DIMM hotplug area rather than guest RAM. And the 4GB BAR can be
optimized to the smallest *even* (positive) number of GBs, namely 2GB.
But we can't do anything about the pervasive GB-alignment.)

Thanks
Laszlo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]