From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH 2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base
Date: Thu, 27 Sep 2018 11:11:46 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 09/27/18 09:03, Igor Mammedov wrote:
> On Thu, 27 Sep 2018 00:31:07 +0200
> Laszlo Ersek <address@hidden> wrote:
> 
>> On 09/26/18 18:26, Alex Williamson wrote:
>>> On Wed, 26 Sep 2018 13:12:47 +0200
>>> Laszlo Ersek <address@hidden> wrote:
>>>   
>>>> On 09/25/18 22:36, Alex Williamson wrote:  
>>>>> On Tue, 25 Sep 2018 00:13:46 +0200
>>>>> Laszlo Ersek <address@hidden> wrote:
>>>>>     
>>>>>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
>>>>>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
>>>>>> the ACPI DSDT that would be at least as large as the new
>>>>>> "pci-hole64-size" property (2GB on i440fx, 32GB on q35). The goal was
>>>>>> to offer "enough"
>>>>>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
>>>>>>
>>>>>> In that commit, we added or modified five functions:
>>>>>>
>>>>>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a
>>>>>>   default 64-bit base, which starts beyond the cold-plugged 64-bit
>>>>>>   RAM, and skips
>>>>>>   the DIMM hotplug area too (if any).
>>>>>>
>>>>>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>>>>>>   board-specific 64-bit base property getters called abstractly by the
>>>>>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>>>>>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>>>>>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>>>>>>   honor the firmware's BAR assignments (i.e., they treat the
>>>>>>   lowest 64-bit
>>>>>>   GPA programmed by the firmware as the base address for the aperture).
>>>>>>
>>>>>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>>>>>>   these intended to extend the aperture to our size recommendation,
>>>>>>   calculated relative to the base of the aperture.
>>>>>>
>>>>>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
>>>>>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
>>>>>> to the default base (pc_pci_hole64_start()), ignoring any
>>>>>> programming done
>>>>>> by the firmware. This means that our size recommendation may not be met.
>>>>>> Fix it by honoring the firmware's address assignments.
>>>>>>
>>>>>> The strange extension sizes were spotted by Alex, in the log of a guest
>>>>>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
>>>>>>
>>>>>> This change only affects DSDT generation, therefore no new
>>>>>> compat property
>>>>>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>>>>>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.    
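
[Editor's note: the bug the commit message describes can be captured in a toy model. This is not QEMU's actual code -- `hole64` and its parameters are illustrative names -- but the arithmetic mirrors the getters described above: the start honors the firmware's lowest 64-bit BAR GPA, while the pre-fix end extended the aperture relative to the default base only.]

```python
GiB = 1 << 30

def hole64(default_base, fw_lowest_bar, hole64_size, fixed):
    """Toy model of the 64-bit PCI hole getters described above.

    default_base  -- what pc_pci_hole64_start() would return
    fw_lowest_bar -- lowest 64-bit BAR GPA programmed by the firmware,
                     or None if the firmware programmed no 64-bit hole
    fixed         -- False models the pre-fix behavior
    """
    # the _start getters honor the firmware's assignment when present
    start = fw_lowest_bar if fw_lowest_bar is not None else default_base
    # bug: the _end getters extended the aperture relative to the default
    # base only; fix: extend it relative to the actual aperture base
    end = (start if fixed else default_base) + hole64_size
    return start, max(start, end)

# firmware placed the lowest 64-bit BAR at 32G, default base is 4G,
# recommended hole size is 2G (the i440fx default)
buggy = hole64(4 * GiB, 32 * GiB, 2 * GiB, fixed=False)
good = hole64(4 * GiB, 32 * GiB, 2 * GiB, fixed=True)
```

In the buggy case the aperture collapses to zero length (the size recommendation is not met at all); with the fix it spans the recommended 2GB beyond the firmware-assigned base.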
>>>>>
>>>>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
>>>>> only if it cannot satisfy all the BARs from 32-bit MMIO, see
>>>>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
>>>>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
>>>>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
>>>>> devices could do the same.  Thanks,    
>>>>
>>>> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
>>>> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
>>>>
>>>> However, using SeaBIOS+i440fx, I can't show the difference. I've been
>>>> experimenting with various ivshmem devices (even multiple at the same
>>>> time, with different sizes). The "all or nothing" nature of SeaBIOS's
>>>> high allocation of the 64-bit BARs, combined with hugepage alignment
>>>> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
>>>> for i440fx, seem to make it surprisingly difficult to trigger the issue.
>>>>
>>>> I figure I should:
>>>>
>>>> (1) remove the sentence "the patch makes no difference to SeaBIOS
>>>> guests" from the commit message,
>>>>
>>>> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
>>>>
>>>> (3) remain silent on SeaBIOS/i440fx, in the commit message,
>>>>
>>>> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
>>>> change is validated as part of the test suite, on SeaBIOS/q35.
>>>>
>>>> Regarding (4):
>>>>
>>>> - is it OK if I add the test only for Q35?
>>>>
>>>> - what guest RAM size am I allowed to use in the test suite? In my own
>>>> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
>>>> acceptable for the test suite.  
>>>
>>> Seems like you've done due diligence, the plan looks ok to me.
>>> Regarding the test memory allocation, is it possible and reasonable to
>>> perhaps create a 256MB shared memory area and re-use it for multiple
>>> ivshmem devices?  i.e. rather than one 4GB ivshmem device, use sixteen 256MB
>>> devices, all with the same backing.  Thanks,  
>>
>> In order to show that the patch makes a difference, on SeaBIOS+Q35, I
>> must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM
>> / end-of-DIMM-hotplug-area (i.e., strictly past the return value of
>> pc_pci_hole64_start()). That's not easy, assuming test suite constraints:
>>
>> - With a small guest RAM size (and no DIMM hotplug area),
>> pc_pci_hole64_start() returns 4GB. Unfortunately, 4GB is well aligned
>> for 256MB BARs, so that's where SeaBIOS will place the lowest such BAR
>> too. Therefore the patch makes no difference.
>>
>> - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size, by
>> adding a DIMM hotplug area, it's not helpful, because the end of the
>> DIMM area ("etc/reserved-memory-end") is rounded up to 1GB alignment
>> automatically. Similarly, pc_pci_hole64_start() is rounded up to 1GB
>> alignment automatically. Thus, the BARs remain aligned.
>>
>> - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size by
>> using more guest RAM (e.g. 2049 MB --> the low RAM split is at 2GB on
>> Q35, so RAM would end at 4GB+1MB), then it's the same as the previous
>> bullet, because pc_pci_hole64_start() is rounded up to 1GB alignment,
>> and the BARs remain aligned.
>>
>> - If I try to mis-align pc_pci_hole64_start() (again, 4GB, with small
>> guest RAM) for the ivshmem BAR size by changing the BAR size, then I
>> must pick an 8GB BAR size at least. And then I can't use the 256MB
>> "fragmentation".
>>
>> - I guess I could ascertain the mis-alignment by using small guest RAM
>> (128MB), a single DIMM hotplug slot so that reserved-memory-end is
>> rounded up to 5GB (through the 1GB alignment), and a single ivshmem
>> device with a 2GB BAR (mis-aligned with the reserved-memory-end at 5GB).
>> Is this the best compromise?
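>>
>> [Editor's note: the alignment arithmetic behind the bullets above can be
>> sketched as follows. This is a simplified model -- SeaBIOS's real
>> allocator in src/fw/pciinit.c does more, and round_up/lowest_bar are
>> illustrative names -- but it shows why a well-aligned hole start hides
>> the bug and why the "5GB start + 2GB BAR" combination exposes it.]

```python
GiB = 1 << 30
MiB = 1 << 20

def round_up(x, align):
    # round x up to the next multiple of align (align is a power of two here)
    return (x + align - 1) // align * align

def lowest_bar(hole64_start, bar_size):
    # simplified: the firmware places the lowest high BAR at the hole
    # start rounded up to the BAR's natural (power-of-two) alignment
    return round_up(hole64_start, bar_size)

# small RAM: pc_pci_hole64_start() is 4GB, which is already aligned for a
# 256MB BAR, so the firmware base equals the default base -- no difference
assert lowest_bar(4 * GiB, 256 * MiB) == 4 * GiB

# the compromise: reserved-memory-end rounded up to 5GB, one 2GB BAR --
# the BAR lands at 6GB, strictly above pc_pci_hole64_start()
assert lowest_bar(5 * GiB, 2 * GiB) == 6 * GiB
```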
>>
>> (The last option is basically an optimization of my previous reproducer,
>> namely 5GB guest RAM and 4GB BAR. The 5GB guest RAM made sure that RAM
>> would end at an *odd* GB number -- we can optimize that by using an
>> empty DIMM hotplug area rather than guest RAM. And the 4GB BAR can be
>> optimized to the smallest *even* (positive) number of GBs, namely 2GB.
> that probably would do the job

OK, thanks.

>> But we can't do anything about the pervasive GB-alignment.)
> I haven't read the code to be sure but at first glance, thanks to
> brokenness before, it might be possible to mis-align pci_hole64 start
> if one would use a combination of
>  broken_reserved_end=true && enforce_aligned_dimm=false
> 
> That's assuming that an old machine type that has this combo is still
> suitable for testcase otherwise.

I thought of messing with manual compat property overrides (or old
machine types), but I'd prefer not to go there. The code I'm patching
already depends on a compat property ("x-pci-hole64-fix", for both
"i440FX-pcihost" and "q35-pcihost"), and that one is set to "off" up to
and including PC_COMPAT_2_10, whereas the latest machine types (i440fx
and q35) where "broken_reserved_end" is true are 2.4. That leaves me
with an empty set.
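
[Editor's note: the "empty set" remark, spelled out. The version list is
purely illustrative; the point is that the two ranges cannot intersect.]

```python
# machine-type versions as (major, minor) tuples, 2.0 .. 2.12
versions = [(2, minor) for minor in range(13)]

# "x-pci-hole64-fix" is off up to and including 2.10, so the code being
# patched is only active on machine types newer than 2.10
fix_active = {v for v in versions if v > (2, 10)}

# the broken_reserved_end combo only exists on machine types up to 2.4
broken_combo = {v for v in versions if v <= (2, 4)}

# no machine type has both, hence no usable testcase via that route
assert not (fix_active & broken_combo)
```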

I'll start work on the "best compromise" method.

Thanks!
Laszlo


