[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v7 09/10] i386/pc: relocate 4g start to 1T where applicable
|
From: |
Joao Martins |
|
Subject: |
Re: [PATCH v7 09/10] i386/pc: relocate 4g start to 1T where applicable |
|
Date: |
Fri, 15 Jul 2022 13:07:59 +0100 |
On 7/15/22 12:57, Igor Mammedov wrote:
> On Thu, 14 Jul 2022 19:28:19 +0100
> Joao Martins <joao.m.martins@oracle.com> wrote:
>
>> It is assumed that the whole GPA space is available to be DMA
>> addressable, within a given address space limit, except for a
>> tiny region before the 4G. Since Linux v5.4, VFIO validates
>> whether the selected GPA is indeed valid i.e. not reserved by
>> IOMMU on behalf of some specific devices or platform-defined
>> restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with
>> -EINVAL.
>>
>> AMD systems with an IOMMU are examples of such platforms and
>> particularly may only have these ranges as allowed:
>>
>> 0000000000000000 - 00000000fedfffff (0 .. 3.982G)
>> 00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
>> 0000010000000000 - ffffffffffffffff (1Tb .. 16Pb[*])
>>
>> We already account for the 4G hole, albeit if the guest is big
>> enough we will fail to allocate a guest with >1010G due to the
>> ~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).
>>
>> [*] there is another reserved region unrelated to HT that exists
>> in the 256T boundary in Fam 17h according to Errata #1286,
>> documeted also in "Open-Source Register Reference for AMD Family
>> 17h Processors (PUB)"
>>
>> When creating the region above 4G, take into account that on AMD
>> platforms the HyperTransport range is reserved and hence it
>> cannot be used either as GPAs. On those cases rather than
>> establishing the start of ram-above-4g to be 4G, relocate instead
>> to 1Tb. See AMD IOMMU spec, section 2.1.2 "IOMMU Logical
>> Topology", for more information on the underlying restriction of
>> IOVAs.
>>
>> After accounting for the 1Tb hole on AMD hosts, mtree should
>> look like:
>>
>> 0000000000000000-000000007fffffff (prio 0, i/o):
>> alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff
>> 0000010000000000-000001ff7fffffff (prio 0, i/o):
>> alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff
>>
>> If the relocation is done or the address space covers it, we
>> also add the the reserved HT e820 range as reserved.
>>
>> Default phys-bits on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough
>> to address 1Tb (0xff ffff ffff). On AMD platforms, if a
>> ram-above-4g relocation may be desired and the CPU wasn't configured
>> with a big enough phys-bits, print an error message to the user
>> and do not make the relocation of the above-4g-region if phys-bits
>> is too low.
>>
>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> hw/i386/pc.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 82 insertions(+)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index cda435e3baeb..17613974163e 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -880,6 +880,52 @@ static uint64_t pc_get_cxl_range_end(PCMachineState
>> *pcms)
>> return start;
>> }
>>
>> +static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t
>> pci_hole64_size)
>> +{
>> + X86CPU *cpu = X86_CPU(first_cpu);
>> +
>> + /* 32-bit systems don't have hole64 thus return max CPU address */
>> + if (cpu->phys_bits <= 32) {
>> + return ((hwaddr)1 << cpu->phys_bits) - 1;
>> + }
>> +
>> + return pc_pci_hole64_start() + pci_hole64_size - 1;
>> +}
>> +
> [...]
>
>> +
>> + /*
>> + * Relocating ram-above-4G requires more than TCG_PHYS_ADDR_BITS (40).
>> + * So make sure phys-bits is required to be appropriately sized in order
>> + * to proceed with the above-4g-region relocation and thus boot.
>
> drop mention of relocation here as it's orthogonal to the check.
> Important thing we are checking here is that max used GPA is
> reachable by configured vCPU (physbits).
>
OK
>> + */
>> + maxusedaddr = pc_max_used_gpa(pcms, pci_hole64_size);
>> + maxphysaddr = ((hwaddr)1 << cpu->phys_bits) - 1;
>> + if (maxphysaddr < maxusedaddr) {
>> + error_report("Address space limit 0x%"PRIx64" < 0x%"PRIx64
>> + " phys-bits too low (%u)",
>> + maxphysaddr, maxusedaddr, cpu->phys_bits);
>> + exit(EXIT_FAILURE);
>> + }
>
> these hunks should be a separate patch preceding relocation patch
> as it basically does max_gpa vs physbits check regardless
> of relocation (i.e. relocation is only one of the reasons
> max_used_gpa might exceed physbits).
>
Yeap, makes sense given that this is now generic regardless of AMD 1Tb hole.
>> +
>> /*
>> * Split single memory region and use aliases to address portions of it,
>> * done for backwards compatibility with older qemus.
>
- [PATCH v7 01/10] hw/i386: add 4g boundary start to X86MachineState, (continued)
- [PATCH v7 01/10] hw/i386: add 4g boundary start to X86MachineState, Joao Martins, 2022/07/14
- [PATCH v7 05/10] i386/pc: factor out cxl range end to helper, Joao Martins, 2022/07/14
- [PATCH v7 02/10] i386/pc: create pci-host qdev prior to pc_memory_init(), Joao Martins, 2022/07/14
- [PATCH v7 04/10] i386/pc: factor out above-4g end to an helper, Joao Martins, 2022/07/14
- [PATCH v7 03/10] i386/pc: pass pci_hole64_size to pc_memory_init(), Joao Martins, 2022/07/14
- [PATCH v7 06/10] i386/pc: factor out cxl range start to helper, Joao Martins, 2022/07/14
- [PATCH v7 07/10] i386/pc: handle unitialized mr in pc_get_cxl_range_end(), Joao Martins, 2022/07/14
- [PATCH v7 08/10] i386/pc: factor out device_memory base/size to helper, Joao Martins, 2022/07/14
- [PATCH v7 09/10] i386/pc: relocate 4g start to 1T where applicable, Joao Martins, 2022/07/14
- [PATCH v7 10/10] i386/pc: restrict AMD only enforcing of 1Tb hole to new machine type, Joao Martins, 2022/07/14