qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO r


From: Eric Auger
Subject: Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions
Date: Fri, 5 Aug 2022 10:00:24 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

Hi Gavin,

On 8/5/22 10:36, Gavin Shan wrote:
> Hi Eric,
>
> On 8/4/22 5:19 PM, Eric Auger wrote:
>> On 8/4/22 04:47, Gavin Shan wrote:
>>> On 8/3/22 10:52 PM, Eric Auger wrote:
>>>> On 8/3/22 15:02, Gavin Shan wrote:
>>>>> On 8/3/22 5:01 PM, Marc Zyngier wrote:
>>>>>> On Wed, 03 Aug 2022 04:01:04 +0100,
>>>>>> Gavin Shan <gshan@redhat.com> wrote:
>>>>>>> On 8/2/22 7:41 PM, Eric Auger wrote:
>>>>>>>> On 8/2/22 08:45, Gavin Shan wrote:
>>>>>>>>> There are 3 highmem IO regions as below. They can be disabled in
>>>>>>>>> two situations: (a) The specific region is disabled by user. (b)
>>>>>>>>> The specific region doesn't fit in the PA space. However, the
>>>>>>>>> base
>>>>>>>>> address and highest_gpa are still updated no matter if the region
>>>>>>>>> is enabled or disabled. It's incorrectly incurring waste in
>>>>>>>>> the PA
>>>>>>>>> space.
>>>>>>>> If I am not wrong highmem_redists and highmem_mmio are not user
>>>>>>>> selectable
>>>>>>>>
>>>>>>>> Only highmem ecam depends on machine type & ACPI setup. But I
>>>>>>>> would
>>>>>>>> say
>>>>>>>> that in server use case it is always set. So is that optimization
>>>>>>>> really
>>>>>>>> needed?
>>>>>>>
>>>>>>> There are two other cases you missed.
>>>>>>>
>>>>>>> - highmem_ecam is enabled after virt-2.12, meaning it stays
>>>>>>> disabled
>>>>>>>      before that.
>>>>>>
>>>>>> I don't get this. The current behaviour is to disable
>>>>>> highmem_ecam if
>>>>>> it doesn't fit in the PA space. I can't see anything that enables it
>>>>>> if it was disabled the first place.
>>>>>>
>>>>>
>>>>> There are several places or conditions where vms->highmem_ecam can be
>>>>> disabled:
>>>>>
>>>>> - virt_instance_init() where vms->highmem_ecam is inherited from
>>>>>     !vmc->no_highmem_ecam. The option is set to true after virt-2.12
>>>>>     in virt_machine_2_12_options().
>>>>>
>>>>> - machvirt_init() where vms->highmem_ecam can be disable if we have
>>>>>     32-bits vCPUs and failure on loading firmware.
>>>>>
>>>>> - Another place is where we're talking about. It's address assignment
>>>>>     to fit the PA space.
>>>>>
>>>>>>>
>>>>>>> - The high memory region can be disabled if user is asking large
>>>>>>>      (normal) memory space through 'maxmem=' option. When the
>>>>>>> requested
>>>>>>>      memory by 'maxmem=' is large enough, the high memory
>>>>>>> regions are
>>>>>>>      disabled. It means the normal memory has higher priority than
>>>>>>> those
>>>>>>>      high memory regions. This is the case I provided in (b) of the
>>>>>>>      commit log.
>>>>>>
>>>>>> Why is that a problem? It matches the expected behaviour, as the
>>>>>> highmem IO region is floating and is pushed up by the memory region.
>>>>>>
>>>>>
>>>>> Eric thought that VIRT_HIGH_GIC_REDIST2 and VIRT_HIGH_PCIE_MMIO
>>>>> regions
>>>>> aren't user selectable. I tended to explain why it's not true.
>>>>> 'maxmem='
>>>>> can affect the outcome. When 'maxmem=' value is big enough, there
>>>>> will be
>>>>> no free area in the PA space to hold those two regions.
>>>>>
>>>>>>>
>>>>>>> In the commit log, I was supposed to say something like below for
>>>>>>> (a):
>>>>>>>
>>>>>>> - The specific high memory region can be disabled through changing
>>>>>>>      the code by user or developer. For example,
>>>>>>> 'vms->highmem_mmio'
>>>>>>>      is changed from true to false in virt_instance_init().
>>>>>>
>>>>>> Huh. By this principle, the user can change anything. Why is it
>>>>>> important?
>>>>>>
>>>>>
>>>>> Still like above. I was explaining the possible cases where those
>>>>> 3 switches can be turned on/off by users or developers. Our code
>>>>> needs to be consistent and comprehensive.
>>>>>
>>>>>     vms->highmem_redists
>>>>>     vms->highmem_ecam
>>>>>     vms->mmio
>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Improve address assignment for highmem IO regions to avoid the
>>>>>>>>> waste
>>>>>>>>> in the PA space by putting the logic into virt_memmap_fits().
>>>>>>
>>>>>> I guess that this is what I understand the least. What do you
>>>>>> mean by
>>>>>> "wasted PA space"? Either the regions fit in the PA space, and
>>>>>> computing their addresses in relevant, or they fall outside of it
>>>>>> and
>>>>>> what we stick in memap[index].base is completely irrelevant.
>>>>>>
>>>>>
>>>>> It's possible that we run into the following combination. we should
>>>>> have enough PA space to enable VIRT_HIGH_PCIE_MMIO region. However,
>>>>> the region is disabled in the original implementation because
>>>>> VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} regions consumed 1GB, which is
>>>>> unecessary and waste in the PA space.
>>>> each region's base is aligned on its size.
>>>
>>> Yes.
>>>
>>>>>
>>>>>       static MemMapEntry extended_memmap[] = {
>>>>>           [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
>>>>>           [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
>>>>>           [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
>>>> so anyway MMIO is at least at 512GB. Having a 1TB IPA space does not
>>>> imply any amount of RAM. This depends on the address space.
>>>> I    };
>>>
>>> Yes. Prior to the start of system memory, there is 1GB used by
>>> various regions either.
>> yes
>>>
>>>>>
>>>>>       IPA_LIMIT           = (1UL << 40)
>>>>>       '-maxmem'           = 511GB              /* Memory starts from
>>>>> 1GB */
>>>>>       '-slots'            = 0
>>>>>       vms->highmem_rdist2 = false
>>>> How can this happen? the only reason for highmem_redists to be
>>>> reset is
>>>> if it does not fit into map_ipa. So if mmio fits, highmem_redists does
>>>> too. What do I miss?
>>>
>>> The example is having "vms->highmem_rdist2 = flase" BEFORE the address
>>> assignment, it's possible that developer changes the code to disable
>>> it intentionally. The point is the original implementation isn't
>>> comprehensive
>>> because it has the wrong assumption that vms->highmem_{rdist2, ecam,
>>> mmio} all
>>> true before the address assignment. With the wrong assumption, the
>>> base address
>>> is always increased, even the previous region is disabled, during the
>>> address assignment in virt_set_memmap().
>> Yes we currently always provision space for those functionalities,
>> independently on whether they are used. But that's true for many other
>> regions in the address map (although smaller) ;-)
>
> Yep :)
>
>>>
>>>
>>>>>       vms->highmem_ecam   = false
>>>>       vms->highmem_mmio   = true
>>>> I am still skeptic about the relevance of such optim. Isn't it
>>>> sensible
>>>> to expect the host to support large IPA if you want to use such amount
>>>> of memory.
>>>> I think we should rather better document the memory map from a user
>>>> point of view.
>>>> Besides if you change the memory map, you need to introduce yet
>>>> another
>>>> option to avoid breaking migration, no?
>>>>
>>>
>>> These 3 high memory regions consumes 513GB with alignment considered
>>> when
>>> all of them are enabled. The point is those 3 high memory regions, or
>>> part
>>> of them can be squeezed or smashed to accommodate '-maxmem' by
>>> design. I
>>> think there are two options I can think of. I personally prefer to keep
>>> the capability. With this, users gain broader range for their
>>> '-maxmem'.
>>> Please let me know your preference.
>>>
>>> - Revert the capability of squeezing or smashing those high memory
>>> regions
>>>    to accommodate '-maxmem'. It means vms->high_{redist2, ecam, mmio}
>>> can't
>>>    be disable on conflicts to the PA space.
>>>
>>> - Keep the capability, with this optimization applied to make the
>>> implementation
>>>    comprehensive.
>>>
>>> I think it's worthy to add something about this limitation in
>>> doc/system/arm/virt.rst.
>> indeed that's worth in any case.
>>>
>>> I don't think I need introduce another option to avoid migration
>>> breakage.
>>> Could you explain why I need the extra option?
>> I think you do. Before and after this patch the QEMU memory regions
>> associated to those devices won't be a the same location in the memory
>> so if you migrate from an old version to a newer one, the guest won't be
>> able to access them
>>
>> OK I have given my own opinion about those potential changes. Let's wait
>> for others' ;-)
>>
>
> Thank you for your comments. Yeah, I would hold to post v2 to collect
> more comments :)
>
> I'm still don't understand how it's affecting migration. If I understand
> correct, the changed address based doesn't affect migration, as explained
> like below. It took me while to looking the source code to figure out
> how VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} is linked to GIC and PCIe host and
> migration.
>
> For VIRTH_HIGH_GIC_REDIST2 region, it's used by TYPE_ARM_GICV3 or
> TYPE_KVM_ARM_GICV3.
> TYPE_KVM_ARM_GICV3. For both cases, we do NOT migrate the region
> directly. Instead,
> the registers are saved to GICv3CPUState. GICv3CPUState is migrate and
> the registers
> are restored from the instance.
>
> For VIRT_HIGH_PCIE_ECAM, the registers are saved to PCIDevice::config.
> The
> buffer is migrated and PCI's config space is restored from it. In
> hw/net/e1000e.c,
> e1000e_vmstate has something like below embedded:
>
>     VMSTATE_PCI_DEVICE(parent_obj, E1000EState),

Actually I was more thinking about PCI MMIO region. Effectively for
regions that are saved/restored from regs it sounds OK (RDIST).
For ECAM I do not know how migration is handled but better to double
check with PCI/migration experts.

Thanks

Eric
>
> ---
>
> I also did one experiment by having different address bases on the
> source and
> destination. Migration succeeds.
>
> address regions from source
> ---------------------------
> virt_memmap_fits: HIGH_GIC_REDIST2: [0000004000000000  0000000004000000]
> virt_memmap_fits: HIGH_GIC_ECAM:    [0000004010000000  0000000010000000]
> virt_memmap_fits: HIGH_GIC_MMIO:    [0000008000000000  0000008000000000]
>
> address regions from destination
> --------------------------------
> virt_memmap_fits: HIGH_GIC_REDIST2: [0000004040000000  0000000004000000]
> virt_memmap_fits: HIGH_GIC_ECAM:    [0000004050000000  0000000010000000]
> virt_memmap_fits: HIGH_GIC_MMIO:    [0000008000000000  0000008000000000]
>
>
>>>
>>>>>
>>>>>>>>>
>>>>>>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>>>>>>> ---
>>>>>>>>>      hw/arm/virt.c | 54
>>>>>>>>> +++++++++++++++++++++++++++++----------------------
>>>>>>>>>      1 file changed, 31 insertions(+), 23 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>>>>>> index 9633f822f3..bc0cd218f9 100644
>>>>>>>>> --- a/hw/arm/virt.c
>>>>>>>>> +++ b/hw/arm/virt.c
>>>>>>>>> @@ -1688,6 +1688,34 @@ static uint64_t
>>>>>>>>> virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>>>>>>>>>          return arm_cpu_mp_affinity(idx, clustersz);
>>>>>>>>>      }
>>>>>>>>>      +static void virt_memmap_fits(VirtMachineState *vms, int
>>>>>>>>> index,
>>>>>>>>> +                             bool *enabled, hwaddr *base, int
>>>>>>>>> pa_bits)
>>>>>>>>> +{
>>>>>>>>> +    hwaddr size = extended_memmap[index].size;
>>>>>>>>> +
>>>>>>>>> +    /* The region will be disabled if its size isn't given */
>>>>>>>>> +    if (!*enabled || !size) {
>>>>>>>> In which case do you have !size?
>>>>>>>
>>>>>>> Yeah, we don't have !size and the condition should be removed.
>>>>>>>
>>>>>>>>> +        *enabled = false;
>>>>>>>>> +        vms->memmap[index].base = 0;
>>>>>>>>> +        vms->memmap[index].size = 0;
>>>>>>>> It looks dangerous to me to reset the region's base and size like
>>>>>>>> that.
>>>>>>>> for instance fdt_add_gic_node() will add dummy data in the dt.
>>>>>>>
>>>>>>> I would guess you missed that the high memory regions won't be
>>>>>>> exported
>>>>>>> through device-tree if they have been disabled. We have a check
>>>>>>> there,
>>>>>>> which is "if (nb_redist_regions == 1)"
>>>>>>>
>>>>>>>>> +        return;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    /*
>>>>>>>>> +     * Check if the memory region fits in the PA space. The
>>>>>>>>> memory map
>>>>>>>>> +     * and highest_gpa are updated if it fits. Otherwise, it's
>>>>>>>>> disabled.
>>>>>>>>> +     */
>>>>>>>>> +    *enabled = (ROUND_UP(*base, size) + size <=
>>>>>>>>> BIT_ULL(pa_bits));
>>>>>>>> using a 'fits' local variable would make the code more obvious I
>>>>>>>> think
>>>>>>>
>>>>>>> Lets confirm if you're suggesting something like below?
>>>>>>>
>>>>>>>            bool fits;
>>>>>>>
>>>>>>>            fits = (ROUND_UP(*base, size) + size <=
>>>>>>> BIT_ULL(pa_bits));
>>>>>>>
>>>>>>>            if (fits) {
>>>>>>>               /* update *base, memory mapping, highest_gpa */
>>>>>>>            } else {
>>>>>>>               *enabled = false;
>>>>>>>            }
>>>>>>>
>>>>>>> I guess we can simply do
>>>>>>>
>>>>>>>            if (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits)) {
>>>>>>>               /* update *base, memory mapping, highest_gpa */
>>>>>>>            } else {
>>>>>>>               *enabled = false;
>>>>>>>            }
>>>>>>>
>>>>>>> Please let me know which one looks best to you.
>>>>>>
>>>>>> Why should the 'enabled' flag be updated by this function,
>>>>>> instead of
>>>>>> returning the value and keeping it as an assignment in the caller
>>>>>> function? It is purely stylistic though.
>>>>>>
>>>>>
>>>>> The idea to put the logic, address assignment for those 3 high memory
>>>>> regions or updating the flags (vms->high_redist2/ecam/mmio), to the
>>>>> newly introduced function, to make virt_set_memmap() a bit
>>>>> simplified.
>>>>> Eric tends to agree the changes with minor adjustments. So I'm going
>>>>> to keep it as of being, which doesn't mean the stylistic thought is
>>>>> a bad one :)
>>>>>
>>>>> Lastly, I need to rewrite the comit log completely because it seems
>>>>> causing confusions to Eric and you.
>>>>>
>>>
>
> Thanks,
> Gavin
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]