qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO r


From: Gavin Shan
Subject: Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions
Date: Thu, 4 Aug 2022 12:47:55 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.0

Hi Eric,

On 8/3/22 10:52 PM, Eric Auger wrote:
On 8/3/22 15:02, Gavin Shan wrote:
On 8/3/22 5:01 PM, Marc Zyngier wrote:
On Wed, 03 Aug 2022 04:01:04 +0100,
Gavin Shan <gshan@redhat.com> wrote:
On 8/2/22 7:41 PM, Eric Auger wrote:
On 8/2/22 08:45, Gavin Shan wrote:
There are 3 highmem IO regions as below. They can be disabled in
two situations: (a) The specific region is disabled by user. (b)
The specific region doesn't fit in the PA space. However, the base
address and highest_gpa are still updated no matter if the region
is enabled or disabled. It's incorrectly incurring waste in the PA
space.
If I am not wrong highmem_redists and highmem_mmio are not user
selectable

Only highmem ecam depends on machine type & ACPI setup. But I would
say
that in server use case it is always set. So is that optimization
really
needed?

There are two other cases you missed.

- highmem_ecam is enabled after virt-2.12, meaning it stays disabled
    before that.

I don't get this. The current behaviour is to disable highmem_ecam if
it doesn't fit in the PA space. I can't see anything that enables it
if it was disabled the first place.


There are several places or conditions where vms->highmem_ecam can be
disabled:

- virt_instance_init() where vms->highmem_ecam is inherited from
   !vmc->no_highmem_ecam. The option is set to true after virt-2.12
   in virt_machine_2_12_options().

- machvirt_init() where vms->highmem_ecam can be disable if we have
   32-bits vCPUs and failure on loading firmware.

- Another place is where we're talking about. It's address assignment
   to fit the PA space.


- The high memory region can be disabled if user is asking large
    (normal) memory space through 'maxmem=' option. When the requested
    memory by 'maxmem=' is large enough, the high memory regions are
    disabled. It means the normal memory has higher priority than those
    high memory regions. This is the case I provided in (b) of the
    commit log.

Why is that a problem? It matches the expected behaviour, as the
highmem IO region is floating and is pushed up by the memory region.


Eric thought that VIRT_HIGH_GIC_REDIST2 and VIRT_HIGH_PCIE_MMIO regions
aren't user selectable. I tended to explain why it's not true. 'maxmem='
can affect the outcome. When 'maxmem=' value is big enough, there will be
no free area in the PA space to hold those two regions.


In the commit log, I was supposed to say something like below for
(a):

- The specific high memory region can be disabled through changing
    the code by user or developer. For example, 'vms->highmem_mmio'
    is changed from true to false in virt_instance_init().

Huh. By this principle, the user can change anything. Why is it
important?


Still like above. I was explaining the possible cases where those
3 switches can be turned on/off by users or developers. Our code
needs to be consistent and comprehensive.

   vms->highmem_redists
   vms->highmem_ecam
   vms->mmio



Improve address assignment for highmem IO regions to avoid the waste
in the PA space by putting the logic into virt_memmap_fits().

I guess that this is what I understand the least. What do you mean by
"wasted PA space"? Either the regions fit in the PA space, and
computing their addresses in relevant, or they fall outside of it and
what we stick in memap[index].base is completely irrelevant.


It's possible that we run into the following combination. we should
have enough PA space to enable VIRT_HIGH_PCIE_MMIO region. However,
the region is disabled in the original implementation because
VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} regions consumed 1GB, which is
unecessary and waste in the PA space.
each region's base is aligned on its size.

Yes.


     static MemMapEntry extended_memmap[] = {
         [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
         [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
         [VIRT_HIGH_PCIE_MMIO] =     { 0x0, 512 * GiB },
so anyway MMIO is at least at 512GB. Having a 1TB IPA space does not
imply any amount of RAM. This depends on the address space.
I    };

Yes. Prior to the start of system memory, there is 1GB used by
various regions either.


     IPA_LIMIT           = (1UL << 40)
     '-maxmem'           = 511GB              /* Memory starts from 1GB */
     '-slots'            = 0
     vms->highmem_rdist2 = false
How can this happen? the only reason for highmem_redists to be reset is
if it does not fit into map_ipa. So if mmio fits, highmem_redists does
too. What do I miss?

The example is having "vms->highmem_rdist2 = flase" BEFORE the address
assignment, it's possible that developer changes the code to disable
it intentionally. The point is the original implementation isn't comprehensive
because it has the wrong assumption that vms->highmem_{rdist2, ecam, mmio} all
true before the address assignment. With the wrong assumption, the base address
is always increased, even the previous region is disabled, during the
address assignment in virt_set_memmap().


     vms->highmem_ecam   = false
     vms->highmem_mmio   = true
I am still skeptic about the relevance of such optim. Isn't it sensible
to expect the host to support large IPA if you want to use such amount
of memory.
I think we should rather better document the memory map from a user
point of view.
Besides if you change the memory map, you need to introduce yet another
option to avoid breaking migration, no?


These 3 high memory regions consumes 513GB with alignment considered when
all of them are enabled. The point is those 3 high memory regions, or part
of them can be squeezed or smashed to accommodate '-maxmem' by design. I
think there are two options I can think of. I personally prefer to keep
the capability. With this, users gain broader range for their '-maxmem'.
Please let me know your preference.

- Revert the capability of squeezing or smashing those high memory regions
  to accommodate '-maxmem'. It means vms->high_{redist2, ecam, mmio} can't
  be disable on conflicts to the PA space.

- Keep the capability, with this optimization applied to make the implementation
  comprehensive.

I think it's worthy to add something about this limitation in 
doc/system/arm/virt.rst.

I don't think I need introduce another option to avoid migration breakage.
Could you explain why I need the extra option?



Signed-off-by: Gavin Shan <gshan@redhat.com>
---
    hw/arm/virt.c | 54
+++++++++++++++++++++++++++++----------------------
    1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9633f822f3..bc0cd218f9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1688,6 +1688,34 @@ static uint64_t
virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
        return arm_cpu_mp_affinity(idx, clustersz);
    }
    +static void virt_memmap_fits(VirtMachineState *vms, int index,
+                             bool *enabled, hwaddr *base, int
pa_bits)
+{
+    hwaddr size = extended_memmap[index].size;
+
+    /* The region will be disabled if its size isn't given */
+    if (!*enabled || !size) {
In which case do you have !size?

Yeah, we don't have !size and the condition should be removed.

+        *enabled = false;
+        vms->memmap[index].base = 0;
+        vms->memmap[index].size = 0;
It looks dangerous to me to reset the region's base and size like
that.
for instance fdt_add_gic_node() will add dummy data in the dt.

I would guess you missed that the high memory regions won't be exported
through device-tree if they have been disabled. We have a check there,
which is "if (nb_redist_regions == 1)"

+        return;
+    }
+
+    /*
+     * Check if the memory region fits in the PA space. The
memory map
+     * and highest_gpa are updated if it fits. Otherwise, it's
disabled.
+     */
+    *enabled = (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits));
using a 'fits' local variable would make the code more obvious I think

Lets confirm if you're suggesting something like below?

          bool fits;

          fits = (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits));

          if (fits) {
             /* update *base, memory mapping, highest_gpa */
          } else {
             *enabled = false;
          }

I guess we can simply do

          if (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits)) {
             /* update *base, memory mapping, highest_gpa */
          } else {
             *enabled = false;
          }

Please let me know which one looks best to you.

Why should the 'enabled' flag be updated by this function, instead of
returning the value and keeping it as an assignment in the caller
function? It is purely stylistic though.


The idea to put the logic, address assignment for those 3 high memory
regions or updating the flags (vms->high_redist2/ecam/mmio), to the
newly introduced function, to make virt_set_memmap() a bit simplified.
Eric tends to agree the changes with minor adjustments. So I'm going
to keep it as of being, which doesn't mean the stylistic thought is
a bad one :)

Lastly, I need to rewrite the comit log completely because it seems
causing confusions to Eric and you.


Thanks,
Gavin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]