From: Auger Eric
Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support
Date: Tue, 26 Feb 2019 18:53:24 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0

Hi Igor,

On 2/26/19 5:56 PM, Igor Mammedov wrote:
> On Tue, 26 Feb 2019 14:11:58 +0100
> Auger Eric <address@hidden> wrote:
>> Hi Igor,
>> On 2/26/19 9:40 AM, Auger Eric wrote:
>>> Hi Igor,
>>> On 2/25/19 10:42 AM, Igor Mammedov wrote:
>>>> On Fri, 22 Feb 2019 18:35:26 +0100
>>>> Auger Eric <address@hidden> wrote:
>>>>> Hi Igor,
>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:
>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100
>>>>>> Eric Auger <address@hidden> wrote:
>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
>>>>>>> support device memory in general, and especially PCDIMM/NVDIMM.
>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as the
>>>>>>> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
>>>>>>> MMIO region. The address map was 1TB large. This corresponded to
>>>>>>> the max IPA capacity KVM was able to manage.
>>>>>>> Since 4.20, the host kernel is able to support a larger and dynamic
>>>>>>> IPA range. So the guest physical address can go beyond the 1TB. The
>>>>>>> max GPA size depends on the host kernel configuration and physical CPUs.
>>>>>>> In this series we use this feature and allow the RAM to grow without
>>>>>>> any other limit than the one put by the host kernel.
>>>>>>> The RAM still starts at 1GB. First comes the initial ram (-m) of size
>>>>>>> ram_size and then comes the device memory (,maxmem) of size
>>>>>>> maxram_size - ram_size. The device memory is potentially hotpluggable
>>>>>>> depending on the instantiated memory objects.
>>>>>>> IO regions previously located between 256GB and 1TB are moved after
>>>>>>> the RAM. Their offset is dynamically computed, depends on ram_size
>>>>>>> and maxram_size. Size alignment is enforced.
>>>>>>> In case the maxmem value is lower than 255GB, the legacy memory map
>>>>>>> is still used. The change of memory map becomes effective from 4.0
>>>>>>> onwards.
>>>>>>> As we keep the initial RAM at 1GB base address, we do not need to do
>>>>>>> invasive changes in the EDK2 FW. It seems nobody is eager to do
>>>>>>> that job at the moment.
>>>>>>> Device memory being put just after the initial RAM, it is possible
>>>>>>> to get access to this feature while keeping a 1TB address map.
>>>>>>> This series reuses/rebases patches initially submitted by Shameer
>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
>>>>>>> Functionally, the series is split into 3 parts:
>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
>>>>>>>    the memory map
>>>>>>> 2) Support of PC-DIMM [10 - 13]
>>>>>> Is this part complete ACPI wise (for coldplug)? I haven't noticed
>>>>>> DSDT AML here nor E820 changes, so ACPI wise pc-dimm shouldn't be
>>>>>> visible to the guest. It might be that DT is masking the problem
>>>>>> but well, that won't work on ACPI only guests.
>>>>> guest /proc/meminfo or "lshw -class memory" reflects the amount of mem
>>>>> added with the DIMM slots.
>>>> Question is how does it get there? Does it come from DT or from firmware
>>>> via UEFI interfaces?
>>>>> So it looks fine to me. Isn't E820 a pure x86 matter?
>>>> Sorry for the confusion, I meant UEFI GetMemoryMap().
>>>> On x86, I'm wary of adding PC-DIMMs to E820, which then gets exposed
>>>> via UEFI GetMemoryMap(), as the guest kernel might start using it as normal
>>>> memory early at boot and later put that memory into zone normal and hence
>>>> make it non-hot-un-pluggable. The same concerns apply to DT based means
>>>> of discovery.
>>>> (That's a guest issue but it's easy to work around it by not putting hotpluggable
>>>> memory into UEFI GetMemoryMap() or DT and letting DSDT describe it properly.)
>>>> That way memory doesn't get (ab)used by firmware or early boot kernel stages
>>>> and doesn't get locked up.
>>>>> What else would you expect in the dsdt?
>>>> Memory device descriptions, look for code that adds PNP0C80 with _CRS
>>>> describing memory ranges
>>> OK thank you for the explanations. I will work on PNP0C80 addition then.
>>> Does it mean that in ACPI mode we must not output DT hotplug memory
>>> nodes or assuming that PNP0C80 is properly described, it will "override"
>>> DT description?
>> After further investigations, I think the pieces you pointed out are
>> added by Shameer's series, ie. through the build_memory_hotplug_aml()
>> call. So I suggest we separate the concerns: this series brings support
>> for DIMM coldplug. Hotplug, including all the relevant ACPI structures,
>> will be added later on by Shameer.
> Maybe we should not put pc-dimms in DT for this series until it is clear
> whether it conflicts with ACPI in some way.

I guess you mean removing the DT hotpluggable memory nodes only in ACPI
mode? Otherwise you simply remove the DIMM feature, right?

I double checked and if you remove the hotpluggable memory DT nodes in
ACPI mode:
- you do not see the PCDIMM slots in guest /proc/meminfo anymore. So I
guess you're right, if the DT nodes are available, that memory is
considered as not unpluggable by the guest.
- You can see the NVDIMM slots using ndctl list -u. You can mount a DAX

Hotplug/unplug is clearly not supported by this series and any attempt
results in "memory hotplug is not supported". Is it really an issue if
the guest treats the DIMM slots as non-hot-unpluggable memory? I
am not even sure the guest kernel would support unplugging that memory.

In case we want all ACPI tables to be ready for making this memory seen
as hot-unpluggable, we need some of Shameer's patches on top of this series.

Also, don't DIMM slots already make sense in DT mode? Usually we accept
adding a feature in DT first and then in ACPI. For instance, we can benefit
from nvdimm in DT mode, right? So, considering an incremental approach, I
would be in favour of keeping the DT nodes.
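
For reference, the address layout described in the quoted cover letter can be
sketched roughly as below. This is only an illustration, not code from the
series: the region names and sizes in HIGH_IO_REGIONS, the 1GB alignment of the
device memory base, and the use of max() to keep the legacy 256GB placement are
all assumptions made for the example.

```python
GiB = 1 << 30
MiB = 1 << 20

RAM_BASE = 1 * GiB               # initial RAM still starts at 1GB (cover letter)
LEGACY_HIGH_IO_BASE = 256 * GiB  # IO regions start at 256GB in the legacy map

# Assumption: illustrative names and sizes for the high IO regions.
HIGH_IO_REGIONS = [
    ("gic_redist2", 64 * MiB),
    ("pcie_ecam", 256 * MiB),
    ("pcie_mmio", 512 * GiB),
]


def align_up(addr, alignment):
    """Round addr up to a multiple of alignment (must be a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def compute_layout(ram_size, maxram_size, devmem_align=GiB):
    """Return {region: (base, size)} for the given -m and maxmem sizes."""
    if maxram_size < ram_size:
        raise ValueError("maxram_size must be >= ram_size")

    layout = {"ram": (RAM_BASE, ram_size)}

    # Device memory comes right after the initial RAM.
    # Assumption: its base is aligned to devmem_align (1GB here).
    devmem_base = align_up(RAM_BASE + ram_size, devmem_align)
    devmem_size = maxram_size - ram_size
    layout["device_memory"] = (devmem_base, devmem_size)

    # IO regions follow the RAM; small configurations keep the legacy
    # 256GB placement (assumption: modelled here with a simple max()).
    cursor = max(LEGACY_HIGH_IO_BASE, devmem_base + devmem_size)
    for name, size in HIGH_IO_REGIONS:
        base = align_up(cursor, size)  # size alignment is enforced
        layout[name] = (base, size)
        cursor = base + size
    return layout
```

With these assumptions, compute_layout(4 * GiB, 8 * GiB) leaves the IO regions
at their legacy addresses, while a configuration whose device memory ends above
256GB pushes them up, each region being aligned to its own size.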


