Re: [PATCH v2] hw/arm/virt: Expose empty NUMA nodes through ACPI


From: David Hildenbrand
Subject: Re: [PATCH v2] hw/arm/virt: Expose empty NUMA nodes through ACPI
Date: Wed, 10 Nov 2021 12:01:11 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0

On 10.11.21 11:33, Igor Mammedov wrote:
> On Fri, 5 Nov 2021 23:47:37 +1100
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> Hi Drew and Igor,
>>
>> On 11/2/21 6:39 PM, Andrew Jones wrote:
>>> On Tue, Nov 02, 2021 at 10:44:08AM +1100, Gavin Shan wrote:  
>>>>
>> Yeah, I agree. I don't see a strong reason to expose these empty nodes
>> for now. Please ignore the patch.
>>>>  
>>>
>>> So was describing empty numa nodes on the command line ever a reasonable
>>> thing to do? What happens on x86 machine types when describing empty numa
>>> nodes? I'm starting to think that the solution all along was just to
>>> error out when a numa node has memory size = 0...
> 
> Memory-less nodes are fine as long as there is another type of device
> that describes a node (apic/gic/...).
> But there is no way in the spec to describe completely empty nodes,
> and I dislike adding out-of-spec entries just to fake an empty node.
> 

There are reasonable *upcoming* use cases for initially completely empty
NUMA nodes with virtio-mem: being able to expose a dynamic amount of
performance-differentiated memory to a VM. I don't know of any existing
use cases that would require that as of now.

Examples include exposing HBM or PMEM to the VM. Just like on real HW,
this memory is exposed via cpu-less, special nodes. In contrast to real
HW, the memory is hotplugged later (I don't think HW supports hotplug
like that yet, but it might just be a matter of time).

The same should be true when using DIMMs instead of virtio-mem in this
example.
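
A rough sketch of how such a configuration could look on the QEMU command
line (node IDs, object names, and sizes below are made up for illustration;
this is not taken from the patch or the reported setup):

  qemu-system-aarch64 -M virt -enable-kvm -cpu host \
    -smp 4 -m 4G,maxmem=8G,slots=2 \
    -object memory-backend-ram,id=m0,size=4G \
    -numa node,nodeid=0,cpus=0-3,memdev=m0 \
    -numa node,nodeid=1 \
    -object memory-backend-ram,id=vmem1,size=2G \
    -device virtio-mem-pci,id=vm1,memdev=vmem1,node=1,requested-size=0 \
    ...

Node 1 starts out completely empty; the virtio-mem device can then be
resized at runtime (e.g. via "qom-set vm1 requested-size 1G") so that the
node only gets memory when it is actually needed.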

> 
>> Sorry for the delay; I spent a few days looking into the Linux virtio-mem
>> driver. I'm afraid we still need this patch for ARM64. I don't think x86
> 
> does it behave the same way if using pc-dimm hotplug instead of virtio-mem?
> 
> CCing David,
> as it might be a virtio-mem issue.

Can someone share the details of why it's a problem on arm64 but not on
x86-64? I assume this really only applies when there is a dedicated, empty
node -- correct?

> 
> PS:
> maybe for virtio-mem-pci, we need to add a GENERIC_AFFINITY entry into SRAT
> and describe it as a PCI device (we don't do that yet, if I'm not mistaken).

virtio-mem exposes the PXM itself, and avoids exposing its memory via any
kind of platform-specific firmware map. The PXM gets translated in the
guest accordingly. So far there has been no need to expose this in the SRAT --
the SRAT is really only used to expose the maximum possible PFN to the
VM, just like it would have to be used to expose "this is a possible node".
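
As a side note -- purely as an illustration, not something from this
thread -- the guest-visible result can be inspected from inside a Linux VM
via standard sysfs paths, e.g.:

  cat /sys/devices/system/node/possible   # nodes the kernel could still online
  cat /sys/devices/system/node/online     # nodes currently backed by memory/CPUs
  # dump the SRAT the guest actually received (needs acpica-tools):
  cp /sys/firmware/acpi/tables/SRAT srat.dat && iasl -d srat.dat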

Of course, we could use any other paravirtualized interface to expose
both pieces of information. For example, on s390x, I'll have to introduce a
new hypercall to query the "device memory region" to detect the maximum
possible PFN, because existing interfaces don't allow for that. For now
we're using the SRAT to expose the "maximum possible PFN" simply because
it's easy to re-use.

But I assume that hotplugging a DIMM to an empty node will have similar
issues on arm64.

> 
>> has this issue, even though I didn't experiment on x86. For example, I
>> have the following command lines. The hot-added memory is put into node#0
>> instead of node#2, which is wrong.

I assume Linux will always fall back to node 0 if node X is not possible
when translating the PXM.
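
If it helps with debugging, a quick (illustrative, not taken from the
original report) way to check from inside the guest which node the
hot-added memory ended up on:

  # each memory block has a nodeN symlink pointing at the node it belongs to
  ls -d /sys/devices/system/memory/memory*/node*
  numactl --hardware    # per-node sizes before/after the hotplug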

-- 
Thanks,

David / dhildenb



