Re: [PATCH v2 1/1] nvdimm: add 'target-node' option

From: Igor Mammedov
Subject: Re: [PATCH v2 1/1] nvdimm: add 'target-node' option
Date: Thu, 29 Jul 2021 14:44:44 +0200

On Mon, 19 Jul 2021 10:01:53 +0800
Jingqi Liu <jingqi.liu@intel.com> wrote:

> Linux kernel version 5.1 brings in support for the volatile-use of
> persistent memory as a hotplugged memory region (KMEM DAX).
> When this feature is enabled, persistent memory can be seen as a
> separate memory-only NUMA node(s). This newly-added memory can be
> selected by its unique NUMA node.
> Add 'target-node' option for 'nvdimm' device to indicate this NUMA
> node. It can be extended to a new node after all existing NUMA nodes.
> The 'node' option of 'pc-dimm' device is to add the DIMM to an
> existing NUMA node. The 'node' should be in the available NUMA nodes.
> For KMEM DAX mode, persistent memory can be in a new separate
> memory-only NUMA node. The new node is created dynamically.
> So users use 'target-node' to control whether persistent memory
> is added to an existing NUMA node or a new NUMA node.
> An example of configuration is as follows.
> Using the following QEMU command:
>  -object memory-backend-file,id=nvmem1,share=on,mem-path=/dev/dax0.0,size=3G,align=2M
>  -device nvdimm,id=nvdimm1,memdev=nvmem1,label-size=128K,target-node=2
> To list DAX devices:
>  # daxctl list -u
>  {
>    "chardev":"dax0.0",
>    "size":"3.00 GiB (3.22 GB)",
>    "target_node":2,
>    "mode":"devdax"
>  }
> To create a namespace in Device-DAX mode as a standard memory:
>  $ ndctl create-namespace --mode=devdax --map=mem
> To reconfigure DAX device from devdax mode to a system-ram mode:
>  $ daxctl reconfigure-device dax0.0 --mode=system-ram
> There are two existing NUMA nodes in Guest. After these operations,
> persistent memory is configured as a separate Node 2 and
> can be used as a volatile memory. This NUMA node is dynamically
> created according to 'target-node'.

Well, I've looked at the spec and the series pointed to in the v1 thread,
and I don't really see a good reason to add a duplicate 'target-node'
property to NVDIMM that, for all practical purposes, serves the same
purpose as the already existing 'node' property.
The only thing it does on top of the existing 'node' property is
facilitate implicit creation of numa nodes on top of user-configured
ones.

But what I really dislike is adding an implicit path that creates
numa nodes from a random place.

It just creates a mess and doesn't really work well, because you
will have to plumb into other code to account for implicit nodes
for it to work properly. (The 1st thing that comes to mind is that HMAT
configuration won't accept these implicit nodes; there might be
other places that will not work as expected.)
So I suggest abandoning this approach and using the already existing
numa CLI options to do what you need.

What you are trying to achieve can be done without this series,
as QEMU allows creating memory-only nodes and even empty ones
(for future hotplug) just fine.
The only requirement is that one has to specify the complete planned
numa topology on the command line.

Here is an example that works for me. It explicitly assigns all CPUs
to node 0 and creates a cpu-less node 1 for your nvdimm:
   -machine q35,nvdimm=on \
   -m 4G,slots=4,maxmem=12G \
   -smp 4,cores=2 \
   -object memory-backend-ram,size=4G,policy=bind,host-nodes=0,id=ram-node0 \
   -numa node,nodeid=0,memdev=ram-node0 \
   -numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=0,socket-id=1 \
   -numa node,nodeid=1

With that you can hotplug the nvdimm with the 'node=1' property set,
or provide it at startup, like this:
   -object memory-backend-file,id=mem1,share=on,mem-path=nvdimmfile,size=3G,align=2M \
   -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K,node=1
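
For convenience, here is a minimal sketch that puts the topology options
and the startup nvdimm options into one command line. It only prints the
result and starts nothing. The QEMU_BIN and NVDIMM_FILE variables and the
build_cmdline helper are illustrative names, not part of QEMU; the binary
name qemu-system-x86_64 is an assumption about the host setup.

```shell
#!/bin/sh
# Sketch: print a QEMU command line with a fully specified NUMA topology
# and an NVDIMM placed on the cpu-less node 1.
# QEMU_BIN and NVDIMM_FILE are hypothetical knobs for this example only.
QEMU_BIN="${QEMU_BIN:-qemu-system-x86_64}"
NVDIMM_FILE="${NVDIMM_FILE:-nvdimmfile}"

build_cmdline() {
    printf '%s' "$QEMU_BIN"
    # printf repeats the format for each remaining argument.
    printf ' %s' \
        "-machine" "q35,nvdimm=on" \
        "-m" "4G,slots=4,maxmem=12G" \
        "-smp" "4,cores=2" \
        "-object" "memory-backend-ram,size=4G,policy=bind,host-nodes=0,id=ram-node0" \
        "-numa" "node,nodeid=0,memdev=ram-node0" \
        "-numa" "cpu,node-id=0,socket-id=0" \
        "-numa" "cpu,node-id=0,socket-id=1" \
        "-numa" "node,nodeid=1" \
        "-object" "memory-backend-file,id=mem1,share=on,mem-path=${NVDIMM_FILE},size=3G,align=2M" \
        "-device" "nvdimm,id=nvdimm1,memdev=mem1,label-size=128K,node=1"
    printf '\n'
}

build_cmdline
```

Node 1 is declared on the command line before the nvdimm refers to it,
so nothing has to create it implicitly.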

After boot, 'numactl -H' will show:

available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 3924 MB
node 0 free: 3657 MB
node distances:
node   0 
  0:  10 

and after initializing the nvdimm as a dax device and
reconfiguring it as system memory, it will show up
as a 'new' 'memory only' node:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 3924 MB
node 0 free: 3641 MB
node 1 cpus:
node 1 size: 896 MB
node 1 free: 896 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
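
The guest-side steps between the two listings can be sketched roughly as
follows, reusing the ndctl and daxctl invocations from the quoted patch
description. The device name dax0.0 is an assumption about what the guest
enumerates, and the DRY_RUN switch and helper names are made up for this
example. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: turn the guest's nvdimm into a memory-only NUMA node.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them and
# needs root inside the guest with ndctl, daxctl and numactl installed.
DRY_RUN="${DRY_RUN:-1}"
DAX_DEV="${DAX_DEV:-dax0.0}"   # assumed device name, check 'daxctl list'

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

nvdimm_to_system_ram() {
    # Create a device-dax namespace on the nvdimm.
    run ndctl create-namespace --mode=devdax --map=mem
    # Hand the dax device over to the kernel as ordinary RAM (KMEM DAX).
    run daxctl reconfigure-device "$DAX_DEV" --mode=system-ram
    # The memory should now be listed under its own node.
    run numactl -H
}

nvdimm_to_system_ram
```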

> Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
