
[RFC PATCH 0/8] pSeries base FORM2 NUMA affinity support


From: Daniel Henrique Barboza
Subject: [RFC PATCH 0/8] pSeries base FORM2 NUMA affinity support
Date: Mon, 14 Jun 2021 22:33:01 -0300

Hi,

This RFC series implements FORM2 NUMA associativity support in
the pSeries machine. This new associativity format is going to
be added to the LOPAR spec in the near future. For now, a preview
of the specification can be found in Aneesh's kernel-side patches that
implement this support, especially the documentation patch [2].

For QEMU, the most drastic change FORM2 brings is that, at long
last, we're free from the shackles of an overcomplicated and bloated
way of calculating NUMA distances. This new affinity format separates
performance metrics such as distance, latency, bandwidth and so on
from the ibm,associativity arrays of the devices. This also allows
for asymmetric NUMA configurations.

FORM2 is set by ibm,architecture-vec-5 bit 2 byte 5. This means that
the guest is able to choose between FORM1 and FORM2 during CAS, and
we need to adapt NUMA internals accordingly based on this choice.
Patches 1 to 5 implement the base FORM2 support in the pSeries
machine. 
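
To make the CAS choice concrete, here is a minimal Python sketch of testing
the FORM2 bit in a guest-provided option vector 5. It is not the QEMU code.
It assumes that vector bytes are numbered from 1 and that bit 0 is the most
significant bit of a byte; the function names are hypothetical.

```python
# Illustrative only: the numbering convention below is an assumption
# (bytes counted from 1, bit 0 = most significant bit of the byte).
FORM2_BYTE = 5  # from the text: ibm,architecture-vec-5 byte 5
FORM2_BIT = 2   # from the text: bit 2


def ov_bit_is_set(vector: bytes, byte: int, bit: int) -> bool:
    """Return True if the given bit of the given 1-based byte is set."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0-7")
    if not 1 <= byte <= len(vector):
        # A vector too short to hold the byte cannot advertise the bit.
        return False
    return bool(vector[byte - 1] & (0x80 >> bit))


def affinity_form(ov5: bytes) -> str:
    """Pick the affinity format the guest asked for during CAS.

    Hypothetical helper: anything that does not request FORM2 is
    treated as FORM1 here.
    """
    if ov_bit_is_set(ov5, FORM2_BYTE, FORM2_BIT):
        return "FORM2"
    return "FORM1"


# Byte 5 = 0x20 sets bit 2 under the MSB-first assumption.
guest_ov5 = bytes([0x00, 0x00, 0x00, 0x00, 0x20])
print(affinity_form(guest_ov5))
```

Because the format is only known after CAS, anything derived from it (such as
the NUMA data written to the device tree) has to be computed after this check.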

Patches 6-8 deal with NVDIMM changes. FORM2 allows NVDIMMs to declare
an extra NUMA node called 'device-node' to support their use as persistent
memory. 'device-node' is locality-based and can be different from the
NUMA node that the NVDIMM belongs to when used as regular memory.


With this series and Aneesh's guest kernel from [1], this is the QEMU
command line of a test guest, followed by its 'numactl -H' output:

-----

sudo ppc64-softmmu/qemu-system-ppc64 \
-machine pseries,accel=kvm,usb=off,dump-guest-core=off \
-m size=14G,slots=256,maxmem=256G -smp 8,maxcpus=8,cores=2,threads=2,sockets=2 \
(...)
-object memory-backend-ram,id=mem0,size=4G \
-numa node,memdev=mem0,cpus=0-1,nodeid=0 \
-object memory-backend-ram,id=mem1,size=4G \
-numa node,memdev=mem1,cpus=2-3,nodeid=1 \
-object memory-backend-ram,id=mem2,size=4G \
-numa node,memdev=mem2,cpus=4-5,nodeid=2 \
-object memory-backend-ram,id=mem3,size=2G \
-numa node,memdev=mem3,cpus=6-7,nodeid=3 \
-numa dist,src=0,dst=1,val=22 -numa dist,src=0,dst=2,val=22 -numa dist,src=0,dst=3,val=22 \
-numa dist,src=1,dst=0,val=44 -numa dist,src=1,dst=2,val=44 -numa dist,src=1,dst=3,val=44 \
-numa dist,src=2,dst=0,val=66 -numa dist,src=2,dst=1,val=66 -numa dist,src=2,dst=3,val=66 \
-numa dist,src=3,dst=0,val=88 -numa dist,src=3,dst=1,val=88 -numa dist,src=3,dst=2,val=88


# numactl -H 
available: 4 nodes (0-3)
node 0 cpus: 0 1
node 0 size: 3987 MB
node 0 free: 3394 MB
node 1 cpus: 2 3
node 1 size: 4090 MB
node 1 free: 4073 MB
node 2 cpus: 4 5
node 2 size: 4090 MB
node 2 free: 4072 MB
node 3 cpus: 6 7
node 3 size: 2027 MB
node 3 free: 2012 MB
node distances:
node   0   1   2   3 
  0:  10  22  22  22 
  1:  44  10  44  44 
  2:  66  66  10  66 
  3:  88  88  88  10 


The exact user NUMA distances were reflected in the kernel, without any
approximation like we have to do for FORM1.
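
As a cross-check of the output above, the following Python sketch rebuilds the
expected distance matrix from the '-numa dist' options of the command line and
confirms that it is asymmetric. It is a standalone illustration, not part of
the series; the helper names are hypothetical and the local distance of 10 is
taken from the diagonal of the numactl output.

```python
import re

LOCAL_DISTANCE = 10  # diagonal value seen in the 'numactl -H' output


def distance_matrix(dist_opts, nodes):
    """Build a nodes x nodes matrix from 'dist,src=..,dst=..,val=..' strings."""
    matrix = [[LOCAL_DISTANCE if src == dst else None for dst in range(nodes)]
              for src in range(nodes)]
    pattern = re.compile(r"dist,src=(\d+),dst=(\d+),val=(\d+)")
    for opt in dist_opts:
        match = pattern.fullmatch(opt)
        if match is None:
            continue  # not a distance option, ignore it
        src, dst, val = (int(group) for group in match.groups())
        matrix[src][dst] = val
    return matrix


def is_symmetric(matrix):
    """True if distance(a, b) == distance(b, a) for every pair of nodes."""
    size = len(matrix)
    return all(matrix[a][b] == matrix[b][a]
               for a in range(size) for b in range(size))


# Same values as the command line: every distance from node N is 22 * (N + 1).
opts = [f"dist,src={src},dst={dst},val={val}"
        for src, val in enumerate((22, 44, 66, 88))
        for dst in range(4) if dst != src]

matrix = distance_matrix(opts, 4)
for row in matrix:
    print(row)
print("symmetric:", is_symmetric(matrix))
```

The rows printed by the sketch are the same as the 'node distances' rows
reported by the guest, and the symmetry check reports False, which is the
asymmetric case that FORM2 allows.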


[1] https://lore.kernel.org/linuxppc-dev/20210614164003.196094-1-aneesh.kumar@linux.ibm.com/
[2] https://lore.kernel.org/linuxppc-dev/20210614164003.196094-8-aneesh.kumar@linux.ibm.com/


Daniel Henrique Barboza (8):
  spapr: move NUMA data init to do_client_architecture_support()
  spapr_numa.c: split FORM1 code into helpers
  spapr_numa.c: wait for CAS before writing rtas DT
  spapr_numa.c: base FORM2 NUMA affinity support
  spapr: simplify spapr_numa_associativity_init params
  nvdimm: add PPC64 'device-node' property
  spapr_numa, spapar_nvdimm: write secondary NUMA domain for nvdimms
  spapr: move memory/cpu less check to spapr_numa_FORM1_affinity_init()

 hw/mem/nvdimm.c             |  28 ++++
 hw/ppc/spapr.c              |  53 +++-----
 hw/ppc/spapr_hcall.c        |   4 +
 hw/ppc/spapr_numa.c         | 250 +++++++++++++++++++++++++++++++++---
 hw/ppc/spapr_nvdimm.c       |   3 +-
 include/hw/mem/nvdimm.h     |  12 ++
 include/hw/ppc/spapr_numa.h |   6 +-
 include/hw/ppc/spapr_ovec.h |   1 +
 8 files changed, 299 insertions(+), 58 deletions(-)

-- 
2.31.1



