
Re: [PATCH v3] spapr: Add a new level of NUMA for GPUs


From: Daniel Henrique Barboza
Subject: Re: [PATCH v3] spapr: Add a new level of NUMA for GPUs
Date: Thu, 16 Jul 2020 06:42:11 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0



On 7/16/20 2:04 AM, David Gibson wrote:
On Mon, May 25, 2020 at 12:49:27PM -0500, Reza Arbab wrote:
On Mon, May 25, 2020 at 03:05:50PM +1000, David Gibson wrote:
On Fri, May 22, 2020 at 02:53:33PM -0500, Reza Arbab wrote:
--- a/hw/ppc/spapr_pci_nvlink2.c
+++ b/hw/ppc/spapr_pci_nvlink2.c
@@ -362,7 +362,7 @@ void spapr_phb_nvgpu_ram_populate_dt(SpaprPhbState *sphb, void *fdt)
          uint32_t associativity[] = {
              cpu_to_be32(0x4),
              SPAPR_GPU_NUMA_ID,
-            SPAPR_GPU_NUMA_ID,
+            cpu_to_be32(nvslot->numa_id),
              SPAPR_GPU_NUMA_ID,
              cpu_to_be32(nvslot->numa_id)


This doesn't look quite right.  In the new case we'll get {
GPU_NUMA_ID, nvslot->numa_id, GPU_NUMA_ID, nvslot->numa_id }.

The associativity reference points are 4 (and now 2), so this is what we
want. I think what you've noticed is that reference points are 1-based
ordinals:

        "...the “ibm,associativity-reference-points” property indicates
boundaries between associativity domains presented by the
“ibm,associativity” property containing “near” and “far” resources. The
first such boundary in the list represents the 1 based ordinal in the
associativity lists of the most significant boundary, with subsequent
entries indicating progressively less significant boundaries."
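
To make that 1-based indexing concrete, here is a minimal sketch (not from
the patch; the domain IDs and values below are made up for illustration).
A reference point of 4 selects the 4th domain entry of the list; since
cell 0 of the property is the length, ordinal N conveniently lands on
array index N:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Illustrative ibm,associativity list: cell 0 is the length (4),
     * cells 1..4 are domain IDs, most significant first. */
    uint32_t associativity[] = { 4, 10, 20, 30, 40 };

    /* Illustrative reference points: 1-based ordinals into the
     * domain entries, most significant boundary first. */
    uint32_t refpoints[] = { 4, 4, 2 };

    for (int i = 0; i < 3; i++) {
        printf("refpoint %" PRIu32 " -> domain ID %" PRIu32 "\n",
               refpoints[i], associativity[refpoints[i]]);
    }
    return 0;
}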

Right.. AIUI, associativity-reference-points indicates which levels are
"important" from a NUMA distance point of view (though the spec is
very confusing).  But I'm pretty sure that, ignoring
reference-points, the individual ibm,associativity lists are supposed
to describe a correct hierarchy, even if some levels will get ignored
for distance purposes.  So once you've split up into "numa_id" nodes
at the second level, you can't then go back to just 2 nodes (main
vs. gpu) at the third.
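
To illustrate that point, a minimal sketch (not QEMU code; the node
numbers are made up and 0xffff stands in for SPAPR_GPU_NUMA_ID) of the
tree property being violated: if two resources share a domain ID at some
level, they must also agree at every more significant level, otherwise
that shared domain has two different parents.

#include <stdint.h>
#include <stdio.h>

#define LEVELS 4

/* Returns 1 if the two associativity lists describe a consistent tree:
 * a domain shared at some level must have the same parent domain at
 * the level above. */
static int hierarchy_ok(const uint32_t a[LEVELS], const uint32_t b[LEVELS])
{
    for (int level = 1; level < LEVELS; level++) {
        if (a[level] == b[level] && a[level - 1] != b[level - 1]) {
            return 0; /* shared domain, different parents */
        }
    }
    return 1;
}

int main(void)
{
    /* Two GPUs on different NUMA nodes as the v3 patch lays them out:
     * { GPU_NUMA_ID, numa_id, GPU_NUMA_ID, numa_id } */
    uint32_t gpu0[LEVELS] = { 0xffff, 1, 0xffff, 1 };
    uint32_t gpu1[LEVELS] = { 0xffff, 2, 0xffff, 2 };

    /* Level 3 shares 0xffff but level 2 differs (1 vs 2), so the
     * shared level-3 domain would have two parents. */
    printf("v3 layout consistent: %s\n",
           hierarchy_ok(gpu0, gpu1) ? "yes" : "no");

    /* The skiboot-style all-numa_id lists below split once and
     * stay split. */
    uint32_t fix0[LEVELS] = { 1, 1, 1, 1 };
    uint32_t fix1[LEVELS] = { 2, 2, 2, 2 };
    printf("all-numa_id layout consistent: %s\n",
           hierarchy_ok(fix0, fix1) ? "yes" : "no");
    return 0;
}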


I believe Reza should go with what Skiboot already does in this situation:

(hw/npu2.c)

dt_add_property_cells(mem, "ibm,associativity", 4, chip_id, chip_id, chip_id, chip_id);

Which would translate here to:

        uint32_t associativity[] = {
            cpu_to_be32(0x4),
            cpu_to_be32(nvslot->numa_id),
            cpu_to_be32(nvslot->numa_id),
            cpu_to_be32(nvslot->numa_id),
            cpu_to_be32(nvslot->numa_id),
        };


In the end it doesn't matter for the logic, since the refpoints are always
0x4 0x4 0x2, meaning that we're ignoring the 1st and 3rd elements entirely
anyway, but this at least makes the intention clearer: GPUs are always at the
maximum distance from everything else.
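
A minimal sketch of that equivalence (again purely illustrative, with
0xffff standing in for SPAPR_GPU_NUMA_ID): with refpoints 0x4 0x4 0x2,
both layouts select the same domain IDs, so the computed distances come
out identical either way.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t numa_id = 1;

    /* Cell 0 is the length; cells 1..4 are the domain IDs. */
    uint32_t v3[]  = { 4, 0xffff, numa_id, 0xffff, numa_id };
    uint32_t all[] = { 4, numa_id, numa_id, numa_id, numa_id };

    /* refpoints 0x4 0x4 0x2: only the 4th and 2nd entries are ever
     * compared, and those are numa_id in both layouts. */
    uint32_t refpoints[] = { 4, 4, 2 };

    for (int i = 0; i < 3; i++) {
        printf("refpoint 0x%" PRIx32 ": v3 -> %" PRIu32
               ", all-numa_id -> %" PRIu32 "\n",
               refpoints[i], v3[refpoints[i]], all[refpoints[i]]);
    }
    return 0;
}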



Thanks,


DHB
