qemu-devel

Re: [PATCH] spapr_numa.c: FORM2 table handle nodes with no distance info


From: Nicholas Piggin
Subject: Re: [PATCH] spapr_numa.c: FORM2 table handle nodes with no distance info
Date: Mon, 08 Nov 2021 23:51:09 +1000

Excerpts from Aneesh Kumar K.V's message of November 8, 2021 2:22 pm:
> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> 
>> On 11/5/21 10:51, Nicholas Piggin wrote:
>>> A configuration that specifies multiple nodes without distance info
>>> results in the non-local points in the FORM2 matrix having a distance of
>>> 0. This causes Linux to complain "Invalid distance value range" because
>>> a node distance is smaller than the local distance.
>>> 
>>> Fix this by building a simple local / remote fallback for points where
>>> distance information is missing.
>>
>> Thanks for looking this up. I checked the output of this same scenario with
>> a FORM1 guest and 4 distance-less NUMA nodes. This is what I got:
>>
>> [root@localhost ~]# numactl -H
>> available: 4 nodes (0-3)
>> (...)
>> node distances:
>> node   0   1   2   3
>>    0:  10  160  160  160
>>    1:  160  10  160  160
>>    2:  160  160  10  160
>>    3:  160  160  160  10
>> [root@localhost ~]#
>>
>>
>> With this patch we're getting '20' instead of '160' because you're using
>> NUMA_DISTANCE_DEFAULT, while FORM1 will default this case to the maximum
>> NUMA distance the kernel allows for that affinity (160).
> 
> where is that enforced? Do we know why FORM1 picked 160? 
> 
>>
>> I do not have strong feelings about changing this behavior between FORM1 and
>> FORM2. I tested the same scenario with an x86_64 guest and it also uses '20'
>> in this case, so as far as QEMU goes, using NUMA_DISTANCE_DEFAULT is
>> consistent.
>>
> 
> for FORM2 I would suggest 20 is the right value and it is also
> consistent with other architectures. 
> 
>> Aneesh is already in CC, so I believe he'll let us know if there's something
>> we're missing and we need to preserve the '160' distance in FORM2 for this
>> case as well.
>>
>> For now:
>>
>>
>>> 
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>
>>
>> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>
>>
>>
>>>   hw/ppc/spapr_numa.c | 22 +++++++++++++++++-----
>>>   1 file changed, 17 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
>>> index 5822938448..56ab2a5fb6 100644
>>> --- a/hw/ppc/spapr_numa.c
>>> +++ b/hw/ppc/spapr_numa.c
>>> @@ -546,12 +546,24 @@ static void spapr_numa_FORM2_write_rtas_tables(SpaprMachineState *spapr,
>>>                * NUMA nodes, but QEMU adds the default NUMA node without
>>>                * adding the numa_info to retrieve distance info from.
>>>                */
>>> -            if (src == dst) {
>>> -                distance_table[i++] = NUMA_DISTANCE_MIN;
>>> -                continue;
> 
> Didn't we previously always initialize the local distance to
> NUMA_DISTANCE_MIN, irrespective of what is specified on the QEMU command
> line? If so, the above change will break that.

That's true. I think the command line should take priority, and if we have
to override it for some reason we should print a warning.
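
As a rough illustration of that policy, here is a standalone sketch (not
QEMU code). The helper name resolve_distance() and the override condition
(a non-zero value below NUMA_DISTANCE_MIN) are assumptions made up for this
example; the thread does not say when an override would be needed. The
constants 10 and 20 follow the values discussed above. In QEMU itself the
warning would presumably go through warn_report() rather than fprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Redefined here so the sketch stands alone; values as discussed in the thread. */
#define NUMA_DISTANCE_MIN      10
#define NUMA_DISTANCE_DEFAULT  20

/*
 * Hypothetical helper: pick the distance to advertise for one table entry.
 *  - user_value == 0 means "nothing supplied", so use the local/remote fallback.
 *  - otherwise the user-supplied value wins.
 *  - ASSUMPTION: a non-zero value below NUMA_DISTANCE_MIN is treated as the
 *    case where we "have to override", and that is where the warning fires.
 */
static uint8_t resolve_distance(uint8_t user_value, bool is_local,
                                bool *overridden)
{
    assert(overridden != NULL);
    *overridden = false;

    if (user_value == 0) {
        /* Filling a gap is not an override, so stay quiet. */
        return is_local ? NUMA_DISTANCE_MIN : NUMA_DISTANCE_DEFAULT;
    }

    if (user_value < NUMA_DISTANCE_MIN) {
        fprintf(stderr, "warning: NUMA distance %u is below the minimum, "
                "using %u instead\n",
                (unsigned)user_value, (unsigned)NUMA_DISTANCE_MIN);
        *overridden = true;
        return NUMA_DISTANCE_MIN;
    }

    /* The command line takes priority. */
    return user_value;
}
```

The point of returning the override flag separately is that the caller can
tell a silent fallback apart from a value that was actually replaced.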

> 
>>> +            distance_table[i] = numa_info[src].distance[dst];
>>> +            if (distance_table[i] == 0) {
> 
> We know distance_table[i] == 0 here, and ...
> 
>>> +                /*
>>> +                 * In case QEMU adds a default NUMA single node when the user
>>> +                 * did not add any, or where the user did not supply distances,
>>> +                 * the value will be 0 here. Populate the table with a fallback
>>> +                 * simple local / remote distance.
>>> +                 */
>>> +                if (src == dst) {
>>> +                    distance_table[i] = NUMA_DISTANCE_MIN;
>>> +                } else {
>>> +                    distance_table[i] = numa_info[src].distance[dst];
>>> +                    if (distance_table[i] < NUMA_DISTANCE_MIN) {
> 
> 
> Considering we reached here after checking distance_table[i] == 0, do we
> need the above two lines?

Oh that's true. I think the lines could just be removed.
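
A rough standalone sketch of how the loop might look with those two lines
dropped (not the actual follow-up patch). struct node_info, MAX_NODES and
fill_distance_table() are hypothetical stand-ins for QEMU's numa_info[] and
the body of spapr_numa_FORM2_write_rtas_tables(); the constants 10 and 20
follow the values discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Redefined here so the sketch stands alone; values as discussed in the thread. */
#define NUMA_DISTANCE_MIN      10
#define NUMA_DISTANCE_DEFAULT  20

#define MAX_NODES 4   /* hypothetical bound, only for this sketch */

/* Hypothetical stand-in for QEMU's per-node distance info. */
struct node_info {
    uint8_t distance[MAX_NODES];
};

/*
 * Fill a flattened nb_nodes x nb_nodes table. A stored value of 0 means no
 * distance was supplied for that pair, so fall back to a simple
 * local / remote distance. Any non-zero value is used as given.
 */
static void fill_distance_table(const struct node_info *numa_info,
                                size_t nb_nodes, uint8_t *distance_table)
{
    size_t i = 0;

    assert(nb_nodes <= MAX_NODES);

    for (size_t src = 0; src < nb_nodes; src++) {
        for (size_t dst = 0; dst < nb_nodes; dst++) {
            distance_table[i] = numa_info[src].distance[dst];
            if (distance_table[i] == 0) {
                distance_table[i] = (src == dst) ? NUMA_DISTANCE_MIN
                                                 : NUMA_DISTANCE_DEFAULT;
            }
            i++;
        }
    }
}
```

With two nodes and only the 0 -> 1 distance supplied (say 40), the table
comes out as 10, 40, 20, 10: the supplied value is kept and the remaining
entries get the fallback.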

Thanks,
Nick

> 
>>> +                        distance_table[i] = NUMA_DISTANCE_DEFAULT;
>>> +                    }
>>> +                }
>>>               }
>>> -
>>> -            distance_table[i++] = numa_info[src].distance[dst];
>>> +            i++;
>>>           }
>>>       }
> 
> 
> 


