
Re: [PATCH 02/10] numa: introduce MachineClass::forbid_asymmetrical_numa


From: Daniel Henrique Barboza
Subject: Re: [PATCH 02/10] numa: introduce MachineClass::forbid_asymmetrical_numa
Date: Tue, 25 Aug 2020 06:56:46 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0



On 8/24/20 8:49 PM, David Gibson wrote:
On Mon, Aug 24, 2020 at 08:45:12AM -0300, Daniel Henrique Barboza wrote:



[...]

LOPAPR supports a somewhat asymmetrical NUMA setup in its current
form,

Huh, I didn't even realize that.  What's the mechanism?

LOPAPR mentions that a single resource/node can have multiple associativity
arrays. The idea is to contemplate the situations where the node has
more than one connection with the board.

I say "somewhat" because, right after mentioning that, the spec also says that
the OS should consider that the distance between two nodes must always be
the shortest one of all available arrays. I'll copy/paste the excerpt here
(end of section 15.2, "Numa Resource Associativity"):

Ah.  I didn't think that's what "asymmetric NUMA" meant... but come to
think of it, I'm not very sure about that.


This was a poor attempt on my part to cut PAPR some slack.

TBH, even if current PAPR allows for some form of NUMA asymmetry, I don't think
it's worth implementing at all. It'll be more complexity on top of what I
already added here, and the best case scenario will be the kernel ignoring it
(worst case - kernel blowing it up because we're adding more associativity
arrays in each CPU and so on).



Thanks,


DHB


-----

The reason that the “ibm,associativity” property may contain multiple
associativity lists is that a resource may be multiply connected into the
platform. This resource then has a different associativity characteristics
relative to its multiple connections. To determine the associativity between
any two resources, the OS scans down the two resources associativity lists in
all pair wise combinations counting how many domains are the same until the
first domain where the two list do not agree. The highest such count is the
associativity between the two resources.

----
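For illustration only, the rule in that excerpt can be sketched as below.
This is a minimal sketch, not code from the kernel or QEMU: the function
names and the domain IDs are made up, and the on-the-wire encoding of the
"ibm,associativity" property is left out entirely.

```python
def shared_prefix_len(a, b):
    """Count the leading domains that match before the first disagreement."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def associativity(lists_a, lists_b):
    """Highest shared-prefix count over all pairwise combinations of lists."""
    return max(shared_prefix_len(a, b) for a in lists_a for b in lists_b)


# Hypothetical domain IDs, chosen only to show the rule.
cpu = [[0, 1, 4]]              # one associativity list
mem = [[0, 2, 7], [0, 1, 9]]   # multiply connected: two lists

print(associativity(cpu, mem))
```

Because the highest count wins, the extra list can only make two resources
look closer, and the result is the same whichever resource is taken first.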


DHB



but
the Linux kernel doesn't support it. The effort to implement it in the current
spapr machine code, given that Linux wouldn't mind it, is not worth it. This
is why I chose to invalidate it for pseries.

Igor,

It's kind of difficult to answer that question - PAPR doesn't
specifically describe limitations, it's just that the representation
it uses is inherently limited.  Instead of the obvious, simple and
pretty much universal method (used in the generic kernel and qemu) of
having a matrix of distance between all the nodes, it instead
describes the hierarchy of components that give rise to the different
distances.

So, for each NUMA relevant object (cpu, memory block, host bridge,
etc.) there is a vector of IDs.  Each number in the vector gives one
level of the object's location in the hierarchy.

So, for example the first number might be the physical chip/socket,
the second one which group of cores & memory interfaces sharing an Ln
cache, the third one the specific core number.  So to work out how far
objects are from each other you essentially look at how long a prefix
of their vector they share, which tells you how far above in the
hierarchy you have to go to reach it.

There's a bunch of complicating details, but that's the gist of it.
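As a rough sketch of that gist (ignoring the complicating details), here
is how a distance matrix could be derived from such vectors. The vectors
and the "depth minus shared prefix" metric are assumptions made up for
the example; they are not the numbers Linux or QEMU actually use.

```python
def shared_prefix_len(a, b):
    """Number of leading hierarchy levels two ID vectors have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def distance_matrix(vectors):
    """Hypothetical metric: levels you must climb to find a common parent."""
    depth = len(vectors[0])
    return [[depth - shared_prefix_len(a, b) for b in vectors]
            for a in vectors]


# Hypothetical vectors: [chip/socket, cache group, core].
vectors = [
    [0, 0, 0],
    [0, 0, 1],
    [0, 1, 2],
    [1, 2, 3],
]

for row in distance_matrix(vectors):
    print(row)
```

The shared prefix of A and B is the same as that of B and A, so any
matrix derived this way is symmetric. A distance from A to B that differs
from the distance from B to A has no way to be expressed with a single
vector per object.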

Perhaps a warning would be better in this case?

In either case, it sounds like this won't be a common constraint
and I now agree with your original suggestion of doing this in
machine initialization code.
Agreed, if it goes to spapr specific machine code I will not object much.
(it will burden just spapr maintainers, so it's about convincing
David in the end)

I believe he's ok with it given that he suggested it in his first reply.

I'll move this verification to spapr machine_init in the next version.



Thanks,

DHB









