qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC PATCH v0] numa: API to lookup NUMA node by address


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [RFC PATCH v0] numa: API to lookup NUMA node by address
Date: Tue, 9 Jun 2015 09:40:54 -0300
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, Jun 09, 2015 at 11:23:19AM +0200, Igor Mammedov wrote:
> On Mon, 8 Jun 2015 12:51:39 -0300
> Eduardo Habkost <address@hidden> wrote:
> 
> > On Mon, Jun 08, 2015 at 11:51:03AM +0200, Igor Mammedov wrote:
> > > On Mon, 8 Jun 2015 11:28:18 +0530
> > > Bharata B Rao <address@hidden> wrote:
> > > 
> > > > On Mon, May 25, 2015 at 02:42:40PM -0300, Eduardo Habkost wrote:
> > > > > On Mon, May 25, 2015 at 01:17:57PM +0530, Bharata B Rao wrote:
> > > > > > On Thu, May 14, 2015 at 11:39:06AM +0200, Paolo Bonzini wrote:
> > > > > > > On 13/05/2015 20:06, Eduardo Habkost wrote:
> > > > > > > > Also, this introduces a circular dependency between pc-dimm.c 
> > > > > > > > and
> > > > > > > > numa.c. Instead of that, pc-dimm could simply notify us when a 
> > > > > > > > new
> > > > > > > > device is realized (with just (addr, end, node) as arguments), 
> > > > > > > > so we can
> > > > > > > > save the list of memory ranges inside struct node_info.
> > > > > > > > 
> > > > > > > > I wonder if the memory API already provides something that 
> > > > > > > > would help
> > > > > > > > us. Paolo, do you see a way we could simply use a MemoryRegion 
> > > > > > > > as input
> > > > > > > > to lookup the NUMA node?
> > > > > > > 
> > > > > > > No, but I guess you could add a 
> > > > > > > numa_get/set_memory_region_node_id API
> > > > > > > that uses a hash table.  That's a variant of the "pc-dimm could 
> > > > > > > simply
> > > > > > > notify" numa.c that you propose above.
> > > > > > 
> > > > > > While you say we can't use MemoryRegion as input to lookup the NUMA 
> > > > > > node,
> > > > > > you suggest that we add numa_get/set_memory_region_node_id. Does 
> > > > > > this API
> > > > > > get/set NUMA node id for the given MemoryRegion ? 
> > > > > 
> > > > > I was going to suggest that, but it would require changing the
> > > > > non-memdev code path to create a MemoryRegion for each node, too. So
> > > > > having a numa_set_mem_node_id(start_addr, end_addr, node_id) API would
> > > > > be simpler.
> > > > 
> > > > In order to save the list of memory ranges inside node_info, I tried 
> > > > this
> > > > approach where I call
> > > > 
> > > > numa_set_mem_node_id(dimm.addr, dimm.size, dimm.node) from
> > > > 
> > > > pc_dimm_realize(), but
> > > > 
> > > > the value of dimm.addr is finalized only later in ->plug().
> > > > 
> > > > So we would have to call this API from arch code like pc_dimm_plug().
> > > > Is that acceptable ?
> > 
> > It looks acceptable to me, as pc.c already has all the rest of the
> > NUMA-specific code for PC. I believe it would be interesting to keep all
> > numa.o dependencies contained inside machine code.
> > 
> > > Could you query pc_dimms' numa property each time you need mapping
> > > instead of additionally storing that mapping elsewhere?
> > 
> > The original patch did that, but I suggested the
> > numa_set_mem_node_id() API for two reasons: 1) not requiring special
> > cases for hotplug inside numa_get_node(); 2) not introducing a circular
> > dependency between pc-dimm.c and numa.c.
> What circular dependency doing foreach(pc-dimm) would introduce?
> So far pc-dimm is independent from numa.c and a regular device with no
> dependencies (except of on backend memdev) and has it's own 'numa' property
> for providing that information to interested users. I'd rather keep
> it separate form legacy numa.c:-numa handling.

pc-dimm.c already depends on numa.c because it checks nb_numa_nodes
inside pc_dimm_realize().

I don't understand what you mean by "legacy numa.c:-numa handling".
Unless there's a way to query pc-dimm (or other code) for all
(address -> numa_node) mappings without a special case for memory
hotplug[1], I wouldn't call node_info[].node_mem "legacy".

[1] And this is exactly what I want to provide with
    numa_set_mem_node_id(): an API that doesn't require special cases
    for memory hotplug.

> 
> The only thing I see for using -numa with -device pc-dimm is to define
> possible fixed memory layout (pc-dimm <-> node mapping) at startup for
> platforms that have to describe it at boot time. And even for that, I'd
> suggest to use something like -numa nodeid=X,addr=Z,size=Y,dimm=ID and have
> board code that does mapping to call 
> numa.c:numa_get_layout_for_device(dimm_id)
> and set received values on pc-dimm properties if plugged in dimm fits them.

I don't understand why you are considering the above. There's no
proposal to change command-line arguments by now.

> 
> What for one needs to pull dimm properties into numa.c?

I am not sure I parsed the question correctly, but:

Original commit message explains why numa_get_node(addr) is needed:

> > This is needed by sPAPR PowerPC to support
> > ibm,dynamic-reconfiguration-memory device tree node which is needed
> > for memory hotplug.

And to make this work, it needs to be aware of NUMA information for
hotplugged memory too.

My proposal is to do that inside the code that actually assigns
addresses to pc-dimm: pc_dimm_realize() (pc.c). So pc-dimm.c and numa.c
won't reference each other, and numa.c won't need any special code for
hotplug.

We could have common helpers later to avoid duplicating the same logic
into other machines, but the point is that we don't need to make numa.c
carry special memory hotplug code, and we don't need to make pc-dimm.c
care about -numa.

>  
> > Having a numa_set_memory_region_node_id(MemoryRegion *mr, int node) API
> > would probably be better, and make the discussion about pc_dimm.addr
> Wouldn't it be layering violation if we put (frontend) nodeid information
> into (backend) MemoryRegion?

We wouldn't, it would be a numa.c function that would just keep track of
the list of MemoryRegions for each NUMA node.

> 
> > moot. But it would require changing
> > memory_region_allocate_system_memory() to avoid
> > allocate_system_memory_nonnuma() even in the !have_memdevs case.
> > 
> 

-- 
Eduardo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]