Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes

From:	Eduardo Habkost
Subject:	Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes
Date:	Mon, 16 Jun 2014 17:11:40 -0300
User-agent:	Mutt/1.5.23 (2014-03-12)

On Mon, Jun 16, 2014 at 11:49:46AM -0700, Nishanth Aravamudan wrote:
> On 16.06.2014 [13:15:00 -0300], Eduardo Habkost wrote:
> > On Mon, Jun 16, 2014 at 05:53:53PM +1000, Alexey Kardashevskiy wrote:
> > > Currently NUMA nodes must go consequently and QEMU ignores nodes
> > > with @nodeid bigger than the number of NUMA nodes in the command line.
> > 
> > Why would somebody need a NUMA node with nodeid bigger than the number
> > of NUMA nodes? NUMA node IDs must be in the [0, numa_nodes-1] range.
> 
> That is not how the code works currently.
> 
> vl.c::numa_add()
> ...
>         if (get_param_value(option, 128, "nodeid", optarg) == 0) {
>             nodenr = nb_numa_nodes;
>         } else {
>             if (parse_uint_full(option, &nodenr, 10) < 0) {
>                 fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
>                 exit(1);
>             }
>         }
> ...
>         if (get_param_value(option, 128, "mem", optarg) == 0) {
>             node_mem[nodenr] = 0;
>         } else {
>             int64_t sval;
>             sval = strtosz(option, &endptr);
>             if (sval < 0 || *endptr) {
>                 fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
>                 exit(1);
>             }
>             node_mem[nodenr] = sval;
>         }
> ...
>       nb_numa_nodes++;
> ...
> 
> So if a user passes nodeid= to the NUMA node definition, that entry in
> node_mem is set to the appropriate value, but nb_numa_nodes, which is
> used to bound the iteration of that array is not bumped appropriately.
> So we end up looking at arbitrary indices in the node_mem array, which
> are often 0.

Yes. What I claim is: the bug is that the code is not validating
nodeids, which must be all in the [0, num_nodes-1] range.

Alexey, why do you believe you need non-contiguous nodeids to declare
memory-less nodes at all? Just use "-numa node" with no arguments, and
you will get a node with node_mem==0.

In other words, if you are trying to do:
  -numa node,id=1,mem=X -numa node,id=3,mem=Y
(so you get nodes 0 and 2 with no memory).

You just need to do this instead:
  -numa node,id=0 -numa node,id=1,mem=X -numa node,id=2 -numa node,id=3,mem=Y

> 
> Note also that means that we can't generically differentiate here
> between a user-defined memoryless node and one that happens to be 0
> because the particular nodeid was not specified on the command-line.

That's because all nodeids must be specified in the command-line.
Accepting omitted nodes is a bug which should be fixed.

> Alexey, how do you differentiate between these two cases after your
> patches? In patch 3, I see you check (and skip in the loop) explicitly
> if !node_mem[nodeid], but I'm not sure how that check can differentiate
> between the statically 0 (from main's intialization loop) and when a
> user says a node's memory is 0. Probably something obvious I'm missing
> (it is Monday after all)...
> 
> > > This prevents us from creating memory-less nodes which is possible
> > > situation in POWERPC under pHyp or Sapphire.
> > 
> > Why? If I recall correctly, nodes without any CPUs or any memory are
> > already allowed.
> 
> They are allowed, but it seems like the code throughout qemu (where it's
> relevant) assumes that NUMA nodes are sequential and continuous, but
> that's not required (nor is it enforced on the command-line).

It is not being enforced, but you will get weird bugs if you don't do
it. What I suggest is that we start enforcing it instead of magically
crating new nodes when a wrong (too high) ID is provided.

> 
> > How exactly would this patch help you? How do you expect the
> > command-line to look like for your use case?
> 
> Alexey has replied with that, it looks like.

Where? I don't see a reply.

> 
> > > This makes nb_numa_nodes a total number of nodes or the biggest node
> > > number + 1 whichever is greater.
> > > 
> > > Signed-off-by: Alexey Kardashevskiy <address@hidden>
> > > ---
> > >  vl.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/vl.c b/vl.c
> > > index ac0e3d7..f1b75cb 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -1356,7 +1356,7 @@ static void numa_add(const char *optarg)
> > >          if (get_param_value(option, 128, "cpus", optarg) != 0) {
> > >              numa_node_parse_cpus(nodenr, option);
> > >          }
> > > -        nb_numa_nodes++;
> > > +        nb_numa_nodes = MAX(nb_numa_nodes + 1, nodenr + 1);
> > 
> > I would instead suggest that if any node in the [0, max_node_id] range
> > is not present on the command-line, QEMU should instead reject the
> > command-line.
> 
> We already check two things:
> 
> Too many nodes (meaning we've filled the array):
>         if (nb_numa_nodes >= MAX_NODES) {
>             fprintf(stderr, "qemu: too many NUMA nodes\n");
>             exit(1);
>         }
> 
> Node ID itself is out of range (due to the use of an array):
>         if (nodenr >= MAX_NODES) {
>             fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
>             exit(1);
>         }
> 
> But that doesn't prevent one from having *sparse* NUMA node IDs. And, as
> far as I can tell, this is allowed by the spec, but isn't properly
> supported by qemu.

Wait, is the node ID visible to the guest at all? I believe it is a
QEMU-internal thing, just to allow the NUMA nodes to be ordered in the
command-line. I would even claim that the parameter is useless and
shouldn't have been introduced in the first place.

What I don't se is: why you need the command-line to look like:
  -numa node,id=1,mem=X
when you can simply write it as:
  -numa node,id=0 -numa node,id=1,mem=X

-- 
Eduardo

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] [PATCH 0/7] spapr: rework memory nodes, Alexey Kardashevskiy, 2014/06/16
- [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes, Alexey Kardashevskiy, 2014/06/16
  - Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes, Eduardo Habkost, 2014/06/16
    - Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes, Nishanth Aravamudan, 2014/06/16
    - Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes, Eduardo Habkost <=
    - Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes, Eduardo Habkost, 2014/06/16
    - Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes, Nishanth Aravamudan, 2014/06/16
    - Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes, Nishanth Aravamudan, 2014/06/16
- [Qemu-devel] [PATCH 4/7] spapr: Split memory nodes to power-of-two blocks, Alexey Kardashevskiy, 2014/06/16
  - Re: [Qemu-devel] [PATCH 4/7] spapr: Split memory nodes to power-of-two blocks, Alexey Kardashevskiy, 2014/06/17
- [Qemu-devel] [PATCH 1/7] spapr: Move DT memory node rendering to a helper, Alexey Kardashevskiy, 2014/06/16
- [Qemu-devel] [PATCH 3/7] spapr: Refactor spapr_populate_memory(), Alexey Kardashevskiy, 2014/06/16
  - Re: [Qemu-devel] [PATCH 3/7] spapr: Refactor spapr_populate_memory(), Alexey Kardashevskiy, 2014/06/18
  - Re: [Qemu-devel] [PATCH 3/7] spapr: Refactor spapr_populate_memory(), Nishanth Aravamudan, 2014/06/20
    - Re: [Qemu-devel] [PATCH 3/7] spapr: Refactor spapr_populate_memory(), Alexey Kardashevskiy, 2014/06/20

Prev by Date: Re: [Qemu-devel] [PATCH v4 0/2] qga: Add guest-fsfreeze-freeze-list command
Next by Date: Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes
Previous by thread: Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes
Next by thread: Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes
Index(es):
- Date
- Thread