[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes
From: |
Eduardo Habkost |
Subject: |
Re: [Qemu-devel] [PATCH 7/7] numa: Allow empty nodes |
Date: |
Mon, 16 Jun 2014 17:11:40 -0300 |
User-agent: |
Mutt/1.5.23 (2014-03-12) |
On Mon, Jun 16, 2014 at 11:49:46AM -0700, Nishanth Aravamudan wrote:
> On 16.06.2014 [13:15:00 -0300], Eduardo Habkost wrote:
> > On Mon, Jun 16, 2014 at 05:53:53PM +1000, Alexey Kardashevskiy wrote:
> > > Currently NUMA nodes must go consequently and QEMU ignores nodes
> > > with @nodeid bigger than the number of NUMA nodes in the command line.
> >
> > Why would somebody need a NUMA node with nodeid bigger than the number
> > of NUMA nodes? NUMA node IDs must be in the [0, numa_nodes-1] range.
>
> That is not how the code works currently.
>
> vl.c::numa_add()
> ...
> if (get_param_value(option, 128, "nodeid", optarg) == 0) {
> nodenr = nb_numa_nodes;
> } else {
> if (parse_uint_full(option, &nodenr, 10) < 0) {
> fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option);
> exit(1);
> }
> }
> ...
> if (get_param_value(option, 128, "mem", optarg) == 0) {
> node_mem[nodenr] = 0;
> } else {
> int64_t sval;
> sval = strtosz(option, &endptr);
> if (sval < 0 || *endptr) {
> fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg);
> exit(1);
> }
> node_mem[nodenr] = sval;
> }
> ...
> nb_numa_nodes++;
> ...
>
> So if a user passes nodeid= to the NUMA node definition, that entry in
> node_mem is set to the appropriate value, but nb_numa_nodes, which is
> used to bound the iteration of that array is not bumped appropriately.
> So we end up looking at arbitrary indices in the node_mem array, which
> are often 0.
Yes. What I claim is: the bug is that the code is not validating
nodeids, which must be all in the [0, num_nodes-1] range.
Alexey, why do you believe you need non-contiguous nodeids to declare
memory-less nodes at all? Just use "-numa node" with no arguments, and
you will get a node with node_mem==0.
In other words, if you are trying to do:
-numa node,id=1,mem=X -numa node,id=3,mem=Y
(so you get nodes 0 and 2 with no memory).
You just need to do this instead:
-numa node,id=0 -numa node,id=1,mem=X -numa node,id=2 -numa node,id=3,mem=Y
>
> Note also that means that we can't generically differentiate here
> between a user-defined memoryless node and one that happens to be 0
> because the particular nodeid was not specified on the command-line.
That's because all nodeids must be specified in the command-line.
Accepting omitted nodes is a bug which should be fixed.
> Alexey, how do you differentiate between these two cases after your
> patches? In patch 3, I see you check (and skip in the loop) explicitly
> if !node_mem[nodeid], but I'm not sure how that check can differentiate
> between the statically 0 (from main's intialization loop) and when a
> user says a node's memory is 0. Probably something obvious I'm missing
> (it is Monday after all)...
>
> > > This prevents us from creating memory-less nodes which is possible
> > > situation in POWERPC under pHyp or Sapphire.
> >
> > Why? If I recall correctly, nodes without any CPUs or any memory are
> > already allowed.
>
> They are allowed, but it seems like the code throughout qemu (where it's
> relevant) assumes that NUMA nodes are sequential and continuous, but
> that's not required (nor is it enforced on the command-line).
It is not being enforced, but you will get weird bugs if you don't do
it. What I suggest is that we start enforcing it instead of magically
crating new nodes when a wrong (too high) ID is provided.
>
> > How exactly would this patch help you? How do you expect the
> > command-line to look like for your use case?
>
> Alexey has replied with that, it looks like.
Where? I don't see a reply.
>
> > > This makes nb_numa_nodes a total number of nodes or the biggest node
> > > number + 1 whichever is greater.
> > >
> > > Signed-off-by: Alexey Kardashevskiy <address@hidden>
> > > ---
> > > vl.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/vl.c b/vl.c
> > > index ac0e3d7..f1b75cb 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -1356,7 +1356,7 @@ static void numa_add(const char *optarg)
> > > if (get_param_value(option, 128, "cpus", optarg) != 0) {
> > > numa_node_parse_cpus(nodenr, option);
> > > }
> > > - nb_numa_nodes++;
> > > + nb_numa_nodes = MAX(nb_numa_nodes + 1, nodenr + 1);
> >
> > I would instead suggest that if any node in the [0, max_node_id] range
> > is not present on the command-line, QEMU should instead reject the
> > command-line.
>
> We already check two things:
>
> Too many nodes (meaning we've filled the array):
> if (nb_numa_nodes >= MAX_NODES) {
> fprintf(stderr, "qemu: too many NUMA nodes\n");
> exit(1);
> }
>
> Node ID itself is out of range (due to the use of an array):
> if (nodenr >= MAX_NODES) {
> fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr);
> exit(1);
> }
>
> But that doesn't prevent one from having *sparse* NUMA node IDs. And, as
> far as I can tell, this is allowed by the spec, but isn't properly
> supported by qemu.
Wait, is the node ID visible to the guest at all? I believe it is a
QEMU-internal thing, just to allow the NUMA nodes to be ordered in the
command-line. I would even claim that the parameter is useless and
shouldn't have been introduced in the first place.
What I don't se is: why you need the command-line to look like:
-numa node,id=1,mem=X
when you can simply write it as:
-numa node,id=0 -numa node,id=1,mem=X
--
Eduardo
[Qemu-devel] [PATCH 4/7] spapr: Split memory nodes to power-of-two blocks, Alexey Kardashevskiy, 2014/06/16
[Qemu-devel] [PATCH 1/7] spapr: Move DT memory node rendering to a helper, Alexey Kardashevskiy, 2014/06/16
[Qemu-devel] [PATCH 3/7] spapr: Refactor spapr_populate_memory(), Alexey Kardashevskiy, 2014/06/16