Re: [Qemu-devel] [PATCH 12/12] target-i386: implement CPU hot-add


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH 12/12] target-i386: implement CPU hot-add
Date: Wed, 3 Apr 2013 17:57:56 -0300
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Apr 03, 2013 at 10:09:25PM +0200, Igor Mammedov wrote:
> On Wed, 3 Apr 2013 16:27:11 -0300
> Eduardo Habkost <address@hidden> wrote:
> 
> > On Wed, Apr 03, 2013 at 08:59:07PM +0200, Igor Mammedov wrote:
> > > On Wed, 3 Apr 2013 15:10:05 -0300
> > > Eduardo Habkost <address@hidden> wrote:
> > > 
> > > > On Wed, Apr 03, 2013 at 07:58:00PM +0200, Igor Mammedov wrote:
> > > > <snip>
> > > > > > > +void do_cpu_hot_add(const int64_t id, Error **errp)
> > > > > > > +{
> > > > > > > +    pc_new_cpu(saved_cpu_model, id, errp);
> > > > > > > +}
> > > > > > > +
> > > > > > 
> > > > > > Missing x86_cpu_apic_id_from_index(id)?
> > > > > There was(is?) opposition to using cpu_index to identify x86 CPU.
> > > > 
> > > > Really? Do you have a pointer to the discussion?
> > > Here is what I could find in my mail box:
> > > http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg02770.html
> > > Jan could correct me if I'm wrong.
> > 
> > 
> > 
> > So, quoting Jan:
> > > From my POV, cpu_index could become equal to the physical APIC ID.
> > > As long as we can set it freely (provided it remains unique) and
> > > non-continuously, we don't need separate indexes.
> > 
> > We can't choose APIC IDs freely, because the APIC ID is calculated based
> > on the CPU topology (socket + core + thread IDs).
> > 
> > So, the cpu_index could be the same as the APIC ID if the cpu_index
> > value is declared opaque, being just a "CPU identifier" that is chosen by
> > QEMU arbitrarily (and that happens to match the APIC ID). But we will
> > probably have some problems with this:
> > 
> > - The CPU indexes are currently allocated contiguously, and probably
> >   existing interfaces already assume that (e.g. the "-numa" option,
> >   "info numa", "info cpus" and maybe other monitor commands)
> > - QEMU must be responsible for calculating the APIC ID of each CPU,
> >   because it is based on the CPU topology.
> > - If QEMU is the one who calculates the APIC ID, what kind of identifier
> >   can we use for a CPU object in the command-line (e.g. in the "-numa"
> >   option)?
> using any kind of thread id is problematic since 2 threads from the same core
> could end up on different nodes.
> Maybe placement interface could be better described as node[n]=sockets_list?

The problem here is compatibility: we need to keep existing
command-lines working. And the existing interface (among other things)
doesn't prevent two threads from being in different NUMA nodes.

If today "-smp 9,cores=3,threads=3 -numa node,cpus=0-4 -numa
node,cpus=5-8" has a specific meaning, we need to keep the same
meaning.
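
To make that example concrete, here is a rough sketch (not QEMU code; all
helper names are made up for illustration) of what that command-line implies
if we assume that contiguous CPU indexes fill threads first, then cores, and
that the APIC ID packs the thread and core IDs into power-of-two bit fields.
The two NUMA ranges are hard-coded from the example above:

```c
#include <assert.h>
#include <stdint.h>

/* Topology from "-smp 9,cores=3,threads=3": one socket, 3 cores x 3 threads. */
enum { CORES = 3, THREADS = 3, NR_CPUS = 9 };

/* Bits needed to encode the values 0..count-1 (0 bits when count == 1). */
static unsigned bits_for_count(unsigned count)
{
    unsigned bits = 0;

    assert(count >= 1);
    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* Hypothetical helper: core ID of a contiguous CPU index. */
static unsigned core_from_index(unsigned cpu_index)
{
    return (cpu_index / THREADS) % CORES;
}

/* Hypothetical helper: contiguous CPU index -> topology-based APIC ID. */
static uint32_t apic_id_from_index(unsigned cpu_index)
{
    unsigned thread_id = cpu_index % THREADS;
    unsigned core_id = core_from_index(cpu_index);
    unsigned pkg_id = cpu_index / (CORES * THREADS);
    unsigned thread_bits = bits_for_count(THREADS);
    unsigned core_bits = bits_for_count(CORES);

    return (pkg_id << (thread_bits + core_bits)) |
           (core_id << thread_bits) | thread_id;
}

/* NUMA node of a CPU index, for the ranges cpus=0-4 and cpus=5-8. */
static int numa_node_from_index(unsigned cpu_index)
{
    assert(cpu_index < NR_CPUS);
    return cpu_index <= 4 ? 0 : 1;
}
```

Under those assumptions, indexes 0..8 map to APIC IDs 0, 1, 2, 4, 5, 6, 8, 9
and 10, and indexes 4 and 5 are two threads of core 1 that land in nodes 0
and 1 respectively. That is the kind of existing behaviour we can't silently
change.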

>  
> > - We may need to redefine the meaning of the "maxcpus" -smp option, if
> >   all our interfaces are now based on non-contiguous and freely-set CPU
> >   identifiers.
> it's amount of CPUs available to guest, pretty clear from user's POV.

I was just worried that there could be assumptions that "maxcpus is
always > cpu_index". But you are probably right.

>  
> > 
> > In short, getting rid of the contiguous CPU indexes sounds very
> > difficult. We could introduce other kind of identifiers, but probably we
> > may need to keep the CPU indexes contiguous to keep existing interfaces
> > working.
> Once we have CPU unplug, we will have non-contiguous cpu_index. So it will be
> part of CPU unplug series to fix cpu_index allocation/usage where necessary.

Keeping compatibility after CPU unplug is not a problem, as CPU unplug
doesn't exist yet. The problem here is to have a reliable identifier
for CPUs that can be used in the command-line. The only identifier we
have for that today is a contiguous CPU index, and if we make the
indexes non-contiguous we are going to break existing command-lines
that use them (e.g. with "-numa").

> 
> > 
> > > 
> > > > 
> > > > 
> > > > > So, it is expected from management to provide APIC ID instead of
> > > > > cpu_index.
> > > > > It could be useful to make hotplug to a specific NUMA node/cpu to
> > > > > work in future.
> > > > > Though interface of possible APIC IDs discovery is not part of this
> > > > > series.
> > > > 
> > > > That's exactly the opposite of what I expect. The APIC ID is an internal
> > > > implementation detail, and external tools must _not_ be required to deal
> > > > with it and to calculate it.
> > > > 
> > > > Communication with the BIOS, on the other hand, is entirely based on the
> > > > APIC ID, and not CPU indexes. So QEMU needs to translate the CPU indexes
> > > > (used to communicate with the outside world) to APIC IDs when talking to
> > > > the BIOS.
> > > cpu_index won't work nicely with hot-adding CPU to specific numa node 
> > > though.
> > 
> > Well, the "-numa node" options are already based on CPU indexes, so it
> > would match the existing NUMA configuration interface.
> > 
> > > with APIC ID (mgmt might treat it as opaque) we could expose something 
> > > like
> > > 
> > > /machine/icc-bridge/link<CPU[apic_id_n]>
> > > ...
> > > 
> > > for all possible CPUs, with empty links for non existing ones.
> > > 
> > > and later add on something like this:
> > > 
> > > /machine/numa_node[x]/link<CPU[apic_id_n]>
> > > ...
> > > 
> > > Libvirt then could just pick up a ready APIC ID from the desired place
> > > and add a CPU either using cpu-add id=xxx or
> > > device_add x86-cpu-...,apic_id=xxx
> > > 
> > > +1 more: cpu_index is a QEMU implementation detail, and we could not add
> > > a cpu-index property to the x86 CPU since hardware doesn't have such a
> > > feature, so it won't be available with device_add.
> > 
> > I don't mind hiding cpu_index too. I don't mind if we use a cpu_index,
> > QOM links, arbitrary IDs set by the user. I just have a problem with
> > requiring libvirt to set the APIC ID.
> > 
> > If you give libvirt an easy way to convert a CPU "location" (index, numa
> > node, whatever) to an APIC ID that is pre-calculated by QEMU, then it
> > could work. But do we really need to require libvirt to deal with APIC
> > ID directly? If you just set the links properly to reflect the CPU
> > "location", the CPU could calculate its APIC ID based on its "location"
> > using the links.
> What about adding CPU to a specific node then, it would require interface for
> communicating to CPU to which node it should be plugged (part of APIC ID, I
> guess).

Using QOM we could just use links. The question to me is how to identify
the CPU "location" reliably if we're going to use it in a "cpu-set"
interface. My point is that cpu_index works perfectly for that (as long
as the rules about how each CPU index is allocated to each NUMA node,
socket, core, and thread are kept stable). Later we can have something
not based on CPU indexes, if we move to a 100% link-based QOM interface.

Having to ask QEMU for the APIC ID somehow and requiring the APIC ID to
be provided on the cpu-set command could work, yes. But it must not
require libvirt to calculate and choose the APIC IDs itself.
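
Just to illustrate what I mean by "ask QEMU for the APIC ID", a rough sketch
follows. None of these names exist in QEMU or in this series; the table, the
query function and the hot-add function are all hypothetical, and the APIC ID
encoding is the same assumed thread/core bit-field packing. The point is only
that the emulator pre-calculates the IDs per location, and the caller passes
back a value it treats as opaque:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 64

/* One possible CPU location; the APIC ID is opaque to the caller. */
struct cpu_slot {
    uint32_t apic_id;
    bool present;
};

struct cpu_table {
    unsigned nr_slots;
    struct cpu_slot slots[MAX_SLOTS];
};

/* Bits needed to encode the values 0..count-1 (0 bits when count == 1). */
static unsigned bits_for_count(unsigned count)
{
    unsigned bits = 0;

    assert(count >= 1);
    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* Emulator side: pre-calculate the APIC ID of every possible CPU. */
static void cpu_table_init(struct cpu_table *t, unsigned maxcpus,
                           unsigned cores, unsigned threads,
                           unsigned boot_cpus)
{
    unsigned thread_bits = bits_for_count(threads);
    unsigned core_bits = bits_for_count(cores);
    unsigned i;

    assert(maxcpus <= MAX_SLOTS && boot_cpus <= maxcpus);
    t->nr_slots = maxcpus;
    for (i = 0; i < maxcpus; i++) {
        unsigned pkg = i / (cores * threads);
        unsigned core = (i / threads) % cores;
        unsigned thread = i % threads;

        t->slots[i].apic_id = (pkg << (thread_bits + core_bits)) |
                              (core << thread_bits) | thread;
        t->slots[i].present = i < boot_cpus;
    }
}

/* Query side: which ID should be passed to hot-add the CPU at 'index'? */
static bool cpu_table_query(const struct cpu_table *t, unsigned index,
                            uint32_t *apic_id)
{
    if (index >= t->nr_slots || t->slots[index].present) {
        return false;
    }
    *apic_id = t->slots[index].apic_id;
    return true;
}

/* Hot-add side: accept only IDs that are in the table and still free. */
static bool cpu_table_hot_add(struct cpu_table *t, uint32_t apic_id)
{
    unsigned i;

    for (i = 0; i < t->nr_slots; i++) {
        if (t->slots[i].apic_id == apic_id) {
            if (t->slots[i].present) {
                return false;
            }
            t->slots[i].present = true;
            return true;
        }
    }
    return false;
}
```

With such a table the management tool would call the query for the location
it wants, then hand the returned value to the hot-add command unchanged. An
ID that was never handed out (or that is already in use) is rejected, so the
tool never has to know how the ID was built.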

(Note that I didn't review all the code yet. Maybe you are already doing
all that)

-- 
Eduardo


