qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU ho


From: Babu Moger
Subject: RE: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
Date: Thu, 25 Jun 2020 11:41:25 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0

Igor,

> -----Original Message-----
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Thursday, June 25, 2020 10:19 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> pbonzini@redhat.com; rth@twiddle.net
> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> hotplug
> 
> On Wed, 24 Jun 2020 12:35:59 -0500
> Babu Moger <babu.moger@amd.com> wrote:
> 
> > > -----Original Message-----
> > > From: Igor Mammedov <imammedo@redhat.com>
> > > Sent: Wednesday, June 24, 2020 8:48 AM
> > > To: Moger, Babu <Babu.Moger@amd.com>
> > > Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> > > pbonzini@redhat.com; rth@twiddle.net
> > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and
> CPU
> > > hotplug
> > >
> > > On Tue, 16 Jun 2020 12:18:56 -0500
> > > Babu Moger <babu.moger@amd.com> wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > > Sent: Tuesday, June 16, 2020 5:59 AM
> > > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > > Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> > > > > mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-
> devel@nongnu.org
> > > > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device
> and
> > > CPU
> > > > > hotplug
> > > > >
> > > > > On Mon, 08 Jun 2020 15:18:50 -0500
> > > > > Babu Moger <babu.moger@amd.com> wrote:
> > > > >
> > > > > > Noticed the following command failure while testing CPU hotplug.
> > > > > >
> > > > > > $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> > > > > >   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> > > > > >   cpu,core-id=0,socket-id=1,thread-id=0
> > > > > >
> > > > > >   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-
> id=1,
> > > > > >   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 
> > > > > > 0]
> > > > > >   with APIC ID 21855, valid index range 0:1
> > > > > >
> > > > > > This happens because APIC ID is calculated using uninitialized 
> > > > > > memory.
> > > > > > This is happening after the addition of new field node_id in
> > > X86CPUTopoIDs
> > > > > > structure. The node_id field is uninitialized while calling
> > > > > > apicid_from_topo_ids. The problem is discussed in the thread below.
> > > > > >
> > > > >
> > >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> > > > > nel.org%2Fqemu-
> > > > >
> > >
> devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01
> > > > >
> > >
> %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d
> > > > >
> > >
> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp
> > > > >
> > >
> ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r
> > > > > eserved=0
> > > > > >
> > > > > > Fix the problem by initializing the node_id properly.
> > > > > >
> > > > > > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > > > > > ---
> > > > > >  hw/i386/pc.c               |    2 ++
> > > > > >  include/hw/i386/topology.h |   11 +++++++++++
> > > > > >  2 files changed, 13 insertions(+)
> > > > > >
> > > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > > > > index 2128f3d6fe..974cc30891 100644
> > > > > > --- a/hw/i386/pc.c
> > > > > > +++ b/hw/i386/pc.c
> > > > > > @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler
> > > > > *hotplug_dev,
> > > > > >          topo_ids.die_id = cpu->die_id;
> > > > > >          topo_ids.core_id = cpu->core_id;
> > > > > >          topo_ids.smt_id = cpu->thread_id;
> > > > > > +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms-
> > > >cpu_type)
> > > > > ?
> > > > > > +                           x86_node_id_for_epyc(&topo_info, 
> > > > > > &topo_ids) : 0;
> > > > >
> > > > > I'd rather not calculate some default value here,
> > > > > this is the branch where we check user provided topology info and 
> > > > > error
> out
> > > > > asking
> > > > > to provide missing bits.
> > > > Noticed that cpu->node_id is initialized to
> 0xFF(NUMA_NODE_UNASSIGNED).
> > > > We can initialize cpu->node_id to default node like how we do it in
> > > > x86_get_default_cpu_node_id.  We can use it to initialize
> topo_ids.node_id.
> > > > This is consistent with other fields core_id, die_id etc..
> > > >
> > > > >
> > > > > I also wonder if we should force user to specify numa nodes on CLI if
> EPYC
> > > cpu is
> > > > > used.
> > > > > (i.e. I'm assuming that EPYC always requires numa)
> > > >
> > > > That is not true. Without numa all the cpus will be configured under one
> > > > default numa node 0. Like we do it using x86_get_default_cpu_node_id.
> > >
> > > get_default_cpu_node_id() which is making things up, is going to be
> removed
> > > eventually in favor of asking user to provide numa mapping explicitly on 
> > > CLI.
> >
> > That will be good going forward.
> >
> > >
> > > now if it's always only node 0, why do we need to calculate it then,
> > > why not just assing 0 directly?
> > >
> > > what if we have several sockets, would all vCPUs still have node-id = 0?
> >
> > If there are several nodes then socket id becomes node id.
> I wonder if node id == socket id then why bother with node_id at all,
> probably node id is there to allow for design where several sockets are on
> the same node
> 
> 
> > > Another question is if we have CPUs with only 1 numa node set on all of
> them,
> > > does it require all other machinery like ACPI SRAT table defined or it's 
> > > just
> > > internal CPU impl. detail?
> >
> > I am not very sure about it. To me it looks like it is just internal cpu
> > implementation detail.
> I'd think it might confuse guest OS, when it decodes more than 1 numa node
> for APIC ID/CPUID but then there are no such nodes described in ACPI.
> While it might work for caches, it would miss any relation of memory mapping
> to nodes or get it wrong if one doesn't match another.
> 
> 
> > I think we have two options here.
> >
> > 1. Update the patch to initialize the node_id the way it is done
> > get_default_cpu_node_id.
> 
> if it were only one node for every CPU (incl. multisocket), I'd go with 
> enabling
> autonuma assigning all CPUs to default 0 node-id, since there is no ambiguity
> where
> CPUs and RAM are mapped to.
> Is it possible to use node-id=0 for all EPYC CPUs even in multisocket config?
> (it seems spec allows only one node per socket, but doesn't say that node ids
> must
> be different.)
> If not, then making up node-id is not an option.
> 
> > 2. Ask the user to pass the node_id while hot plugging the device. This is
> > a clean solution. Will require some data structure changes.
> 
> Here is my brain dump of current very non obvious flow:
> 
>   1. x86_possible_cpu_arch_ids()
>          ms->possible_cpus->cpus[i].props.* = x86_topo_ids_from_idx()
> 
>   2. possible numa_cpu_set()
>          ms->possible_cpus->cpus[i].props.node_id = user input|0 -in autonuma
> case
> 
>   3. x86_cpus_init()
>          // generate apic_id AND makeup node_id embedded into it
>          ms->possible_cpus->cpus[i].arch_id = 
> x86_cpu_apic_id_from_index(x86ms,
> i);
>                         -> x86_apicid_from_cpu_idx_epyc() ->
> x86_topo_ids_from_idx_epyc()
>                                                                  same as 
> x86_topo_ids_from_idx() + node_id
>                      or
>                         -> x86_apicid_from_cpu_idx() -> 
> x86_topo_ids_from_idx()
> 
>   4. pc_cpu_pre_plug()
>          // basically topo ids module node-id is not set or user provided
>          cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx);
> 
>   5.
>          // do it again with diff that in EPYC case it my have different 
> node-id than
> cpu_slot
>          x86ms->topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
> 
>          //i.e. user input of node-id is ignored
>          set socket-id/core-id/... (but not node-id) from topo_info
> 
>          numa_cpu_pre_plug(cpu_slot)
>                            ^^^^^^
>               if (node_id == CPU_UNSET_NUMA_NODE_ID) {
>                    if (slot->props.has_node_id)
>                        object_property_set_int(... slot->props.node_id, 
> "node-id",...);
>               // this applies to hotplugged without node-id and to initial 
> CPUs (-smp X)
>               // so we may end up with "node-id" being set to user defined 
> value
>               // or left unset (no numa enabled)
>               // while APIC ID will have some node-id encoded in it.
> 
> 
> that's quite a mess, maybe we should unify both
> amd make x86_apicid_from_cpu_idx_epyc()/x86_apicid_from_cpu_idx() use
> ms->possible_cpus->cpus[i].props instead of x86_topo_ids_from_idx()
> i.e
> 
>        x86_apicid_from_cpu_idx_epyc() {
>            topoids = x86_apicid_from_cpu_idx() {
>                           return ms->possible_cpus->cpus[i].props
>                       }
>            if (ms->possible_cpus->cpus[i].props.has_node_id)
>                topoids.node_id = ms->possible_cpus->cpus[i].props.node_id
>            else
>                error_fatal("EPYC requires use of -numa to define topology if 
> using
> more than 1 socket")
>        }
> 
> that way QEMU makes up only node[0] by enabling autonuma or whatever
> user privided explicitly is encoded into APIC ID and it will be always 
> consistent
> with cpu
> *-id properties in possible_cpus and SRAT table QEMU generates.
> 
> as cleanup we can get rid of back and forth conversion [5] and use cpu_slot to
> set
> the same ids.
> 
> Also maybe we should have a check that node-id is the same within socket in
> case of EPYC
> if it's guarantied that EPYC won't support multiple nodes per socket.
> 
> hope it makes at least some sense.

To make things clear, in case of autonuma we don't have to worry about
node_id. We just have to set it topo_ids.node_id to 0 in pc_cpu_pre_plug,
Everything will work as expected. This will solve our current problem of
uninitialized variable.

Problem here is, when user has configured the numa, then setting the
topo_ids.node_id to 0 might not work because it might create duplicate
apicids and device_add will be rejected.  As per the comments in
numa_cpu_pre_plug, this is already broken. Look at the comments below.
Looks like node_id cannot be passed down.
============================================
if (node_id == CPU_UNSET_NUMA_NODE_ID) {
        /* due to bug in libvirt, it doesn't pass node-id from props on
         * device_add as expected, so we have to fix it up here */
        if (slot->props.has_node_id) {
            object_property_set_int(OBJECT(dev), slot->props.node_id,
                                    "node-id", errp);
        }
    } else if (node_id != slot->props.node_id) {
============================================

I was trying to solve this problem setting the node_id correctly for EPYC
at least.  If you think, this is not important we can ignore it (by
setting topo_ids.node_id to 0) and move forward.  I don't see the need for
changing other topology specific code as we have already made very generic.

> 
> 
> > Let me know if you see any other option.
> >
> > >
> > >
> > > > >
> > > > > >          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info,
> > > &topo_ids);
> > > > > >      }
> > > > > >
> > > > > > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > > > > > index 07239f95f4..ee4deb84c4 100644
> > > > > > --- a/include/hw/i386/topology.h
> > > > > > +++ b/include/hw/i386/topology.h
> > > > > > @@ -140,6 +140,17 @@ static inline unsigned
> > > > > apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
> > > > > >             apicid_node_width_epyc(topo_info);
> > > > > >  }
> > > > > >
> > > > > > +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo
> > > *topo_info,
> > > > > > +                                            const X86CPUTopoIDs 
> > > > > > *topo_ids)
> > > > > > +{
> > > > > > +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> > > > > > +    unsigned cores_per_node = DIV_ROUND_UP((topo_info-
> >dies_per_pkg
> > > *
> > > > > > +                                            
> > > > > > topo_info->cores_per_die *
> > > > > > +                                            
> > > > > > topo_info->threads_per_core),
> > > > > > +                                            nr_nodes);
> > > > > > +
> > > > > > +    return (topo_ids->core_id / cores_per_node) % nr_nodes;
> > > > > what if nr_nodes == 0?
> > > > >
> > > > > > +}
> > > > > >  /*
> > > > > >   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> > > > > >   *
> > > > > >
> > > > > >
> > > >
> > > >
> >




reply via email to

[Prev in Thread] Current Thread [Next in Thread]