qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/4] hw/i386: add facility to expose CPU topology over fw-cfg


From: Igor Mammedov
Subject: Re: [PATCH 3/4] hw/i386: add facility to expose CPU topology over fw-cfg
Date: Wed, 9 Oct 2019 13:13:05 +0200

On Tue, 8 Oct 2019 20:58:30 +0200
Laszlo Ersek <address@hidden> wrote:

> Eduardo, Igor,
> 
> On 10/08/19 12:52, Laszlo Ersek wrote:
> > FW_CFG_MAX_CPUS exposes the (exclusive) maximum APIC ID to guest firmware,
> > due to historical reasons. That value is not useful to edk2, however. For
> > supporting VCPU hotplug, edk2 needs:
> > 
> > - the boot CPU count (already exposed in FW_CFG_NB_CPUS),
> > 
> > - and the maximum foreseen CPU count (tracked in
> >   "MachineState.smp.max_cpus", but not currently exposed).
> > 
> > Add a new fw-cfg file to expose "max_cpus".
> > 
> > While at it, expose the rest of the topology too (die / core / thread
> > counts), because I expect that the VCPU hotplug feature for OVMF will
> > ultimately need those too, and the data size is not large.  
> 
> In fact, it seems like OVMF will have to synthesize the new
> (hot-plugged) VCPU's *APIC-ID* from the following information sources:
> 
> - the topology information described above (die / core / thread counts), and
> 
> - the "modern" CPU hotplug interface (docs/specs/acpi_cpu_hotplug.txt).

In general duplicating cpu_index+topo => apic id in firmware I
consider as a really bad idea (even ignoring cpu_index 
which I were trying to get rid of in QEMU), it's going to break
when algorithms diverge and it will be never ending race.

Topology is rather messy business, not only arch specific but also
cpu specific (ex: on my review TODO list, there is a series for
fixing broken EPYCs topo). Who knows what other variables would be
add dependencies for calculating APIC IDs down the road.

I also consider to re-use CPU hotplug interface on ARM, which will
bring its own set of algorithms.

Let's instead add a command to CPU hotplug interface to return
APIC ID (which QEMU already calculated) and later MPIDR (ARM)
for selected CPU, so firmware could get it while enumeration CPUs
via that interface.

> 
> Now, if I understand correctly, the "CPU selector" ([0x0-0x3]) specifies
> a CPU *index*. Therefore, in the hotplug SMI handler (running on one of
> the pre-existent CPUs), OVMF will have to translate the new CPU's
> selector (the new CPU's *index*) to its *APIC-ID*, based on the topology
> information (numbers of dies / cores / threads).
> 
> (That's because existent SMM infrastructure in edk2 uses the initial
> APIC-ID as the key for referencing CPUs.)
> 
> Algorithmically, I think this translation is doable in OVMF  -- after
> all, it is implemented in the x86_apicid_from_cpu_idx() function
> already, in "include/hw/i386/topology.h". And that function does not
> need more information either:
> 
> static inline apic_id_t x86_apicid_from_cpu_idx(unsigned nr_dies,
>                                                 unsigned nr_cores,
>                                                 unsigned nr_threads,
>                                                 unsigned cpu_index)
> 
> Therefore, my plan is to implement the same translation logic in OVMF.
> 
> Now, the questions:
> 
> - Am I right to think that the "CPU selector" register in the "modern"
> ACPI hotplug interface operates in the *same domain* as the "cpu_index"
> parameter of x86_apicid_from_cpu_idx()?
>
> - As we progress through CPU indices, x86_apicid_from_cpu_idx() first
> fills threads in a core, then cores in a die, then dies in a socket.
> Will this logic remain the same, forever?
> 
> If any of the two questions above is answered with "no", then OVMF will
> need an fw_cfg blob that is different from the current proposal.
> 
> Namely, OVMF will need a *full* "cpu_index -> APIC-ID" map, up to (and
> excluding) "max_cpus".
> 
> The pc_possible_cpu_arch_ids() function in "hw/i386/pc.c" already
> calculates a similar map:
> 
>         ms->possible_cpus->cpus[i].arch_id =
> x86_cpu_apic_id_from_index(pcms, i);
> 
> So, basically that map is what OVMF would have to receive over fw_cfg,
> *if* the "cpu_index -> APIC-ID" mapping is not considered guest ABI.
> Should I write v2 for that?
> 
> Please comment!
> 
> Thanks,
> Laszlo
> 
> 
> > This is
> > slightly complicated by the fact that the die count is specific to
> > PCMachineState, but fw_cfg_arch_create() intends to be PC-independent (see
> > commit 149c50cabcc4).
> > 
> > For now, the feature is temporarily disabled.
> > 
> > Cc: "Michael S. Tsirkin" <address@hidden>
> > Cc: Eduardo Habkost <address@hidden>
> > Cc: Igor Mammedov <address@hidden>
> > Cc: Marcel Apfelbaum <address@hidden>
> > Cc: Paolo Bonzini <address@hidden>
> > Cc: Philippe Mathieu-Daudé <address@hidden>
> > Cc: Richard Henderson <address@hidden>
> > Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
> > Signed-off-by: Laszlo Ersek <address@hidden>
> > ---
> >  hw/i386/fw_cfg.h | 30 +++++++++++++++++++++++++++++-
> >  hw/i386/fw_cfg.c | 26 ++++++++++++++++++++++++--
> >  hw/i386/pc.c     |  4 ++--
> >  3 files changed, 55 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/i386/fw_cfg.h b/hw/i386/fw_cfg.h
> > index e0856a376996..d742435b9793 100644
> > --- a/hw/i386/fw_cfg.h
> > +++ b/hw/i386/fw_cfg.h
> > @@ -18,9 +18,37 @@
> >  #define FW_CFG_E820_TABLE       (FW_CFG_ARCH_LOCAL + 3)
> >  #define FW_CFG_HPET             (FW_CFG_ARCH_LOCAL + 4)
> >  
> > +/**
> > + * FWCfgX86Topology: expose the X86 CPU topology to guest firmware over 
> > fw-cfg.
> > + *
> > + * All fields have little-endian encoding.
> > + *
> > + * @dies:     Number of dies per package (aka socket). Set it to 1 unless 
> > the
> > + *            concrete MachineState subclass defines it differently.
> > + * @cores:    Corresponds to @CpuTopology.@cores.
> > + * @threads:  Corresponds to @CpuTopology.@threads.
> > + * @max_cpus: Corresponds to @CpuTopology.@max_cpus.
> > + *
> > + * Firmware can derive the package (aka socket) count with the following
> > + * formula:
> > + *
> > + *   DIV_ROUND_UP(@max_cpus, @dies * @cores * @threads)
> > + *
> > + * Firmware can derive APIC ID field widths and offsets per the standard
> > + * calculations in "include/hw/i386/topology.h".
> > + */
> > +typedef struct FWCfgX86Topology {
> > +  uint32_t dies;
> > +  uint32_t cores;
> > +  uint32_t threads;
> > +  uint32_t max_cpus;
> > +} QEMU_PACKED FWCfgX86Topology;
> > +
> >  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> >                                 uint16_t boot_cpus,
> > -                               uint16_t apic_id_limit);
> > +                               uint16_t apic_id_limit,
> > +                               unsigned smp_dies,
> > +                               bool expose_topology);
> >  void fw_cfg_build_smbios(MachineState *ms, FWCfgState *fw_cfg);
> >  void fw_cfg_build_feature_control(MachineState *ms, FWCfgState *fw_cfg);
> >  
> > diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
> > index 39b6bc60520c..33d09875014f 100644
> > --- a/hw/i386/fw_cfg.c
> > +++ b/hw/i386/fw_cfg.c
> > @@ -85,9 +85,26 @@ void fw_cfg_build_smbios(MachineState *ms, FWCfgState 
> > *fw_cfg)
> >      }
> >  }
> >  
> > +static void fw_cfg_expose_topology(FWCfgState *fw_cfg,
> > +                                   unsigned dies,
> > +                                   unsigned cores,
> > +                                   unsigned threads,
> > +                                   unsigned max_cpus)
> > +{
> > +    FWCfgX86Topology *topo = g_new(FWCfgX86Topology, 1);
> > +
> > +    topo->dies     = cpu_to_le32(dies);
> > +    topo->cores    = cpu_to_le32(cores);
> > +    topo->threads  = cpu_to_le32(threads);
> > +    topo->max_cpus = cpu_to_le32(max_cpus);
> > +    fw_cfg_add_file(fw_cfg, "etc/x86-smp-topology", topo, sizeof *topo);
> > +}
> > +
> >  FWCfgState *fw_cfg_arch_create(MachineState *ms,
> > -                                      uint16_t boot_cpus,
> > -                                      uint16_t apic_id_limit)
> > +                               uint16_t boot_cpus,
> > +                               uint16_t apic_id_limit,
> > +                               unsigned smp_dies,
> > +                               bool expose_topology)
> >  {
> >      FWCfgState *fw_cfg;
> >      uint64_t *numa_fw_cfg;
> > @@ -143,6 +160,11 @@ FWCfgState *fw_cfg_arch_create(MachineState *ms,
> >                       (1 + apic_id_limit + nb_numa_nodes) *
> >                       sizeof(*numa_fw_cfg));
> >  
> > +    if (expose_topology) {
> > +        fw_cfg_expose_topology(fw_cfg, smp_dies, ms->smp.cores,
> > +                               ms->smp.threads, ms->smp.max_cpus);
> > +    }
> > +
> >      return fw_cfg;
> >  }
> >  
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index bcda50efcc23..bb72b12edad2 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1738,8 +1738,8 @@ void pc_memory_init(PCMachineState *pcms,
> >                                          option_rom_mr,
> >                                          1);
> >  
> > -    fw_cfg = fw_cfg_arch_create(machine,
> > -                                pcms->boot_cpus, pcms->apic_id_limit);
> > +    fw_cfg = fw_cfg_arch_create(machine, pcms->boot_cpus, 
> > pcms->apic_id_limit,
> > +                                pcms->smp_dies, false);
> >  
> >      rom_set_fw(fw_cfg);
> >  
> >   
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]