From: David Gibson
Subject: Re: [PATCH v1 7/7] spapr.c: consider CPU core online state before allowing unplug
Date: Fri, 15 Jan 2021 12:03:00 +1100

On Thu, Jan 14, 2021 at 03:06:28PM -0300, Daniel Henrique Barboza wrote:
> The only restriction we have when unplugging CPUs is to forbid unplug of
> the boot cpu core. spapr_core_unplug_possible() does not contemplate the
> possibility of some cores being offlined by the guest, meaning that we're
> rolling the dice on whether we're unplugging the last online CPU core
> the guest has.
> 
> If we hit the jackpot, we're going to detach the core DRC and pulse the
> hotplug IRQ, but the guest OS will refuse to release the CPU. Our
> spapr_core_unplug() DRC release callback will never be called and the CPU
> core object will keep existing in QEMU. No error message will be sent
> to the user, but the CPU core wasn't unplugged from the guest.
> 
> If the guest OS onlines the CPU core again we won't be able to hotunplug it
> either. 'dmesg' inside the guest will report a failed attempt to offline an
> unknown CPU:
> 
> [  923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16
> 
> This is the result of stopping the DRC state transition midway through
> the first failed attempt.
> 
> We can avoid this, and potentially prevent other bad things from
> happening, if we avoid attempting the unplug altogether in this scenario.
> Let's check for
> the online/offline state of the CPU cores in the guest before allowing
> the hotunplug, and forbid removing a CPU core if it's the last one online
> in the guest.

Good explanation overall, but I think it would be a bit clearer and
more direct if you remove the "roll the dice" / "hit the jackpot"
metaphor.



> Reported-by: Xujun Ma <xuma@redhat.com>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a2f01c21aa..d269dcd102 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3709,9 +3709,16 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev)
>  static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc,
>                                        Error **errp)

This will need a small rework w.r.t. my suggestions for the previous
patch, obviously.

>  {
> +    CPUArchId *core_slot;
> +    SpaprCpuCore *core;
> +    PowerPCCPU *cpu;
> +    CPUState *cs;
> +    bool last_cpu_online = true;
>      int index;
>  
> -    if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) {
> +    core_slot = spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id,
> +                                    &index);
> +    if (!core_slot) {
>          error_setg(errp, "Unable to find CPU core with core-id: %d",
>                     cc->core_id);
>          return -1;
> @@ -3722,6 +3729,36 @@ static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc,
>          return -1;
>      }
>  
> +    /* Allow for any non-boot CPU core to be unplugged if already offline */
> +    core = SPAPR_CPU_CORE(core_slot->cpu);
> +    cs = CPU(core->threads[0]);
> +    if (cs->halted) {
> +        return 0;
> +    }

I think you need to check that *all* of the core's threads are offline,
not just thread 0, for this to be correct.
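
A minimal sketch of what an all-threads check could look like, using
stand-in types rather than the real QEMU structures.  MockCPU, MockCore
and core_all_threads_halted() are hypothetical names for illustration
only, and 'halted' is just a placeholder for whatever "offline" test
turns out to be correct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a vcpu thread; only the field this check needs. */
typedef struct MockCPU {
    bool halted;
} MockCPU;

/* Stand-in for a CPU core owning nr_threads vcpu threads. */
typedef struct MockCore {
    MockCPU *threads;
    size_t nr_threads;
} MockCore;

/* Return true only if every thread of the core is halted. */
static bool core_all_threads_halted(const MockCore *core)
{
    for (size_t i = 0; i < core->nr_threads; i++) {
        if (!core->threads[i].halted) {
            return false;
        }
    }
    return true;
}
```

With this shape, a core whose thread 0 is halted but whose thread 2 is
still running is not treated as offline.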

> +
> +    /*
> +     * Do not allow core unplug if it's the last core online.
> +     */
> +    cpu = POWERPC_CPU(cs);
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *c = POWERPC_CPU(cs);
> +
> +        if (c == cpu) {
> +            continue;
> +        }
> +
> +        if (!cs->halted) {
> +            last_cpu_online = false;
> +            break;
> +        }
> +    }

Likewise here I think your logic needs more careful handling of
threads - you need to disallow the hot unplug if all of the currently
online threads are on the core slated for removal.
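
As a rough illustration of that rule, here is a sketch over stand-in
types (not the real CPU_FOREACH() iteration).  MockCPU and
unplug_removes_last_online() are hypothetical names, and the 'online'
field is an assumption standing in for the real per-thread state.  The
point is that every thread belonging to the core being removed is
skipped, not only thread 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a vcpu thread. */
typedef struct MockCPU {
    int core_id;   /* core this thread belongs to */
    bool online;   /* placeholder for the real "online" test */
} MockCPU;

/*
 * Return true if removing core_id would leave the guest with no online
 * thread, i.e. no thread outside that core is currently online.
 */
static bool unplug_removes_last_online(const MockCPU *cpus, size_t n,
                                       int core_id)
{
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].core_id != core_id && cpus[i].online) {
            return false;
        }
    }
    return true;
}
```

The unplug request would be rejected whenever this returns true.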

I'm also a little bit worried about whether just checking cs->halted
is sufficient.  That's a qemu/tcg core concept that I think may be set
in some circumstances when the CPU is *not* offline.  The logic
of the stop-self RTAS call is specifically to both set halted *and*
to block interrupts so there's nothing that can pull the vcpu out of
halted state.  It's possible that handling this correctly will require
adding some qemu internal state to explicitly track the "online" state
of a vcpu as managed by the "stop-self" and "start-cpu" RTAS calls.
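
To make the distinction concrete, here is a sketch with stand-in types
showing an explicit online flag kept separately from 'halted'.
Everything in it is hypothetical: MockCPU, the guest_online field and
the mock_* helpers do not exist in QEMU, and the helpers only imitate
the effect of the offlining and "start-cpu" RTAS paths described above.

```c
#include <stdbool.h>

/* Stand-in for a vcpu; all fields are illustrative. */
typedef struct MockCPU {
    bool halted;        /* generic "not executing right now" flag */
    bool irq_blocked;   /* stands in for masking every wakeup source */
    bool guest_online;  /* hypothetical explicit online state */
} MockCPU;

/* Imitates the RTAS path that takes a vcpu offline. */
static void mock_rtas_offline(MockCPU *cpu)
{
    cpu->halted = true;
    cpu->irq_blocked = true;
    cpu->guest_online = false;
}

/* Imitates the "start-cpu" path that brings a vcpu online. */
static void mock_rtas_start_cpu(MockCPU *cpu)
{
    cpu->irq_blocked = false;
    cpu->halted = false;
    cpu->guest_online = true;
}

/* An ordinary idle wait: halted, but an interrupt can still wake it. */
static void mock_idle(MockCPU *cpu)
{
    cpu->halted = true;
}

/* The unplug check would consult this instead of 'halted'. */
static bool mock_cpu_is_online(const MockCPU *cpu)
{
    return cpu->guest_online;
}
```

An idle vcpu is halted yet still online here, which is exactly the case
a bare cs->halted test would get wrong.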

> +
> +    if (last_cpu_online) {
> +        error_setg(errp, "Unable to unplug CPU core with core-id %d: it is "
> +                   "the only CPU core online in the guest", cc->core_id);
> +        return -1;
> +    }
> +
>      return 0;
>  }
>  

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
