


From: Daniel Henrique Barboza
Subject: Re: [PATCH v1 7/7] spapr.c: consider CPU core online state before allowing unplug
Date: Fri, 15 Jan 2021 18:43:14 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0

On 1/15/21 2:22 PM, Greg Kurz wrote:
On Thu, 14 Jan 2021 15:06:28 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

The only restriction we have when unplugging CPUs is to forbid unplug of
the boot cpu core. spapr_core_unplug_possible() does not contemplate the

I can't remember why this restriction was introduced in the first place...
This should be investigated and documented if the limitation still stands.

I looked it up and found out that restriction was added by this commit:

commit 62be8b044adf47327ebefdefb25f28a40316ebd0
Author: Bharata B Rao <bharata@linux.vnet.ibm.com>
Date:   Wed Jul 27 10:44:42 2016 +0530

    spapr: Prevent boot CPU core removal

    Boot CPU is assumed to be always present in QEMU code. So
    until that assumptions are gone, deny removal request.
    In another words, QEMU won't support boot CPU core hot-unplug.

I don't think it necessarily has to do with pSeries code, though. I was unable to
offline CPU0 of my x86 notebook:

# lscpu | grep -i 'on-line'
On-line CPU(s) list:             0-7
# echo 0 > /sys/devices/system/cpu/cpu0/online
bash: /sys/devices/system/cpu/cpu0/online: Permission denied
# echo 0 > /sys/devices/system/cpu/cpu1/online
# lscpu | grep -i 'on-line'
On-line CPU(s) list:             0,2-7
# echo 0 > /sys/devices/system/cpu/cpu0/online
bash: /sys/devices/system/cpu/cpu0/online: Permission denied

The pseries kernel does not have this restriction: it allows offlining CPU0.

Maybe we're limiting CPU0 unplug in pseries because lifting the restriction would
break common QEMU code that carries it due to x86/ACPI mechanics, since x86
apparently can't hot-unplug CPU0.

If a good x86 soul reads this and can confirm or deny my assumption, I'd appreciate it :)



possibility of some cores being offlined by the guest, meaning that we're
rolling the dice on whether we're unplugging the last online
CPU core the guest has.

Trying to unplug the last CPU is obviously something that deserves
special care. LoPAPR is quite explicit on the outcome: this should
terminate the partition.

Isolation of CPUs

The isolation of a CPU, in all cases, is preceded by the stop-self
RTAS function for all processor threads, and the OS insures that all
the CPU’s threads are in the RTAS stopped state prior to isolating the
CPU. Isolation of a processor that is not stopped produces unpredictable
results. The stopping of the last processor thread of a LPAR partition
effectively kills the partition, and at that point, ownership of all
partition resources reverts to the platform firmware.

R1- For the LRDR option: Prior to issuing the RTAS
set-indicator specifying isolate isolation-state of a CPU DR
connector type, all the CPU threads must be in the RTAS stopped

R1- For the LRDR option: Stopping of the last processor
thread of a LPAR partition with the stop-self RTAS function, must kill
the partition, with ownership of all partition resources reverting to
the platform firmware.

This is clearly not how things work today: Linux doesn't call
"stop-self" on the last vCPU, and even if it did, QEMU doesn't
terminate the VM.

If there's a valid reason to not implement this PAPR behavior, I'd like
it to be documented.

If we hit the jackpot, we're going to detach the core DRC and pulse the
hotplug IRQ, but the guest OS will refuse to release the CPU. Our
spapr_core_unplug() DRC release callback will never be called and the CPU
core object will keep existing in QEMU. No error message will be sent
to the user, but the CPU core wasn't unplugged from the guest.

If the guest OS onlines the CPU core again, we won't be able to hot-unplug it
either. 'dmesg' inside the guest will report a failed attempt to offline an
unknown CPU:

[  923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16

This is the result of stopping the DRC state transition midway during the
first failed attempt.

Yes, at this point only a machine reset can fix things up.

Given that it is Linux's choice not to call "stop-self" as it should, I'm not
a big fan of hardcoding this logic in QEMU, unless there are really good
reasons to do so.

We can prevent this, and potentially other bad things, from happening if we
don't attempt the unplug at all in this scenario. Let's check the
online/offline state of the CPU cores in the guest before allowing
the hot-unplug, and forbid removing a CPU core if it's the last one online
in the guest.

Reported-by: Xujun Ma <xuma@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a2f01c21aa..d269dcd102 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3709,9 +3709,16 @@ static void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev)
 static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc,
                                       Error **errp)
 {
+    CPUArchId *core_slot;
+    SpaprCpuCore *core;
+    PowerPCCPU *cpu;
+    CPUState *cs;
+    bool last_cpu_online = true;
     int index;
 
-    if (!spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id, &index)) {
+    core_slot = spapr_find_cpu_slot(MACHINE(hotplug_dev), cc->core_id,
+                                    &index);
+    if (!core_slot) {
         error_setg(errp, "Unable to find CPU core with core-id: %d",
                    cc->core_id);
         return -1;
@@ -3722,6 +3729,36 @@ static int spapr_core_unplug_possible(HotplugHandler *hotplug_dev, CPUCore *cc,
         return -1;
     }
 
+    /* Allow for any non-boot CPU core to be unplugged if already offline */
+    core = SPAPR_CPU_CORE(core_slot->cpu);
+    cs = CPU(core->threads[0]);
+    if (cs->halted) {
+        return 0;
+    }
+
+    /*
+     * Do not allow core unplug if it's the last core online.
+     */
+    cpu = POWERPC_CPU(cs);
+    CPU_FOREACH(cs) {
+        PowerPCCPU *c = POWERPC_CPU(cs);
+
+        if (c == cpu) {
+            continue;
+        }
+
+        if (!cs->halted) {
+            last_cpu_online = false;
+            break;
+        }
+    }
+
+    if (last_cpu_online) {
+        error_setg(errp, "Unable to unplug CPU core with core-id %d: it is "
+                   "the only CPU core online in the guest", cc->core_id);
+        return -1;
+    }
+
     return 0;
 }
