[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 7/7] spapr.c: consider CPU core online state before allowi

From: Daniel Henrique Barboza
Subject: Re: [PATCH v1 7/7] spapr.c: consider CPU core online state before allowing unplug
Date: Mon, 18 Jan 2021 07:28:25 -0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0

On 1/17/21 10:18 PM, David Gibson wrote:
On Fri, Jan 15, 2021 at 03:52:56PM -0300, Daniel Henrique Barboza wrote:

On 1/15/21 2:22 PM, Greg Kurz wrote:
On Thu, 14 Jan 2021 15:06:28 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

The only restriction we have when unplugging CPUs is to forbid unplug of
the boot cpu core. spapr_core_unplug_possible() does not contemplate the

I'll look into it.

I can't remember why this restriction was introduced in the first place...
This should be investigated and documented if the limitation still stands.

possibility of some cores being offlined by the guest, meaning that we're
rolling the dice regarding on whether we're unplugging the last online
CPU core the guest has.

Trying to unplug the last CPU is obviously something that deserves
special care. LoPAPR is quite explicit on the outcome : this should
terminate the partition. Isolation of CPUs

The isolation of a CPU, in all cases, is preceded by the stop-self
RTAS function for all processor threads, and the OS insures that all
the CPU’s threads are in the RTAS stopped state prior to isolating the
CPU. Isolation of a processor that is not stopped produces unpredictable
results. The stopping of the last processor thread of a LPAR partition
effectively kills the partition, and at that point, ownership of all
partition resources reverts to the platform firmware.

I was just investigating a reason why we should check for all thread
states before unplugging the core, like David suggested in his reply.
rtas_stop_self() was setting 'cs->halted = 1' without a thread activity
check like ibm,suspend-me() does and I was wondering why. This text you sent
explains it, quoting:

"> The isolation of a CPU, in all cases, is preceded by the stop-self
  RTAS function for all processor threads, and the OS insures that all
  the CPU’s threads are in the RTAS stopped state prior to isolating the

This seems to be corroborated by arch/powerpc/platform/pseries/hotplug-cpu.c:

Um... this seems like you're overcomplicating this.  The crucial point
here is that 'start-cpu' and 'stop-self' operate on individual
threads, where as dynamic reconfiguration hotplug and unplug works on
whole cores.

static void pseries_cpu_offline_self(void)
        unsigned int hwcpu = hard_smp_processor_id();

        if (xive_enabled())


        /* Should never get here... */

IIUC this means that we can rely on cs->halted = 1 since it's coming right
after a rtas_stop_self() call. This is still a bit confusing though and I
wouldn't mind standardizing the 'CPU core is offline' condition with what
ibm,suspend-me is already doing.

At the moment we have no tracking of whether a core is online.  We
kinda-sorta track whether a *thread* is online through stop-self /

David, what do you think?

I think we can rely on cs->halted = 1 when the thread is offline.
What I'm much less certain about is whether we can count on the thread
being offline when cs->halted = 1.

I guess we should just stick with your first suggestion then. I'll check for
both cs->halted and msr to assert whether a whole core if offline, like
ibm,suspend-me is doing:

        if (!cs->halted || (e->msr & (1ULL << MSR_EE))) {
            rtas_st(rets, 0, H_MULTI_THREADS_ACTIVE);

Another question not necessarily related to this fix: we do a similar check
in the start of do_client_architecture_support(), returning the same
H_MULTI_THREADS_ACTIVE error, but we're not checking e->msr:

    /* CAS is supposed to be called early when only the boot vCPU is active. */
    CPU_FOREACH(cs) {
        if (cs == CPU(cpu)) {
        if (!cs->halted) {
            warn_report("guest has multiple active vCPUs at CAS, which is not 
            return H_MULTI_THREADS_ACTIVE;

Should we bother changing this logic to check for e->msr as well? If there is no
possible harm done I'd rather have the same check returning 
in both places. If there is a special condition in early boot where this check 
do_client_architecture_support() is enough, I would like to put a comment in 
to make it clear why.



R1- For the LRDR option: Prior to issuing the RTAS
set-indicator specifying isolate isolation-state of a CPU DR
connector type, all the CPU threads must be in the RTAS stopped

R1- For the LRDR option: Stopping of the last processor
thread of a LPAR partition with the stop-self RTAS function, must kill
the partition, with ownership of all partition resources reverting to
the platform firmware.

This is clearly not how things work today : linux doesn't call
"stop-self" on the last vCPU and even if it did, QEMU doesn't
terminate the VM.

If there's a valid reason to not implement this PAPR behavior, I'd like
it to be documented.

I can only speculate. This would create a unorthodox way of shutting down
the guest, when the user can just shutdown the whole thing gracefully.

Unless we're considering borderline cases, like the security risk mentioned
in the kernel docs (Documentation/core-api/cpu_hotplug.rst):

"Such advances require CPUs available to a kernel to be removed either for
provisioning reasons, or for RAS purposes to keep an offending CPU off
system execution path. Hence the need for CPU hotplug support in the
Linux kernel."

In this extreme scenario I can see a reason to kill the partition/guest
by offlining the last online CPU - if it's compromising the host we'd
rather terminate immediately instead of waiting for graceful

I'm a bit confused by this comment.  You seem to be conflating
online/offline operations (start-cpu/stop-self) with hot plug/unplug
operations - they're obviously related, but they're not the same

If we hit the jackpot, we're going to detach the core DRC and pulse the
hotplug IRQ, but the guest OS will refuse to release the CPU. Our
spapr_core_unplug() DRC release callback will never be called and the CPU
core object will keep existing in QEMU. No error message will be sent
to the user, but the CPU core wasn't unplugged from the guest.

If the guest OS onlines the CPU core again we won't be able to hotunplug it
either. 'dmesg' inside the guest will report a failed attempt to offline an
unknown CPU:

[  923.003994] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16

This is the result of stopping the DRC state transition in the middle in the
first failed attempt.

Yes, at this point only a machine reset can fix things up.

Given this is linux's choice not to call "stop-self" as it should do, I'm not
super fan of hardcoding this logic in QEMU, unless there are really good
reasons to do so.

I understand where are you coming from and I sympathize. Not sure about how 
would feel about that though. They expect a somewhat compatible behavior of
multi-arch features like hotplug/hotunplug, and x86 will neither hotunplug nor 
the last CPU as well.

There is a very high chance that, even if we pull this design off, I'll need to 
go to
Libvirt and disable it because we broke compatibility with how vcpu unplug 

reply via email to

[Prev in Thread] Current Thread [Next in Thread]