From: Benjamin Herrenschmidt
Subject: Re: [Qemu-ppc] [PATCH v2 2/4] spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
Date: Tue, 10 Oct 2017 10:08:47 +0200

On Mon, 2017-10-09 at 17:49 +0200, Cédric Le Goater wrote:
> When a CPU is stopped with the 'stop-self' RTAS call, its state
> 'halted' is switched to 1 and, in this case, the MSR is not taken into
> account anymore in the cpu_has_work() routine. Only the pending
> hardware interrupts are checked with their LPCR:PECE* enablement bit.
>
> If the DECR timer fires after 'stop-self' is called and before the CPU
> 'stop' state is reached, the nearly-dead CPU will have some work to do
> and the guest will crash. This case happens very frequently with the
> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
> occasionally fired but after 'stop' state, so no work is to be done
> and the guest survives.
>
> I suspect there is a race between the QEMU mainloop triggering the
> timers and the TCG CPU thread but I could not quite identify the root
> cause. To be safe, let's disable the decrementer interrupt in the LPCR
> when the CPU is halted and reenable it when the CPU is restarted.
>
> Signed-off-by: Cédric Le Goater <address@hidden>
We should disable external interrupts and doorbells too, no? I.e., we
could in fact clear all of the PECE bits.
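
To make the suggestion concrete, here is a minimal sketch of the idea:
a halted CPU only has work if a pending interrupt source also has its
wake-up enable bit set, so clearing every enable bit on stop-self keeps
the decrementer, external interrupts and doorbells from waking a dying
CPU. The toy_* names and the bit values are made up for illustration;
they are not QEMU's structures or the real LPCR layout, and the MSR
checks done for a running CPU are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative wake-up enable bits; NOT the real LPCR:PECE* layout. */
enum {
    WAKE_EXTERNAL = 1u << 0,  /* external interrupts */
    WAKE_DECR     = 1u << 1,  /* decrementer */
    WAKE_DOORBELL = 1u << 2,  /* doorbells */
    WAKE_ALL      = WAKE_EXTERNAL | WAKE_DECR | WAKE_DOORBELL,
};

/* Toy CPU model (hypothetical, far simpler than QEMU's CPU state). */
struct toy_cpu {
    bool halted;
    uint32_t wake_enable;  /* stands in for the PECE bits */
    uint32_t pending;      /* pending sources, same bit layout */
};

/* A halted CPU has work only if a pending source is wake-enabled.
 * For a running CPU this sketch just looks at pending sources. */
static bool toy_cpu_has_work(const struct toy_cpu *cpu)
{
    if (!cpu->halted) {
        return cpu->pending != 0;
    }
    return (cpu->pending & cpu->wake_enable) != 0;
}

/* stop-self: clear every wake-up source, not only the decrementer. */
static void toy_stop_self(struct toy_cpu *cpu)
{
    cpu->wake_enable &= ~(uint32_t)WAKE_ALL;
    cpu->halted = true;
}

/* start-cpu: restore the wake-up sources before the CPU runs again. */
static void toy_start_cpu(struct toy_cpu *cpu)
{
    cpu->wake_enable |= WAKE_ALL;
    cpu->halted = false;
}
```

With this model, a decrementer or doorbell that becomes pending after
toy_stop_self() leaves toy_cpu_has_work() false until toy_start_cpu()
is called.
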
> ---
>
> Changes in v2:
>
> - used a new routine ppc_cpu_pvr_match() to discriminate CPU versions
> - removed the LPCR:PECE* enablement bit when the CPU is initialized
> if it is a secondary
>
> hw/ppc/spapr_rtas.c | 20 ++++++++++++++++++++
> target/ppc/translate_init.c | 19 +++++++++++++++++--
> 2 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index cdf0b607a0a0..dfdbf1e2c6f8 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -46,6 +46,7 @@
> #include "qemu/cutils.h"
> #include "trace.h"
> #include "hw/ppc/fdt.h"
> +#include "target/ppc/cpu-models.h"
>
> static void rtas_display_character(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> uint32_t token, uint32_t nargs,
> @@ -174,6 +175,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
> kvm_cpu_synchronize_state(cs);
>
> env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
> +
> + /* Enable DECR interrupt */
> + if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> + env->spr[SPR_LPCR] |= LPCR_DEE;
> + } else {
> + /* P7 and P8 both have same bit for DECR */
> + env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
> + }
> +
> env->nip = start;
> env->gpr[3] = r3;
> cs->halted = 0;
> @@ -210,6 +220,16 @@ static void rtas_stop_self(PowerPCCPU *cpu, sPAPRMachineState *spapr,
> * no need to bother with specific bits, we just clear it.
> */
> env->msr = 0;
> +
> + /* Don't let the decrementer run on a CPU being stopped. This could
> + * deliver an interrupt on a dying CPU and crash the guest.
> + */
> + if (ppc_cpu_pvr_match(cpu, CPU_POWERPC_LOGICAL_3_00)) {
> + env->spr[SPR_LPCR] &= ~LPCR_DEE;
> + } else {
> + /* P7 and P8 both have same bit for DECR */
> + env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> + }
> }
>
> static inline int sysparm_st(target_ulong addr, target_ulong len,
> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> index 0d6379fcc5b4..1a62159843e7 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8905,6 +8905,7 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
> CPUPPCState *env = &cpu->env;
> ppc_spr_t *lpcr = &env->spr_cb[SPR_LPCR];
> ppc_spr_t *amor = &env->spr_cb[SPR_AMOR];
> + CPUState *cs = CPU(cpu);
>
> cpu->vhyp = vhyp;
>
> @@ -8946,8 +8947,15 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
> } else {
> lpcr->default_value &= ~(LPCR_UPRT | LPCR_GTSE);
> }
> - lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE | LPCR_DEE |
> + lpcr->default_value |= LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
> LPCR_OEE;
> +
> + /* Only let the decrementer wake up the boot CPU. The RTAS
> + * command start-cpu will enable it on secondaries.
> + */
> + if (cs == first_cpu) {
> + lpcr->default_value |= LPCR_DEE;
> + }
> break;
> default:
> /* P7 and P8 has slightly different PECE bits, mostly because P8 adds
> @@ -8955,7 +8963,14 @@ void cpu_ppc_set_papr(PowerPCCPU *cpu, PPCVirtualHypervisor *vhyp)
> * will work as expected for both implementations
> */
> lpcr->default_value |= LPCR_P8_PECE0 | LPCR_P8_PECE1 | LPCR_P8_PECE2 |
> - LPCR_P8_PECE3 | LPCR_P8_PECE4;
> + LPCR_P8_PECE4;
> +
> + /* Only let the decrementer wake up the boot CPU. The RTAS
> + * command start-cpu will enable it on secondaries.
> + */
> + if (cs == first_cpu) {
> + lpcr->default_value |= LPCR_P8_PECE3;
> + }
> }
>
> /* We should be followed by a CPU reset but update the active value