From: Aravinda Prasad
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH v10 6/6] ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls
Date: Fri, 5 Jul 2019 16:41:25 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0


On Friday 05 July 2019 12:07 AM, Greg Kurz wrote:
> On Thu, 4 Jul 2019 10:49:05 +0530
> Aravinda Prasad <address@hidden> wrote:
> 
>>
>>
>> On Thursday 04 July 2019 06:42 AM, David Gibson wrote:
>>> On Wed, Jul 03, 2019 at 02:30:31PM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Wednesday 03 July 2019 08:50 AM, David Gibson wrote:
>>>>> On Tue, Jul 02, 2019 at 04:10:08PM +0530, Aravinda Prasad wrote:
>>>>>>
>>>>>>
>>>>>> On Tuesday 02 July 2019 09:41 AM, David Gibson wrote:
>>>>>>> On Wed, Jun 12, 2019 at 02:51:38PM +0530, Aravinda Prasad wrote:
>>>>>>>> This patch adds support in QEMU to handle "ibm,nmi-register"
>>>>>>>> and "ibm,nmi-interlock" RTAS calls and sets the default
>>>>>>>> value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine
>>>>>>>> type 4.0.
>>>>>>>>
>>>>>>>> The machine check notification address is saved when the
>>>>>>>> OS issues "ibm,nmi-register" RTAS call.
>>>>>>>>
>>>>>>>> This patch also handles the case when multiple processors
>>>>>>>> experience a machine check at or about the same time by
>>>>>>>> handling the "ibm,nmi-interlock" call. In such cases, as per
>>>>>>>> PAPR, subsequent processors serialize waiting for the first
>>>>>>>> processor to issue the "ibm,nmi-interlock" call. The second
>>>>>>>> processor that also received a machine check error waits
>>>>>>>> until the first processor is done reading the error log.
>>>>>>>> The first processor issues the "ibm,nmi-interlock" call
>>>>>>>> when the error log is consumed.
>>>>>>>>
>>>>>>>> Signed-off-by: Aravinda Prasad <address@hidden>
>>>>>>>> ---
>>>>>>>>  hw/ppc/spapr.c         |    6 ++++-
>>>>>>>>  hw/ppc/spapr_rtas.c    |   63 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  include/hw/ppc/spapr.h |    5 +++-
>>>>>>>>  3 files changed, 72 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>>>> index 3d6d139..213d493 100644
>>>>>>>> --- a/hw/ppc/spapr.c
>>>>>>>> +++ b/hw/ppc/spapr.c
>>>>>>>> @@ -2946,6 +2946,9 @@ static void spapr_machine_init(MachineState *machine)
>>>>>>>>          /* Create the error string for live migration blocker */
>>>>>>>>          error_setg(&spapr->fwnmi_migration_blocker,
>>>>>>>>                  "Live migration not supported during machine check handling");
>>>>>>>> +
>>>>>>>> +        /* Register ibm,nmi-register and ibm,nmi-interlock RTAS calls */
>>>>>>>> +        spapr_fwnmi_register();
>>>>>>>>      }
>>>>>>>>  
>>>>>>>>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
>>>>>>>> @@ -4408,7 +4411,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_ON;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_CCF_ASSIST] = SPAPR_CAP_OFF;
>>>>>>>> -    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_ON;
>>>>>>>
>>>>>>> Turning this on by default really isn't ok if it stops you running TCG
>>>>>>> guests at all.
>>>>>>
>>>>>> If so this can be "off" by default until TCG is supported.
>>>>>>
>>>>>>>
>>>>>>>>      spapr_caps_add_properties(smc, &error_abort);
>>>>>>>>      smc->irq = &spapr_irq_dual;
>>>>>>>>      smc->dr_phb_enabled = true;
>>>>>>>> @@ -4512,6 +4515,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_SBBC] = SPAPR_CAP_BROKEN;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>>>>>>>>      smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = SPAPR_CAP_OFF;
>>>>>>>> +    smc->default_caps.caps[SPAPR_CAP_FWNMI_MCE] = SPAPR_CAP_OFF;
>>>>>>>
>>>>>>> We're now well past 4.0, and in fact we're about to go into soft
>>>>>>> freeze for 4.1, so we're going to miss that too.  So 4.1 and earlier
>>>>>>> will need to retain the old default.
>>>>>>
>>>>>> ok.
>>>>>>
>>>>>>>
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
>>>>>>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>>>>>>> index a015a80..e010cb2 100644
>>>>>>>> --- a/hw/ppc/spapr_rtas.c
>>>>>>>> +++ b/hw/ppc/spapr_rtas.c
>>>>>>>> @@ -49,6 +49,7 @@
>>>>>>>>  #include "hw/ppc/fdt.h"
>>>>>>>>  #include "target/ppc/mmu-hash64.h"
>>>>>>>>  #include "target/ppc/mmu-book3s-v3.h"
>>>>>>>> +#include "migration/blocker.h"
>>>>>>>>  
>>>>>>>>  static void rtas_display_character(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>>>>>                                     uint32_t token, uint32_t nargs,
>>>>>>>> @@ -352,6 +353,60 @@ static void rtas_get_power_level(PowerPCCPU *cpu, SpaprMachineState *spapr,
>>>>>>>>      rtas_st(rets, 1, 100);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>>>>>>> +                                  SpaprMachineState *spapr,
>>>>>>>> +                                  uint32_t token, uint32_t nargs,
>>>>>>>> +                                  target_ulong args,
>>>>>>>> +                                  uint32_t nret, target_ulong rets)
>>>>>>>> +{
>>>>>>>> +    int ret;
>>>>>>>> +    hwaddr rtas_addr = spapr_get_rtas_addr();
>>>>>>>> +
>>>>>>>> +    if (!rtas_addr) {
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_OFF) {
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    ret = kvmppc_fwnmi_enable(cpu);
>>>>>>>> +    if (ret == 1) {
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
>>>>>>>
>>>>>>> I don't understand this case separate from the others.  We've already
>>>>>>> set the cap, so fwnmi support should be checked and available.
>>>>>>
>>>>>> But we have not enabled fwnmi in KVM. kvmppc_fwnmi_enable() returns 1 if
>>>>>> cap_ppc_fwnmi is not available in KVM.
>>>>>
>>>>> But you've checked for the presence of the extension, yes?  So a
>>>>> failure to enable the cap would be unexpected.  In which case how does
>>>>> this case differ from.. 
>>>>
>>>> No, this is the function where I check for the presence of the
>>>> extension. In kvm_arch_init() we just set cap_ppc_fwnmi to 1 if KVM
>>>> support is available, but don't take any action if unavailable.
>>>
>>> Yeah, that's not ok.  You should be checking for the presence of the
>>> extension in the .apply() function.  If you start up with the spapr
>>> cap selected then failing at nmi-register time means something has
>>> gone badly wrong.
>>
>> So, I should check for two things in the .apply() function: first, whether
>> cap_ppc_fwnmi is supported, and second, whether cap_ppc_fwnmi is enabled in KVM.
>>
>> In that case kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0) should be
>> called during spapr_machine_init().
>>
>> So, we will fail to boot (when SPAPR_CAP_FWNMI_MCE=ON) if cap_ppc_fwnmi
>> can't be enabled, irrespective of whether the guest issues "ibm,nmi-register" or not.
>>
> 
> Yes. The idea is that we don't want to expose some feature to the guest
> if we already know it cannot work. The same stands for migration: if
> we know the destination host doesn't support the feature, we don't want
> to confuse the guest because the feature disappeared unexpectedly. The
> correct way to address that is to check the feature is supported in QEMU
> and/or KVM before we start the guest or before we accept a migration
> stream and fail early if we can't provide the feature.

ok.

> 
> Speaking of migration, kvmppc_fwnmi_enable() should be called in a
> post load callback... and the problem I see is that this is a vCPU
> ioctl. So either we add a state "I should enable FWNMI on post load"
> to the vCPU that called "ibm,nmi-register", or maybe first_cpu can do the
> trick in a post load callback of vmstate_spapr_machine_check.

To summarize the changes I am planning:

If SPAPR_CAP_FWNMI_MCE=ON, then enable cap_ppc_fwnmi in KVM. If
cap_ppc_fwnmi can't be enabled, fail early. This can be done in the
spapr_machine_init() function.
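
Roughly along these lines (only a sketch of the idea, not the actual patch;
kvmppc_fwnmi_enable() is the helper added earlier in this series, while using
first_cpu and plain error_report()/exit() here are assumptions about the final
form):

    /*
     * Sketch: fail early in spapr_machine_init() if the cap is ON but
     * KVM cannot provide KVM_CAP_PPC_FWNMI, instead of failing later
     * at "ibm,nmi-register" time.
     */
    if (kvm_enabled() &&
        spapr_get_cap(spapr, SPAPR_CAP_FWNMI_MCE) == SPAPR_CAP_ON) {
        if (kvmppc_fwnmi_enable(POWERPC_CPU(first_cpu))) {
            error_report("Could not enable KVM_CAP_PPC_FWNMI");
            exit(1);
        }
    }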

During migration, if SPAPR_CAP_FWNMI_MCE=ON then enable cap_ppc_fwnmi by
having a .post_load callback which calls kvmppc_fwnmi_enable(). Fail the
migration if we can't enable cap_ppc_fwnmi on the destination host.
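
As a sketch of the post_load side (the function name and exactly where it is
hooked are assumptions; kvmppc_fwnmi_enable() and first_cpu as discussed
above):

/*
 * Sketch: post_load callback for the machine check vmstate section.
 * If the guest had registered a machine check address, re-enable
 * KVM_CAP_PPC_FWNMI on the destination and fail the incoming
 * migration if the destination KVM cannot provide it.
 */
static int fwnmi_post_load(void *opaque, int version_id)
{
    SpaprMachineState *spapr = opaque;

    if (spapr->guest_machine_check_addr != -1 &&
        kvm_enabled() &&
        kvmppc_fwnmi_enable(POWERPC_CPU(first_cpu))) {
        return -EINVAL;
    }

    return 0;
}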

> 
> BTW, since FWNMI is a platform-wide feature (i.e. you don't need to
> enable it on a per-CPU basis), why is KVM_CAP_PPC_FWNMI a vCPU ioctl
> and not a VM ioctl?

I think it should be a VM ioctl; let me check.
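
For reference, the two shapes would roughly be (both helpers already exist in
QEMU's KVM layer; whether the kernel side can accept a VM-wide KVM_ENABLE_CAP
for this is exactly what needs checking):

/* today: per-vCPU enablement, one KVM_ENABLE_CAP ioctl per vCPU fd */
ret = kvm_vcpu_enable_cap(cs, KVM_CAP_PPC_FWNMI, 0);

/* hypothetical: platform-wide enablement, a single ioctl on the VM fd */
ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PPC_FWNMI, 0);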

Regards,
Aravinda

> 
>>>
>>> This is necessary for migration: if you start on a system with nmi
>>> support and the guest registers for it, you can't then migrate safely
>>> to a system that doesn't have nmi support.  The way to handle that
>>> case is to have qemu fail to even start up on a destination without
>>> the support.
>>>
>>>> So this case is when we are running an old version of KVM with no
>>>> cap_ppc_fwnmi support.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> +        return;
>>>>>>>> +    } else if (ret < 0) {
>>>>>>>> +        error_report("Couldn't enable KVM FWNMI capability");
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>>>>>>>> +        return;
>>>>>
>>>>> ..this case.
>>>>
>>>> And this is when we have the KVM support but due to some problem with
>>>> either KVM or QEMU we are unable to enable cap_ppc_fwnmi.
>>>>
>>>>>
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    spapr->guest_machine_check_addr = rtas_ld(args, 1);
>>>>>>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>>>>>>> +                                   SpaprMachineState *spapr,
>>>>>>>> +                                   uint32_t token, uint32_t nargs,
>>>>>>>> +                                   target_ulong args,
>>>>>>>> +                                   uint32_t nret, target_ulong rets)
>>>>>>>> +{
>>>>>>>> +    if (spapr->guest_machine_check_addr == -1) {
>>>>>>>> +        /* NMI register not called */
>>>>>>>> +        rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>>>>>>>> +    } else {
>>>>>>>> +        /*
>>>>>>>> +         * vCPU issuing "ibm,nmi-interlock" is done with NMI handling,
>>>>>>>> +         * hence unset mc_status.
>>>>>>>> +         */
>>>>>>>> +        spapr->mc_status = -1;
>>>>>>>> +        qemu_cond_signal(&spapr->mc_delivery_cond);
>>>>>>>> +        migrate_del_blocker(spapr->fwnmi_migration_blocker);
>>>>>>>
>>>>>>> Hrm.  We add the blocker at the mce request point.  First, that's in
>>>>>>> another patch, which isn't great.  Second, does that mean we could add
>>>>>>> multiple times if we get an MCE on multiple CPUs?  Will that work and
>>>>>>> correctly match adds and removes properly?
>>>>>>
>>>>>> If it is fine to move the migration patch as the last patch in the
>>>>>> sequence, then we will have add and del blocker in the same patch.
>>>>>>
>>>>>> And yes, we could add it multiple times if we get an MCE on multiple
>>>>>> CPUs, and as all those CPUs call interlock, there should be a matching
>>>>>> number of delete blockers.
>>>>>
>>>>> Ok, and I think adding the same pointer to the list multiple times
>>>>> will work ok.
>>>>
>>>> I think so
>>>>
>>>>>
>>>>> Btw, add_blocker() can fail - have you handled failure conditions?
>>>>
>>>> yes, I am handling it.
>>>>
>>>>>
>>>>
>>>
>>
> 

-- 
Regards,
Aravinda



