qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 6/6] migration: Block migration while handlin


From: Aravinda Prasad
Subject: Re: [Qemu-devel] [PATCH v7 6/6] migration: Block migration while handling machine check
Date: Thu, 28 Mar 2019 12:09:05 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0


On Thursday 28 March 2019 06:18 AM, David Gibson wrote:
> On Tue, Mar 26, 2019 at 12:22:50PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Tuesday 26 March 2019 09:49 AM, David Gibson wrote:
>>> On Mon, Mar 25, 2019 at 02:31:10PM +0530, Aravinda Prasad wrote:
>>>>
>>>>
>>>> On Monday 25 March 2019 12:03 PM, David Gibson wrote:
>>>>> On Fri, Mar 22, 2019 at 12:04:25PM +0530, Aravinda Prasad wrote:
>>>>>> Block VM migration requests until the machine check
>>>>>> error handling is complete as (i) these errors are
>>>>>> specific to the source hardware and is irrelevant on
>>>>>> the target hardware, (ii) these errors cause data
>>>>>> corruption and should be handled before migration.
>>>>>>
>>>>>> Signed-off-by: Aravinda Prasad <address@hidden>
>>>>>> ---
>>>>>>  hw/ppc/spapr_events.c  |   17 +++++++++++++++++
>>>>>>  hw/ppc/spapr_rtas.c    |    4 ++++
>>>>>>  include/hw/ppc/spapr.h |    3 +++
>>>>>>  3 files changed, 24 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>>>>>> index d7cc0a4..6356a38 100644
>>>>>> --- a/hw/ppc/spapr_events.c
>>>>>> +++ b/hw/ppc/spapr_events.c
>>>>>> @@ -41,6 +41,7 @@
>>>>>>  #include "qemu/bcd.h"
>>>>>>  #include "hw/ppc/spapr_ovec.h"
>>>>>>  #include <libfdt.h>
>>>>>> +#include "migration/blocker.h"
>>>>>>  
>>>>>>  #define RTAS_LOG_VERSION_MASK                   0xff000000
>>>>>>  #define   RTAS_LOG_VERSION_6                    0x06000000
>>>>>> @@ -866,6 +867,22 @@ static void spapr_mce_build_elog(PowerPCCPU *cpu, 
>>>>>> bool recovered)
>>>>>>  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
>>>>>>  {
>>>>>>      SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>>>> +    int ret;
>>>>>> +    Error *local_err = NULL;
>>>>>> +
>>>>>> +    error_setg(&spapr->migration_blocker,
>>>>>> +            "Live migration not supported during machine check 
>>>>>> handling");
>>>>>> +    ret = migrate_add_blocker(spapr->migration_blocker, &local_err);
>>>>>> +    if (ret < 0) {
>>>>>> +        /*
>>>>>> +         * We don't want to abort and let the migration to continue. In 
>>>>>> a
>>>>>> +         * rare case, the machine check handler will run on the target
>>>>>> +         * hardware. Though this is not preferable, it is better than 
>>>>>> aborting
>>>>>> +         * the migration or killing the VM.
>>>>>
>>>>> Can't you just discard the error in that case?
>>>>
>>>> I am actually discarding the error. Do you mean I don't have to print
>>>> the below warning or am I missing something?
>>>
>>> Um.. I guess we should keep the warning.  The tricky thing here is
>>> that the information on the machine check is no longer relevant after
>>> the migration, however presumably the event which triggered it might
>>> have already corrupted guest memory, which *would* be relevant to the
>>> guest after migration.  Not sure what to do about that.
>>
>> I think the information is still relevant after the migration, because
>> it includes the effective address in error and the context. If the
>> corruption is in the guest kernel context then the machine check handler
>> inside the guest kernel will call panic(). If the corruption is in user
>> space, then the application is sent a SIGBUS.
> 
> Good point.
> 
>> But the problem is when the machine check handler hardware poisons the
>> page frame backing the corrupt memory to avoid reuse of that page frame.
>> After migration, the page frame backing the corrupt memory is clean, but
>> we poison the clean page frame.
> 
> This is the guest side handler?  It sounds like we we need two
> variants of the notification that say whether or not the area is still
> bad.

Yes this is guest side handler.

Can we just let the guest machine check handler poison a clean page as a
migration during a machine check is a rare occurrence. We may at most
lose a clean page which is a trade-off for additional complexity in the
machine check handler.


> 

-- 
Regards,
Aravinda




reply via email to

[Prev in Thread] Current Thread [Next in Thread]