Re: [Qemu-devel] [PATCH 4/4] target-ppc: Handle NMI guest exit


From: Aravinda Prasad
Subject: Re: [Qemu-devel] [PATCH 4/4] target-ppc: Handle NMI guest exit
Date: Mon, 16 Nov 2015 15:37:58 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6


On Monday 16 November 2015 01:22 PM, Thomas Huth wrote:
> On 12/11/15 19:49, Aravinda Prasad wrote:
>>
>> On Thursday 12 November 2015 03:10 PM, Thomas Huth wrote:
> ...
>>> Also LoPAPR talks about 'subsequent processors report "fatal error
>>> previously reported"', so maybe the other processors should report that
>>> condition in this case?
>>
>> I feel the guest kernel is responsible for that. Or does that mean that
>> QEMU should report the same error that the first processor encountered for
>> the subsequent processors? In that case, what if the error encountered by
>> the first processor was recovered?
> 
> I simply referred to this text in LoPAPR:
> 
>  Multiple processors of the same OS image may experience fatal
>  events at, or about, the same time. The first processor to enter
>  the machine check handling firmware reports the fatal error.
>  Subsequent processors serialize waiting for the first processor
>  to issue the ibm,nmi-interlock call. These subsequent processors
>  report "fatal error previously reported".

Yes, I asked this because I am not clear on what "fatal error previously
reported" means as described in PAPR.

> 
> Is there code in the host kernel already that takes care of this (I
> haven't checked)? If so, how does the host kernel know that the event
> happened "at or about the same time" since you're checking at the QEMU
> side for the mutex condition?

I don't think the host kernel takes care of this; it simply forwards
such errors to QEMU via an NMI exit. I feel the time referred to by "at or
about the same time" is the duration between when the registered machine
check handler is invoked and when the corresponding ibm,nmi-interlock call is
issued by the guest, a window that QEMU knows about and protects with a mutex.
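To make that concrete, here is roughly what I have in mind. This is only a
plain-C/pthread sketch, not the actual patch code; deliver_machine_check(),
handle_nmi_exit(), handle_nmi_interlock() and mc_lock are made-up names for
illustration:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for injecting the machine check into the vCPU and branching
 * to the handler the guest registered via ibm,nmi-register. */
static void deliver_machine_check(int cpu_id)
{
    printf("delivering machine check to CPU %d\n", cpu_id);
}

/* Called when KVM exits to QEMU with an NMI for this vCPU. */
void handle_nmi_exit(int cpu_id)
{
    /*
     * "At or about the same time" is the window between delivering the
     * machine check and the guest issuing ibm,nmi-interlock.  A second
     * vCPU that hits an error inside that window simply blocks here
     * until the first vCPU makes the interlock call, i.e. delivery is
     * serialized across CPUs.
     */
    pthread_mutex_lock(&mc_lock);
    deliver_machine_check(cpu_id);
}

/* Called when the guest makes the ibm,nmi-interlock RTAS call. */
void handle_nmi_interlock(int cpu_id)
{
    (void)cpu_id;
    pthread_mutex_unlock(&mc_lock);
}

The mutex is taken when the machine check is delivered and only released by
the interlock call, which is exactly the serialization LoPAPR describes above.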

> 
>>> And of course you've also got to check that the same CPU is not getting
>>> multiple NMIs before the interlock function has been called again.
>>
>> I think it is good to check that. However, shouldn't the guest enable ME
>> until it calls the interlock function?
> 
> First, the hypervisor should never trust the guest to do the right
> things. Second, LoPAPR says "the OS permanently relinquishes to firmware
> the Machine State Register's Machine Check Enable bit", and Paul also
> said something similar in another mail to this thread, so I think you
> really have to check this in QEMU instead.

Hmm, ok. Since ME is always set while running in the guest (assuming the
guest is not disabling it), we cannot check the ME bit to figure out whether
the same CPU is getting UEs before the interlock is called. One way is to
record the CPU ID upon such an error and, before invoking the registered
machine check handler, check whether that CPU has a pending interlock call.
If there is a pending interlock call for that CPU, terminate the guest rather
than causing the guest to trigger recursive machine check errors.
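
Something along these lines, again only a pthread sketch with made-up names
(mc_pending, MAX_CPUS, handle_nmi_exit) rather than the real patch, repeating
the mutex from the earlier sketch:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_CPUS 1024

static pthread_mutex_t mc_lock = PTHREAD_MUTEX_INITIALIZER;

/* True while a machine check has been delivered to this CPU and the
 * matching ibm,nmi-interlock call has not yet arrived. */
static bool mc_pending[MAX_CPUS];

void handle_nmi_exit(int cpu_id)
{
    if (mc_pending[cpu_id]) {
        /*
         * The same CPU took another UE before issuing ibm,nmi-interlock:
         * delivering it again would only make the guest handler recurse,
         * so terminate the guest instead.
         */
        fprintf(stderr, "recursive machine check on CPU %d, terminating\n",
                cpu_id);
        exit(1);
    }
    mc_pending[cpu_id] = true;
    pthread_mutex_lock(&mc_lock);   /* serialize across CPUs as before */
    /* ... inject the machine check and enter the registered handler ... */
}

void handle_nmi_interlock(int cpu_id)
{
    mc_pending[cpu_id] = false;
    pthread_mutex_unlock(&mc_lock);
}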

Regards,
Aravinda


> 
>  Thomas
> 

-- 
Regards,
Aravinda



