qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] POWER8 <-> POWER8E migration changed/broken in ppc-for.2.


From: Alexey Kardashevskiy
Subject: Re: [Qemu-ppc] POWER8 <-> POWER8E migration changed/broken in ppc-for.2.10. Intended?
Date: Fri, 7 Jul 2017 13:13:49 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

On 07/07/17 12:09, David Gibson wrote:
> On Tue, Jul 04, 2017 at 08:20:12AM -0300, Daniel Henrique Barboza wrote:
>>
>>
>> On 06/22/2017 09:46 PM, Alexey Kardashevskiy wrote:
>>> On 22/06/17 18:19, David Gibson wrote:
>>>> On Thu, Jun 22, 2017 at 09:08:55AM +0200, Thomas Huth wrote:
>>>>> On 22.06.2017 05:22, Alexey Kardashevskiy wrote:
>>>>>> On 22/06/17 12:49, Alexey Kardashevskiy wrote:
>>>>>>> On 22/06/17 11:45, Alexey Kardashevskiy wrote:
>>>>>>>> On 22/06/17 07:19, Daniel Henrique Barboza wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Yesterday I noticed that the migration between POWER8 and POWER8E 
>>>>>>>>> hosts
>>>>>>>>> isn't working
>>>>>>>>> as before. These are the machines that I've been using for DRC 
>>>>>>>>> testing and
>>>>>>>>> I discovered
>>>>>>>>> this behavior when trying to test the DRC cleanup series (v5).
>>>>>>>>>
>>>>>>>>> What happens in that the migration seems to be completed - no errors 
>>>>>>>>> on
>>>>>>>>> both source and
>>>>>>>>> destination, destination status marked as 'running' - but the guest 
>>>>>>>>> at dest
>>>>>>>>> looks
>>>>>>>>> frozen/inactive. Migrating from POWER8 to POWER8E or the other way 
>>>>>>>>> around
>>>>>>>>> presents the same behavior.
>>>>>>>>>
>>>>>>>>> This is the command line I'm executing:
>>>>>>>>>
>>>>>>>>> sudo ./qemu-system-ppc64 -name migrate_qemu -boot strict=on -device
>>>>>>>>> nec-usb-xhci,id=usb,bus=pci.0,addr=0xf -device
>>>>>>>>> spapr-vscsi,id=scsi0,reg=0x2000 -smp
>>>>>>>>> 1,maxcpus=4,sockets=4,cores=1,threads=1 --machine
>>>>>>>>> pseries,accel=kvm,kvm-type=HV,usb=off,dump-guest-core=off -m
>>>>>>>>> 4G,slots=32,maxmem=32G -drive
>>>>>>>>> file=/home/danielhb/vm_imgs/ubuntu1704.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none
>>>>>>>>> -device
>>>>>>>>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>>>>>>>>> -nographic
>>>>>>>>>
>>>>>>>>> I've tracked the problem down to the commit b0517f54ef7d, "ppc: 
>>>>>>>>> Rework CPU
>>>>>>>>> compatibility testing across migration". If I revert this commit, 
>>>>>>>>> migration
>>>>>>>>> works as before
>>>>>>>>> between these hosts.
>>>>>>>> With b0517f54ef7d, CPU state is not marked dirty and SPRs (all 
>>>>>>>> registers in
>>>>>>>> fact) are not put to KVM at the end of migration, hence the freeze, 
>>>>>>>> this
>>>>>>>> has something to do with cpu_synchronize_state() called from 
>>>>>>>> ppc_set_compat().
>>>>>>> No, I was not right, please ignore.
>>>>>>
>>>>>> This hack fixes it:
>>>>>>
>>>>>>
>>>>>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>>>>>> index f2f7c531bc..7ec4146b2b 100644
>>>>>> --- a/target/ppc/kvm.c
>>>>>> +++ b/target/ppc/kvm.c
>>>>>> @@ -919,6 +919,7 @@ static int kvm_put_vpa(CPUState *cs)
>>>>>>       return 0;
>>>>>>   }
>>>>>>   #endif /* TARGET_PPC64 */
>>>>>> +static uint32_t mfpvr(void);
>>>>>>
>>>>>>   int kvmppc_put_books_sregs(PowerPCCPU *cpu)
>>>>>>   {
>>>>>> @@ -927,7 +928,7 @@ int kvmppc_put_books_sregs(PowerPCCPU *cpu)
>>>>>>       int i;
>>>>>>
>>>>>>       sregs.pvr = env->spr[SPR_PVR];
>>>>>> -
>>>>>> +    sregs.pvr = mfpvr();
>>>>>>       sregs.u.s.sdr1 = env->spr[SPR_SDR1];
>>>>>>
>>>>>>       /* Sync SLB */
>>>>> In case you consider to submit this patch: Please add a check for
>>>>> !kvmppc_is_pr() so that this is only done for KVM-HV.
>>>>>
>>>>> But I think the right way to hack this is somewhere in machine.c
>>>>> instead, eg. in cpu_post_load().
>>>> Is the problem is just that we're not considering POWER8 to be
>>>> compatible enough with POWER8E, then we should be altering the
>>>> pvr_match function so it does.
>>>
>>> The problem is that the KVM HV expects the exact host's PVR in
>>> KVM_SET_SREGS, other than that POWER8 == POWER8E pvr_match-wise afaict.
>>>
>>> And Paul suggested that this will bite us again when we start really
>>> worrying about migrating POWER8 to POWER9 so we should probably drop that
>>> PVR check in the KVM at all, in meanwhile we should set PVR to the
>>> destination host PVR value sometime after all these pvr_match() checks, the
>>> place above was just my first guess (which is usually incorrect, I know :) )
>>>
>>
>> I have no problems with this solution - we can re-insert the line
>>
>>         env->spr[SPR_PVR] = env->spr_cb[SPR_PVR].default_value;
>>
>>
>> after all the pvr_checks were made.
>>
>> Is this an acceptable workaround or do we need to dig further? Alexey, do
>> you want to send this patch?
> 
> I'm very uneasy with this approach.  For one thing it relies on
> knowing if we're using HV or PR KVM, which we should be avoiding.

In this matter they are quite different - one can emulate PVR, the other
cannot. I'd say we need yet another KVM capability, like KVM_CAP_PPC_PVR or
KVM_CAP_PPC_COMPAT and act different in QEMU.


> 
> But more importantly, it denies the kernel an opportunity to assess if
> the CPU it's hosting on really is a good enough approximation of the
> one that qemu really wants.

The assessment is done by the KVM_REG_PPC_ARCH_COMPAT handler in HV KVM,
looks sufficient.


> 
> My intention here - right since the early days of KVM HV, was that
> qemu would always put the PVR it really wanted into the sregs field.
> KVM PR would then emulate a specific CPU based on that.  HV, not
> having that option, would instead assess if the requested CPU was
> close enough to the host CPU to get away with it.
> 
> But I guess, since qemu's haphazard handling of compat modes meant
> that didn't really happen until now, we never properly tested that.
> 
> So, really I think the correct fix is in the kernel - to make it
> accept a POWER8E PVR when hosted on a POWER8 and vice versa.

What will we do for the power8 -> power9 migration?


>  But,
> given the kernels that are out there, I guess we'll need a workaround
> in qemu as well.  Ugh.
> 


-- 
Alexey

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]