Re: [Qemu-devel] About live migration rollback
From: Gonglei (Arei)
Subject: Re: [Qemu-devel] About live migration rollback
Date: Thu, 3 Jan 2019 01:30:18 +0000
Hi,
>
> * Gonglei (Arei) (address@hidden) wrote:
> > Hi Dave,
> >
> > We discussed some live migration fallback scenarios at this year's KVM
> > Forum, and now I can provide another scenario; perhaps upstream should
> > consider rolling back in this situation.
> >
> > Environments information:
> >
> > host A: cpu E5620(model WestmereEP without flag xsave)
> > host B: cpu E5-2643(model SandyBridgeEP with flag xsave)
> >
> > Steps to reproduce:
> > 1. Start a Windows 2008 VM with -cpu host (which means host-passthrough).
>
> Well we don't guarantee migration across -cpu host - does this problem
> go away if both qemu's are started with matching CPU flags
> (corresponding to the Westmere) ?
>
Sorry, we haven't tested other CPU model scenarios, since we need to ensure
that live migration works from older-generation CPUs to newer-generation
CPUs. :(
> > 2. Migrate the VM to host B while cr4.OSXSAVE=0.
> > 3. Let the VM run on host B for a while, until cr4.OSXSAVE changes to 1.
> > 4. Migrate the VM back to host A. The migration succeeds, but the VM is
> > paused and QEMU prints the following log:
> >
> > KVM: entry failed, hardware error 0x80000021
> >
> > If you're running a guest on an Intel machine without unrestricted mode
> > support, the failure can be most likely due to the guest entering an invalid
> > state for Intel VT. For example, the guest maybe running in big real mode
> > which is not supported on less recent Intel processors.
> >
> > EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
> > ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
> > EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0000 00000000 0000ffff 00009300
> > CS =f000 ffff0000 0000ffff 00009b00
> > SS =0000 00000000 0000ffff 00009300
> > DS =0000 00000000 0000ffff 00009300
> > FS =0000 00000000 0000ffff 00009300
> > GS =0000 00000000 0000ffff 00009300
> > LDT=0000 00000000 0000ffff 00008200
> > TR =0000 00000000 0000ffff 00008b00
> > GDT= 00000000 0000ffff
> > IDT= 00000000 0000ffff
> > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000000
> > Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >
> > The problem occurs when kvm_put_sregs (called by kvm_arch_put_registers
> > in QEMU) returns error -22.
> >
> > This is because kvm_arch_vcpu_ioctl_set_sregs (in the KVM module) finds
> > that the guest CPUID lacks X86_FEATURE_XSAVE while cr4.OSXSAVE=1.
> > We should cancel the migration if kvm_arch_put_registers returns an error.
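
For illustration only, the check described in the quoted text amounts to
something like the following. This is a minimal sketch, not the kernel's
actual code: the function name check_cr4_against_cpuid and its parameters
are hypothetical. Only the CR4.OSXSAVE bit position (bit 18) and the
-EINVAL (-22) result are taken as given.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4.OSXSAVE is bit 18 of CR4. */
#define CR4_OSXSAVE_BIT (UINT64_C(1) << 18)

/*
 * Hypothetical helper (assumption, not a real KVM function): reject a
 * register state in which the guest has enabled OSXSAVE although the
 * CPUID exposed to it does not advertise XSAVE.
 */
int check_cr4_against_cpuid(uint64_t cr4, bool guest_has_xsave)
{
    if ((cr4 & CR4_OSXSAVE_BIT) && !guest_has_xsave) {
        return -EINVAL; /* the "-22" seen by kvm_put_sregs */
    }
    return 0;
}
```

With host A's CPUID (no xsave) and the cr4 value the guest set on host B,
such a check fails, which matches the error seen on the destination.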
>
> Do you have a backtrace of when the kvm_arch_put_registers is called
> when it fails?
The main backtrace is below:

qemu_loadvm_state
  cpu_synchronize_all_post_init            --> w/o return value
    cpu_synchronize_post_init              --> w/o return value
      kvm_cpu_synchronize_post_init        --> w/o return value
        run_on_cpu                         --> w/o return value
          do_kvm_cpu_synchronize_post_init --> w/o return value
            kvm_arch_put_registers         --> w/ return value

The root cause is that some of these functions have no return value, so
the migration thread can't detect the failure. Paolo?
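
One possible way to let the failure reach the migration thread is to carry
the result back through the data pointer passed to the per-CPU work item and
return it at every level above. A minimal sketch under that assumption: all
the *_sketch names and types are hypothetical stand-ins, not QEMU's real
signatures, and run_on_cpu_sketch simply calls the function synchronously.

```c
#include <errno.h>

/* Hypothetical stand-in for a vCPU; not QEMU's CPUState. */
typedef struct CPUStateSketch {
    int index;
    int fail_put_regs; /* non-zero simulates the kernel rejecting the state */
} CPUStateSketch;

/* Result slot handed to the work item through its data pointer. */
typedef struct SyncResult {
    int ret;
} SyncResult;

/* Stand-in for kvm_arch_put_registers(): the level that already returns. */
static int put_registers_sketch(CPUStateSketch *cpu)
{
    return cpu->fail_put_regs ? -EINVAL : 0;
}

/* Work item keeps a void signature; the error travels via *data. */
static void do_sync_post_init_sketch(CPUStateSketch *cpu, void *data)
{
    SyncResult *res = data;
    res->ret = put_registers_sketch(cpu);
}

/* Stand-in for run_on_cpu(): here it just runs the work item in place. */
static void run_on_cpu_sketch(CPUStateSketch *cpu,
                              void (*fn)(CPUStateSketch *, void *),
                              void *data)
{
    fn(cpu, data);
}

/* Per-CPU level: now returns what the work item recorded. */
static int sync_post_init_sketch(CPUStateSketch *cpu)
{
    SyncResult res = { 0 };
    run_on_cpu_sketch(cpu, do_sync_post_init_sketch, &res);
    return res.ret;
}

/* All-CPUs level: stops at the first failure so the caller can cancel. */
int sync_all_post_init_sketch(CPUStateSketch *cpus, int n)
{
    for (int i = 0; i < n; i++) {
        int ret = sync_post_init_sketch(&cpus[i]);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

The loader would then check the value returned by the all-CPUs level and
fail the incoming migration instead of continuing with a broken vCPU.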
> If it's called during the loading of the device state then we should be
> able to detect it and fail the migration; however if it's only failing
> after the CPU is restarted after the migration then it's a bit too late.
>
Actually the CPUs haven't started in this scenario.
Thanks,
-Gonglei