Re: Guest reboot issues since QEMU 6.0 and Linux 5.11
From: Fiona Ebner
Subject: Re: Guest reboot issues since QEMU 6.0 and Linux 5.11
Date: Fri, 22 Jul 2022 14:28:27 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0

On 21.07.22 at 17:51, Maxim Levitsky wrote:
> On Thu, 2022-07-21 at 14:49 +0200, Fabian Ebner wrote:
>> Hi,
>> since about half a year ago, we're getting user reports about guest
>> reboot issues with KVM/QEMU[0].
>>
>> The most common scenario is a Windows Server VM (2012R2/2016/2019,
>> UEFI/OVMF and SeaBIOS) getting stuck during the screen with the Windows
>> logo and the spinning circles after a reboot was triggered from within
>> the guest. Quitting the kvm process and booting with a fresh instance
>> works. The issue seems to become more likely, the longer the kvm
>> instance runs.
>>
>> We did not get such reports while we were providing Linux 5.4 and QEMU
>> 5.2.0, but we do with Linux 5.11/5.13/5.15 and QEMU 6.x.
>>
>> I'm just wondering if anybody has seen this issue before or might have a
>> hunch what it's about? Any tips on what to look out for when debugging
>> are also greatly appreciated!
>>
>> We do have debug access to a user's test VM and the VM state was saved
>> before a problematic reboot, but I can't modify the host system there.
>> AFAICT QEMU just executes guest code as usual, but I'm really not sure
>> what to look out for.
>>
>> That VM has CPU type host, and a colleague did have a similar enough CPU
>> to load the VM state, but for him, the reboot went through normally. On
>> the user's system, it triggers consistently after loading the VM state
>> and rebooting.
>>
>> So unfortunately, we didn't manage to reproduce the issue locally yet.
>> With two other images provided by users, we ran into a boot loop, where
>> QEMU resets the CPUs and does a few KVM_RUNs before the exit reason is
>> KVM_EXIT_SHUTDOWN (which to my understanding indicates a triple fault)
>> and then it repeats. It's not clear if the issues are related.
>
>
> Does the guest have HyperV enabled in it (that is nested virtualization?)
>
For all three machines described above,
    Get-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V
indicates that Hyper-V is disabled.
> Intel or AMD?
>
We do have reports for both Intel and AMD.
> Does the VM uses secure boot / SMM?
>
The customer VM which can reliably trigger the issue after loading the
state and rebooting uses SeaBIOS. For the other two VMs,
    Confirm-SecureBootUEFI
returns "False".
SMM might be a lead! We did disable SMM in the past, because apparently
there were problems with it (didn't dig out which, was before I worked
here), and the timing of enabling it and the reports coming in would
match. I guess (some) guest OSes don't expect it to be suddenly turned on?
However, there is a report of a user with two clusters with QEMU 5.2,
one with kernel 5.4 without the issue and one with kernel 5.11 with the
issue (Windows VM with spinning circles). So that's confusing :/
We do use some additional options if the OS type is "Windows" in our
high-level configuration, including Hyper-V enlightenments:
> -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt'
> -no-hpet
> -rtc 'driftfix=slew,base=localtime'
> -global 'kvm-pit.lost_tick_policy=discard'
But one user reported running into the issue even with OS type "other",
i.e. when the above options are not present and the CPU flags should be
just '+kvm_pv_eoi,+kvm_pv_unhalt'. There are also reports with CPU types
other than 'host', including 'kvm64' (where we automatically set the
flags +lahf_lm,+sep).
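
To compare the effective CPU flags between the configurations mentioned
above, a minimal sketch could look like the following. It only splits a
-cpu argument string into model and flags and diffs two such strings. The
helper names parse_cpu_arg and diff_flags are hypothetical, and the "other"
string assumes CPU type 'host' with just the two kvm_pv flags; it is not
copied from an actual report.

```python
# Hypothetical helpers for comparing QEMU -cpu arguments; not part of QEMU.

WINDOWS_CPU = (
    "host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,"
    "hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,"
    "+kvm_pv_eoi,+kvm_pv_unhalt"
)
# Assumption: OS type "other" with CPU type 'host'.
OTHER_CPU = "host,+kvm_pv_eoi,+kvm_pv_unhalt"


def parse_cpu_arg(arg):
    """Split a -cpu argument into (model, {flag: value})."""
    model, *parts = arg.strip("'").split(",")
    flags = {}
    for part in parts:
        if part.startswith("+"):
            flags[part[1:]] = "on"
        elif part.startswith("-"):
            flags[part[1:]] = "off"
        elif "=" in part:
            key, value = part.split("=", 1)
            flags[key] = value
        else:
            # A bare feature name is treated as enabled.
            flags[part] = "on"
    return model, flags


def diff_flags(a, b):
    """Return {flag: (value_in_a, value_in_b)} for flags that differ."""
    _, flags_a = parse_cpu_arg(a)
    _, flags_b = parse_cpu_arg(b)
    return {
        key: (flags_a.get(key), flags_b.get(key))
        for key in sorted(set(flags_a) | set(flags_b))
        if flags_a.get(key) != flags_b.get(key)
    }


if __name__ == "__main__":
    for flag, (win, other) in diff_flags(WINDOWS_CPU, OTHER_CPU).items():
        print(f"{flag}: windows={win} other={other}")
```

With these two strings, only the hv_* flags differ, which is consistent
with the enlightenments alone not explaining the issue.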
Thank you and Best Regards,
Fiona
P.S. Please don't mind the (from your perspective sudden) name change.
I'm still the same person and don't intend to change it again :)
> Best regards,
> Maxim Levitsky
>
>>
>> There are also a few reports about non-Windows VMs, mostly Ubuntu 20.04
>> with UEFI/OVMF, but again, it's not clear if the issues are related.
>>
>> [0]: https://forum.proxmox.com/threads/100744/
>> (the forum thread is a bit chaotic unfortunately).
>>
>> Best Regards,
>> Fabi
>>
>>
>
>
>