Re: [PATCH v2 0/2] target/i386/kvm: fix two svm pmu virtualization bugs
From: Dongli Zhang
Date: Sun, 8 Jan 2023 17:19:50 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

Ping?
About [PATCH v2 2/2], the bad part is that the customer will not immediately
notice the issue, that is, the "Broken BIOS detected" message in dmesg.
As a result, the customer VM may panic randomly at any time in the future (once
the issue is encountered) if "/proc/sys/kernel/unknown_nmi_panic" is enabled.
Thank you very much!
Dongli Zhang
On 12/19/22 06:45, Dongli Zhang wrote:
> Can I get feedback for this patchset, especially the [PATCH v2 2/2]?
>
> About the [PATCH v2 2/2], currently the issue impacts the usage of PMUs on AMD
> VM, especially the below case:
>
> 1. Enable panic on nmi.
> 2. Use perf to monitor the performance of the VM. Although I have not tested
> it, I think the nmi watchdog has the same effect.
> 3. A sudden system reset, or a kernel panic (kdump/kexec).
> 4. After reboot, there will be random unknown NMI.
> 5. Unfortunately, the "panic on nmi" may panic the VM randomly at any time.
>
> Thank you very much!
>
> Dongli Zhang
>
> On 12/1/22 16:22, Dongli Zhang wrote:
>> This patchset is to fix two svm pmu virtualization bugs, x86 only.
>>
>> version 1:
>> https://lore.kernel.org/all/20221119122901.2469-1-dongli.zhang@oracle.com/
>>
>> 1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.
>>
>> Using "-cpu EPYC" or "-cpu host,-pmu" does not disable the pmu
>> virtualization. The VM linux side still reports the below ...
>>
>> [ 0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>
>> ... although we expect something like below.
>>
>> [ 0.596381] Performance Events: PMU not available due to virtualization,
>> using software events only.
>> [ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>>
>> The 1st patch introduces a new x86 only accel/kvm property
>> "pmu-cap-disabled=true" to disable the pmu virtualization via
>> KVM_PMU_CAP_DISABLE.
>>
>> I considered 'KVM_X86_SET_MSR_FILTER' initially before patchset v1.
>> Since both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctls, I
>> finally used the latter because it is easier to use.
>>
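For readers unfamiliar with the KVM side, the following is a minimal sketch of how a userspace VMM could request KVM_PMU_CAP_DISABLE on a VM file descriptor, assuming the KVM_CAP_PMU_CAPABILITY interface described in the Linux KVM API documentation. The helper names (make_pmu_disable_cap, kvm_vm_disable_pmu) are hypothetical and this is not the code from the patch; the fallback #defines are assumptions that only matter when building against older kernel headers.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallbacks for older uapi headers; values assumed from upstream Linux. */
#ifndef KVM_CAP_PMU_CAPABILITY
#define KVM_CAP_PMU_CAPABILITY 212
#endif
#ifndef KVM_PMU_CAP_DISABLE
#define KVM_PMU_CAP_DISABLE (1 << 0)
#endif

/* Hypothetical helper: build the KVM_ENABLE_CAP argument that asks KVM
 * to disable PMU virtualization for the whole VM. */
static struct kvm_enable_cap make_pmu_disable_cap(void)
{
    struct kvm_enable_cap cap;

    memset(&cap, 0, sizeof(cap));
    cap.cap = KVM_CAP_PMU_CAPABILITY;
    cap.args[0] = KVM_PMU_CAP_DISABLE;
    return cap;
}

/* Hypothetical helper: issue the VM ioctl. vm_fd is a descriptor returned
 * by KVM_CREATE_VM. Returns 0 on success, -1 with errno set on failure. */
static int kvm_vm_disable_pmu(int vm_fd)
{
    struct kvm_enable_cap cap = make_pmu_disable_cap();

    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

According to the KVM API documentation, this capability should be adjusted before any vCPU is created, so a caller would invoke kvm_vm_disable_pmu() right after KVM_CREATE_VM.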
>>
>> 2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset)
>> at the KVM side may inject random unwanted/unknown NMIs to the VM.
>>
>> The svm pmu registers are not reset during QEMU system_reset.
>>
>> (1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
>> is running "perf top". The pmu registers are not disabled gracefully.
>>
>> (2). Although x86_cpu_reset() resets many registers to zero,
>> kvm_put_msrs() does not put the AMD pmu registers to the KVM side. As a
>> result, some pmu events are still enabled at the KVM side.
>>
>> (3). The KVM pmc_speculative_in_use() always returns true so that the events
>> will not be reclaimed. The kvm_pmc->perf_event is still active.
>>
>> (4). After the reboot, the VM kernel reports below error:
>>
>> [ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS
>> detected, complain to your hardware vendor.
>> [ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR
>> c0010200 is 530076)
>>
>> (5). In the worst case, the active kvm_pmc->perf_event is still able to
>> inject unknown NMIs randomly to the VM kernel.
>>
>> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>>
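To see why the leftover value in step (4) matters, the sketch below decodes the reported MSR value 530076 (hex) using the AMD core PerfEvtSel bit layout (event select in bits 7:0 and 35:32, unit mask in 15:8, USR 16, OS 17, edge 18, INT 20, EN 22). The struct and function names are made up for illustration. The decoded value has both EN and INT set, which is consistent with a counter that is still enabled and still raises an interrupt on overflow.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the decoded PerfEvtSel fields. */
struct evtsel {
    unsigned event;      /* event select, bits 35:32 and 7:0 */
    unsigned unit_mask;  /* bits 15:8 */
    int usr;             /* bit 16: count in user mode */
    int os;              /* bit 17: count in kernel mode */
    int edge;            /* bit 18: edge detect */
    int intr;            /* bit 20: interrupt on overflow */
    int en;              /* bit 22: counter enabled */
};

static struct evtsel decode_evtsel(uint64_t v)
{
    struct evtsel d;

    d.event = (unsigned)(v & 0xff) | ((unsigned)((v >> 32) & 0xf) << 8);
    d.unit_mask = (unsigned)((v >> 8) & 0xff);
    d.usr = (int)((v >> 16) & 1);
    d.os = (int)((v >> 17) & 1);
    d.edge = (int)((v >> 18) & 1);
    d.intr = (int)((v >> 20) & 1);
    d.en = (int)((v >> 22) & 1);
    return d;
}

static void print_evtsel(uint64_t v)
{
    struct evtsel d = decode_evtsel(v);

    printf("event=0x%03x umask=0x%02x usr=%d os=%d edge=%d int=%d en=%d\n",
           d.event, d.unit_mask, d.usr, d.os, d.edge, d.intr, d.en);
}
```

Calling print_evtsel(0x530076) reports event 0x076 with usr, os, int and en all set to 1.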
>> The 2nd patch is to fix the issue by resetting AMD pmu registers as well as
>> Intel registers.
>>
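As a rough illustration of which registers a reset would have to cover, the sketch below lists the AMD PMU MSR addresses: the four legacy K7 selector/counter pairs, or the six interleaved core pairs starting at c0010200 when the core perfctr extension is present. The macro names mirror the Linux kernel's naming but are defined locally, the function name is hypothetical, and the actual patch may organize this differently.

```c
#include <stddef.h>
#include <stdint.h>

/* MSR addresses, defined locally for this sketch. */
#define MSR_K7_EVNTSEL0     0xc0010000u  /* legacy selectors: +0..+3 */
#define MSR_K7_PERFCTR0     0xc0010004u  /* legacy counters:  +0..+3 */
#define MSR_F15H_PERF_CTL0  0xc0010200u  /* core selectors: +0, +2, ... */
#define MSR_F15H_PERF_CTR0  0xc0010201u  /* core counters:  +0, +2, ... */
#define AMD_LEGACY_COUNTERS 4
#define AMD_CORE_COUNTERS   6

/* Hypothetical helper: write the selector/counter MSR addresses into out[]
 * (at most cap entries) and return how many were written. A reset path
 * would then write zero to each of these MSRs. */
static size_t amd_pmu_msrs(int has_perfctr_core, uint32_t *out, size_t cap)
{
    uint32_t sel = has_perfctr_core ? MSR_F15H_PERF_CTL0 : MSR_K7_EVNTSEL0;
    uint32_t ctr = has_perfctr_core ? MSR_F15H_PERF_CTR0 : MSR_K7_PERFCTR0;
    uint32_t stride = has_perfctr_core ? 2 : 1;
    uint32_t count = has_perfctr_core ? AMD_CORE_COUNTERS : AMD_LEGACY_COUNTERS;
    size_t n = 0;
    uint32_t i;

    for (i = 0; i < count && n + 2 <= cap; i++) {
        out[n++] = sel + i * stride;  /* event selector i */
        out[n++] = ctr + i * stride;  /* counter i */
    }
    return n;
}
```

The first entry in the core case is c0010200, the same MSR named in the "[Firmware Bug]" message above.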
>>
>> This patchset does not cover PerfMonV2, until the below patchset is merged
>> into the KVM side.
>>
>> [PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
>> https://lore.kernel.org/all/20221111102645.82001-1-likexu@tencent.com/
>>
>>
>> Dongli Zhang (2):
>> target/i386/kvm: introduce 'pmu-cap-disabled' to set
>> KVM_PMU_CAP_DISABLE
>> target/i386/kvm: get and put AMD pmu registers
>>
>> accel/kvm/kvm-all.c | 1 +
>> include/sysemu/kvm_int.h | 1 +
>> qemu-options.hx | 7 +++
>> target/i386/cpu.h | 5 ++
>> target/i386/kvm/kvm.c | 129 +++++++++++++++++++++++++++++++++++++++++-
>> 5 files changed, 141 insertions(+), 2 deletions(-)
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>>