qemu-devel

Re: [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5


From: Fabiano Rosas
Subject: Re: [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5
Date: Fri, 11 Mar 2022 11:45:44 -0300

Daniel Henrique Barboza <danielhb413@gmail.com> writes:

> On 3/8/22 22:23, Fabiano Rosas wrote:
>> QEMU reports MMU support to the guest via the ibm,architecture-vec-5
>> property of the /chosen node. Byte number 26 specifies Radix Table
>> Expansions, currently only GTSE (Guest Translation Shootdown
>> Enable). This feature determines whether the tlbie instruction (and
>> others) are HV privileged.
>> 
>> Up until now, we always reported GTSE=1 to guests, even after
>> support for GTSE=0 was added. As part of that support, a kernel
>> command line radix_hcall_invalidate=on was introduced that overrides
>> the GTSE value received via CAS. So a guest can run with GTSE=0 and
>> use the H_RPT_INVALIDATE hcall instead of tlbie.
>> 
>> In this scenario, having GTSE always set to 1 by QEMU leads to a crash
>> when running nested KVM guests because KVM does not allow a nested
>> hypervisor to set GTSE support for its nested guests. So a nested
>> guest always uses the same value for LPCR_GTSE as its HV. Since the
>> nested HV disabled GTSE, but the L2 QEMU always reports GTSE=1, we run
>> into a crash when:
>> 
>> L1 LPCR_GTSE=0
>> L2 LPCR_GTSE=0
>> L2 CAS GTSE=1
>> 
>> The nested guest will run 'tlbie' and crash because the HW looks at
>> LPCR_GTSE, which is clear.
>> 
>> Having GTSE disabled in the L1 and enabled in the L2 is not an option
>> because the whole purpose of GTSE is to disallow access to tlbie and
>> we cannot allow L1 to spawn L2s that can access features that L1
>> itself cannot.
>> 
>> We also cannot have the guest check the LPCR bit, because LPCR is
>> HV-privileged.
>> 
>> So this patch goes through the most intuitive route which is to have
>> QEMU ask KVM about GTSE support and advertise the correct value to the
>> guest. A new KVM_CAP_PPC_GTSE capability is being added.
>> 
>> TCG continues to always enable GTSE.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
>> ---
>
>
> I'm not sure if I fully understood the situation, so let me recap. Once upon
> a time, QEMU advertised GTSE=1 and the host would never advertise other value,
> and everyone was happy.
>
> The host started to support GTSE=0, but QEMU kept advertising GTSE=1
> regardless, and no KVM GTSE cap was added to reflect the host support. I'll
> then assume that:
>
>
> - all guests would break if running in a GTSE=0 host prior to rpt_invalidate
> support (which is necessary to allow the guest to run in GTSE=0)
>
> - apparently no one ever tried to run a KVM guest in a GTSE=0 host, so no
> bugs were opened

There's a slight misconception in the above statements, which is the
separation of QEMU vs. the host. GTSE is negotiated via CAS, with the
guest on one side and the HV on the other. QEMU is not merely
advertising what the host GTSE value is; QEMU *is* the host.

Now, of course we could have done this in a way that QEMU asked the
kernel what GTSE value to use, but since we always thought of GTSE as
required for Radix, that would have been useless. No HV ever
reported GTSE=0 via CAS, either PowerVM or QEMU/KVM, so having the value
hardcoded in QEMU and in the kernel was never an issue.

> After commit 82123b756a1a2f1 (target/ppc: Support for H_RPT_INVALIDATE hcall)
> we added cap-rpt-invalidate. I didn't follow the discussions of this cap but
> it seems like, as with almost every other cap we have, there would be a
> migration problem for a guest that was in a rpt_invalidate aware host to
> migrate to another where this wouldn't be true, and the cap solves that.

Yes, cap-rpt-invalidate works just as we would expect. When I mentioned
to you in private about migration I meant the kernel-side change:

https://git.kernel.org/torvalds/c/bf6b7661f41615

What that change does is add a kernel cmdline option to allow the kernel
to disable GTSE even when running along with an HV that allows GTSE.

> What I'm not following is why, even after having cap-rpt-invalidate, we are
> still "lying" about the GTSE=1 regardless of what the host supports. We
> could've added the GTSE KVM cap at the same time rpt_invalidate was
> introduced, and guests that want to ignore this setting can use the cap to
> bypass it.

We're still reporting GTSE=1 because that is a design decision from
Linux/KVM. The work to support GTSE=0 was just adding the support, not
deciding whether we should disable GTSE. So QEMU/kernel kept hardcoding
the value without issue.

> In the end this patch is a needed fix IMHO. My confusion is why we're doing
> this just now.

The bug only surfaces when we run an L1 guest that decided to disable
GTSE via kernel cmdline and a nested guest on top of it. The QEMU inside
the L1 continues to force GTSE=1 as always. That is why the capability
now seems so compelling when previously it might not have.

> The patch itself LGTM.

Unfortunately, this patch as it is cannot work. We always ran with
GTSE=1 so any kernel that does not know about CAP_GTSE will report
GTSE=0 and break any guest that is older than the initial H_RPT
enablement. And the trick of checking for cap-rpt-invalidate first does
not always work because there's a window between when that cap was added
and now.

So what I am going to do is to change the kernel side to always report
values different than 0 so that QEMU can use the 0 value to
unequivocally tell older kernels apart from ones that disable the
feature. That way we will continue to send GTSE=1 via CAS when KVM is
too old.
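
To illustrate the idea, here is a minimal sketch of how QEMU could
interpret the value, assuming the kernel encodes "disabled" and
"enabled" as two distinct non-zero numbers. The enum names, the numeric
values and the helper gtse_to_advertise() are all hypothetical; the
actual encoding has not been decided in this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical encoding of the value QEMU gets back when it queries the
 * proposed GTSE capability. Only the meaning of 0 comes from the text
 * above; the non-zero values are assumptions for illustration.
 */
enum gtse_cap {
    GTSE_CAP_UNKNOWN  = 0, /* kernel does not know about the capability */
    GTSE_CAP_DISABLED = 1, /* assumed: KVM says GTSE is off */
    GTSE_CAP_ENABLED  = 2, /* assumed: KVM says GTSE is on */
};

/* Decide which GTSE value to advertise to the guest via CAS. */
static bool gtse_to_advertise(int cap_value)
{
    switch (cap_value) {
    case GTSE_CAP_DISABLED:
        return false;
    case GTSE_CAP_ENABLED:
        return true;
    case GTSE_CAP_UNKNOWN:
    default:
        /* Old kernel: keep the historical behaviour, GTSE=1. */
        return true;
    }
}
```

With this shape, only an explicit "disabled" answer from KVM turns GTSE
off for the guest, so an older kernel can never be mistaken for one
that disables the feature.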


