qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/dis


From: Igor Mammedov
Subject: Re: [PATCH v4 16/21] i386: track explicit 'hv-*' features enablement/disablement
Date: Wed, 24 Feb 2021 17:06:22 +0100

On Tue, 23 Feb 2021 19:08:42 +0100
Vitaly Kuznetsov <vkuznets@redhat.com> wrote:

> Igor Mammedov <imammedo@redhat.com> writes:
> 
> > On Tue, 23 Feb 2021 16:46:50 +0100
> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >  
> >> Igor Mammedov <imammedo@redhat.com> writes:
> >>   
> >> > On Mon, 22 Feb 2021 11:20:34 +0100
> >> > Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
> >> >    
> >> >> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> >> >>     
> >> >> > Igor Mammedov <imammedo@redhat.com> writes:
> >> >> >      
> >> >> >>> 
> >> >> >>> We need to distinguish because that would be sane.
> >> >> >>> 
> >> >> >>> Enlightened VMCS is an extension to VMX, it can't be used without
> >> >> >>> it. Genuine Hyper-V doesn't have a knob for enabling and disabling 
> >> >> >>> it,      
> >> >> >> ...      
> >> >> >>> That bein said, if
> >> >> >>> guest CPU lacks VMX it is counter-productive to expose EVMCS. 
> >> >> >>> However,
> >> >> >>> there is a problem with explicit enablement: what should
> >> >> >>> 
> >> >> >>> 'hv-passthrough,hv-evmcs' option do? Just silently drop EVMCS? 
> >> >> >>> Doesn't
> >> >> >>> sound sane to me.      
> >> >> >> based on above I'd error out is user asks for unsupported option
> >> >> >> i.e. no VMX -> no hv-evmcs - if explicitly asked -> error out      
> >> >> >
> >> >> > That's what I keep telling you but you don't seem to listen. 'Scratch
> >> >> > CPU' can't possibly help with this use-case because when you parse 
> >> >> >
> >> >> > 'hv-passthrough,hv-evmcs,vmx=off' you
> >> >> >
> >> >> > 1) "hv-passthrough" -> set EVMCS bit to '1' as it is supported by the
> >> >> > host.
> >> >> >
> >> >> > 2) 'hv-evmcs' -> keep EVMCS bit '1'
> >> >> >
> >> >> > 3) 'vmx=off' -> you have no idea where EVMCS bit came from.
> >> >> >
> >> >> > We have to remember which options were aquired from the host and which
> >> >> > were set explicitly by the user.       
> >> >> 
> >> >> Igor,
> >> >> 
> >> >> could you please comment on the above? In case my line of thought is
> >> >> correct, and it is impossible to distinguish between e.g.
> >> >> 
> >> >> 'hv-passthrough,hv-evmcs,-vmx'
> >> >> and
> >> >> 'hv-passthrough,-vmx'
> >> >> 
> >> >> without a custom parser (written just exactly the way I did in this
> >> >> version, for example) regardless of when 'hv-passthrough' is
> >> >> expanded. E.g. we have the exact same problem with
> >> >> 'hv-default,hv-evmcs,-vmx'. I that case I see no point in discussing    
> >> >
> >> > right, if we need to distinguish between explicit and implicit hv-evmcs 
> >> > set by
> >> > hv-passthrough custom parser probably the way to go.
> >> >
> >> > However do we need actually need to do it?    
> >> 
> >> I think we really need that. See below ...
> >>   
> >> > I'd treat 'hv-passthrough,-vmx' the same way as 
> >> > 'hv-passthrough,hv-evmcs,-vmx'
> >> > and it applies not only hv-evmcs but other features hv-passthrough might 
> >> > set
> >> > (i.e. if whatever was [un]set by hv-passthrough in combination with other
> >> > features results in invalid config, QEMU shall error out instead of 
> >> > magically
> >> > altering host provided hv-passthrough value).
> >> >
> >> > something like:
> >> >   'hv-passthrough,-vmx' when hv-passthrough makes hv-evmcs bit set
> >> > should result in
> >> >   error_setg(errp,"'vmx' feature can't be disabled when hv-evmcs is 
> >> > enabled,"
> >> >                  " either enable 'vmx' or disable 'hv-evmcs' along with 
> >> > disabling 'vmx'"
> >> >
> >> > making host's features set, *magically* mutable, depending on other user 
> >> > provided features
> >> > is a bit confusing. One would never know what hv-passthrough actually 
> >> > means, and if
> >> > enabling/disabling 'random' feature changes it.
> >> >
> >> > It's cleaner to do just what user asked (whether implicitly or 
> >> > explicitly) and error out
> >> > in case it ends up in nonsense configuration.
> >> >    
> >> 
> >> I don't seem to agree this is a sane behavior, especially if you replace
> >> 'hv-passthrough' with 'hv-default' above. Removing 'vmx' from CPU for
> >> Windows guests is common if you'd want to avoid nested configuration:
> >> even without any Hyper-V guests created, Windows itself is a Hyper-V
> >> partition.
> >> 
> >> So a sane user will do:
> >> 
> >> '-cpu host,hv-default,vmx=off' 
> >> 
> >> and on Intel he will get an error, and on AMD he won't. 
> >> 
> >> So what you're suggesting actually defeats the whole purpose of
> >> 'hv-default' as upper-layer tools (think libvirt) will need to know that  
> > I'd assume it would be hard for libvirt to use 'hv-default' from migration
> > point of view. It's semi opaque (one can find out what features it sets
> > indirectly inspecting individual hv_foo features, and mgmt will need to
> > know about them). If it will mutate when other features [un]set, upper
> > layers might need to enumerate all these permutations to know which hosts
> > are compatible or compare host feature sets every time before attempting
> > migration.
> 
> That's exactly the opposite of what's the goal here which is: make it
> possible for upper layers to not know anything about Hyper-V
> enlightenments besides 'hv-default'. Migration should work just fine, if
> the rest of guest configuration matches -- then 'hv-default' will create
> the exact same things (e.g. if 'vmx' was disabled on the source it has
            ^^^^^
I'm not convinced in that yet (not with current impl. more on that at the end 
of reply)

> to be enabled on the destination, it can't be different)
> 
> 
> >> Intel configurations for Windows guests are somewhat different. They'll
> >> need to know what 'hv-evmcs' is. We're back to where we've started.  
> >
> > we were talking about hv-passthrough, and if host advertises hv-evmcs
> > QEMU should complain if user disabled features it depends on (
> > not silently fixing up configuration error).
> > But the same applies to hv-default.  
> 
> Let's forget about hv-passthrough completely for a while as this series
> is kind of unrelated to it.

It adds a lot for unrelated code (not just couple of lines),
I've played with scratch CPU idea, here is demo of it
https://github.com/imammedo/qemu/commit/a4b107d5368ebf72d45082bc8310a6b88a4ba6fb
I didn't rework caps/cpuid querying parts (just hacked around it),
and even without that it saves us ~200LOC (not a small part of which comes
with this series).
I also split horrible hv_cpuid_check_and_set into separate 'set' and 'check' 
stages.
Granted it was sort-of pre-existing ugly code, some of your
re-factoring made it a bit better but it's still far from readable.
 
> In the previous submission I was setting 'hv-default' based on host
> availability of the feature only. That is: set on Intel, unset on
> AMD. We have to at least preserve that because it would be insane to
> crash on
> 
> -cpu host,hv-default 
> 
> on AMD because AMD doesn't (and never will!) support hv-evmcs, right?

If QEMU prevents cross arch migration i.e. it's not supported,
then I guess we can make hv-default different depending on AMD or Intel host.
If not then we might need to be conservative i.e. exclude hv-evmcs from 
defaults.

> >> If we are to follow this approach let's just throw away 'hv-evmcs' from
> >> 'hv-default' set, it's going to be much cleaner. But again, I don't
> >> really believe it's the right way to go.  
> >
> > if desired behavior, on Intel host for above config, to start without error
> > then indeed defaults should not set 'hv-evmcs' if it results in invalid
> > feature set.  
> 
> This is problematic as it is still sane for everyone to enable it as it
> gives performance advantage. If we just for a second forget about custom
  "
    > >> So a sane user will do:
    > >> 
    > >> '-cpu host,hv-default,vmx=off'
  "
  it's not easy picking defaults.

> parsers and all that -- which is just an implementation detail, why can't
> we tie 'hv-evmcs' bit in 'hv-default' to 'vxm' 1:1?
migration wise I don't see issues wrt vmx=off turning of hv-evmcs,
however ...

we were replacing user input fixups with hard errors asking
user to fix CLI and removing custom parsers in favor of generic ones.

In vmx=off case we would be fixing up what 'hv-default' explicitly set.
Same applies to other hv-foo set by hv-default.

ex: 'hv-default,hv-dep1=off', will turn off some dependent feature
for other hv feature in hv-default set and it will error out,
same goes on for enabling feature that has dependencies.
Why should we treat hv-evmcs/vmx pair any different?

Granted exiting with error is not the best UX, but at least it says to user
what's wrong with CLI and how to fix it. Also it lets to keep QEMU code
manageable and with consistent behavior.

> Again, the end goal is: make it possible for upper layers to now know
> anything about Hyper-V enlightenments other than 'hv-default'.
I'm still doubtful about feasibility of this goal when migration is considered.
It sure would work if hosts are identical hw/sw wise.
In mixed setup all features, except of hv-evmcs, that included in 'hv-default',
will error out in case host doesn't support it, which should prevent
incompatible migration so that's also fine.

But hv-evmcs will silently go away if host doesn't support it,
which is issue when migration happens to/from host that supports it.

Maybe to help mgmt to figure out hosts compatibility
  1. it should know about hv-evmcs to query it's status
  2. or default value set by 'hv-default' should be exposed to mgmt
     so it could compare whole feature-set in one go without being
     aware of individual features.

Additionally on QEMU side for such conditional features we can
theoretically add a subsection to migration stream when feature
is enabled, that way we at least can prevent 'successful' migration,
when destination value doesn't match. But this might already be
over-engineering on my part.


> Technically, it is possible to make it work.







reply via email to

[Prev in Thread] Current Thread [Next in Thread]