From: Igor Mammedov
Subject: Re: [PATCH v3 2/3] hw/i386: Add a new check to configure smp dies for EPYC
Date: Thu, 13 Aug 2020 15:56:10 +0200

On Tue, 11 Aug 2020 16:03:58 -0500
Babu Moger <babu.moger@amd.com> wrote:

> On 8/7/20 2:11 PM, Igor Mammedov wrote:
> > On Fri, 7 Aug 2020 17:52:22 +0100
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> >   
> >> On Fri, Aug 07, 2020 at 11:32:51AM -0500, Babu Moger wrote:  
> >>> Adding a new check to warn the users to configure 'dies' when
> >>> a numa topology is configured. It makes it easier to build the
> >>> topology for EPYC models.
> >>
> >> This says you're adding a warning....
> >>  
> >>>
> >>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> >>> ---
> >>>  hw/i386/x86.c |    7 +++++++
> >>>  1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> >>> index 67bee1bcb8..2a6ce56ef1 100644
> >>> --- a/hw/i386/x86.c
> >>> +++ b/hw/i386/x86.c
> >>> @@ -138,6 +138,13 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
> >>>  
> >>>      /* Check for apicid encoding */
> >>>      if (cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type)) {
> >>> +        if ((ms->numa_state->num_nodes > 0) &&
> >>> +            ms->numa_state->num_nodes != (ms->smp.sockets * x86ms->smp_dies)) {
> >>> +            error_setg(&error_fatal, "Numa configuration requires smp 'dies' "
> >>> +                       "parameter. Configure the cpu topology properly with "
> >>> +                       "max_cpus = sockets * dies * cores * threads");
> >>>  
> >>
> >> ...but you're actually making this a fatal error, not a warning.
> >>
> >> I'm not sure this is really OK. Wouldn't this mean that existing VMs
> >> deployed today, risk triggering this fatal error next time they
> >> are booted, or live migrated.  If it is possible someone is using
> >> such a config today, I don't think we can break it.  
> > 
> > to begin with, users shouldn't have used 'dies' with the initial impl. at all.
> > (it was an Intel-introduced option, and the EPYC code added a very similar
> > internal node_id (removed by the next patch)).
> > Now we are trying to consolidate this mess and reuse dies for EPYC.
> > 
> > EPYC support has been out since 5.0 (though broken); users could start a VM
> > with such a config, but it would not be a correct EPYC from the apicid and
> > cpuid point of view. A guest OS might run if it doesn't know about EPYCs, or
> > behave weirdly (suboptimal|crash|whatever) on seeing unexpected values.
> > 
> > If we are hell-bent on keeping the bugs of the initial impl, then we should
> > confine them to machine versions <= 5.1 and do the right thing for newer ones.
> > Though I'm not sure we should keep the broken variant around (all we would
> > get from it is bug reports*/complaints from users whose config ends up broken
> > anyway). I'd rather error out with a clear error message so the user can fix
> > their broken config.
> > 
> > *) there is at least one thread/bz on qemu-devel where users are trying to
> > run with EPYC and to pick an options combination that produces a sensible
> > topology.  
> 
> 
> I am still not sure what the right approach is here.  I can think of a
> couple of options.
> 1. If smp_dies != num_nodes, then go ahead and create the configuration with
>    as many smp_dies as requested, and warn (but not error out) users about
>    the misconfiguration.
A warning is a bad idea; that usually leads to trouble down the road.

Provided that the code is relatively new and produces misconfigured CPUs,
and if nobody insists on keeping the bug around, I'd try to go for erroring out.
Yes, that would break misconfigured setups, but those can be fixed by
reconfiguring on the user's side.
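
For example (an illustration with made-up numbers, not a command line taken
from this thread), a config where sockets * dies matches the number of numa
nodes would pass the new check:

    -cpu EPYC \
    -smp 8,sockets=2,dies=2,cores=2,threads=1 \
    -numa node,nodeid=0 -numa node,nodeid=1 \
    -numa node,nodeid=2 -numa node,nodeid=3

(memory/cpu bindings for the nodes omitted for brevity). Whereas something like
-smp 8,sockets=2,cores=4,threads=1 with the same four nodes would now be
rejected with the error above instead of silently producing a wrong apicid
layout.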

> 2. Introduce it as a fix based on the machine version (> 5.1), like Igor
> mentioned. I am not sure how to achieve that. I can look into it.
That's a headache from a maintenance point of view, so again, if nobody insists,
I'd rather avoid it.
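
(If we did have to do that, I'd expect it to look roughly like the sketch
below; this is not the actual patch, the flag name is made up, and it only
shows the general compat-knob pattern:

    /* machine-class field, default true, cleared by the <= 5.1 versioned
     * machine options */
    bool enforce_epyc_topology_check;

    /* in x86_cpus_init(), assuming x86mc = X86_MACHINE_GET_CLASS(x86ms) */
    if (cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type) &&
        x86mc->enforce_epyc_topology_check &&
        ms->numa_state->num_nodes > 0 &&
        ms->numa_state->num_nodes != ms->smp.sockets * x86ms->smp_dies) {
        error_setg(&error_fatal, "Numa configuration requires smp 'dies' ...");
    }

Every versioned machine type that keeps the old behaviour then has to remember
to clear the flag, which is exactly the maintenance burden I'd like to avoid.)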

> 
> Thanks
> Babu
> 
> > 
> >   
> >> Regards,
> >> Daniel  
> >   
> 



