qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v6 wave 2 2/3] hw/isa/lpc_ich9: add broadcast SM


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH v6 wave 2 2/3] hw/isa/lpc_ich9: add broadcast SMI feature
Date: Wed, 18 Jan 2017 11:23:48 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0

On 01/18/17 11:19, Laszlo Ersek wrote:
> On 01/18/17 11:03, Igor Mammedov wrote:
>> On Tue, 17 Jan 2017 19:53:21 +0100
>> Laszlo Ersek <address@hidden> wrote:
>>

[snip]

>>> This is the code I wrote (extracting a new ich9_apm_broadcast_smi()
>>> function from ich9_apm_ctrl_changed(), due to size reasons):
>>>
>>>> static void ich9_apm_broadcast_smi(void)
>>>> {
>>>>     CPUState *cs;
>>>>
>>>>     cpu_synchronize_all_states(); /* <--------- mark this */
>> it probably should be executed after pause_all_vcpus(),
>> maybe it will help.
>>
>>>>     CPU_FOREACH(cs) {
>>>>         X86CPU *cpu = X86_CPU(cs);
>>>>         CPUX86State *env = &cpu->env;
>>>>
>>>>         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
>>>>             CPUClass *k = CPU_GET_CLASS(cs);
>>>>             uint64_t cpu_arch_id = k->get_arch_id(cs);
>>>>
>>>>             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
>>>>             continue;
>>>>         }
>>>>
>>>>         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>>>>     }
>>>> }  
>>>
>> [...]
>>> (b) If I add the cpu_synchronize_all_states() call before the loop, then
>>> the incorrect filter matches go away; QEMU sees the KVM VCPU states
>>> correctly, and the SMI is broad-cast.
>>>
>>> However, in this case, the boot slows to a *crawl*. It's unbearable. All
>>> VCPUs are pegged at 100%, and you can easily follow the OVMF debug log
>>> with the naked eye, almost line by line.
>>> I didn't expect that cpu_synchronize_all_states() would be so costly,
>>> but it makes the idea of checking VCPU state in the APM_CNT handler a
>>> non-starter.
>> I wonder were bottleneck is to slow down guest so much.
>> What is actual cost of calling cpu_synchronize_all_states()?
>>
>> Maybe calling pause_all_vcpus() before cpu_synchronize_all_states()
>> would help.
> 
> If I prepend just a pause_all_vcpus() function call, at the top, then
> the VM freezes (quite understandably) when the first SMI is raised via
> APM_CNT.
> 
> If I, instead, bracket the function body with pause_all_vcpus() and
> resume_all_vcpus(), like this:
> 
> static void ich9_apm_broadcast_smi(void)
> {
>     CPUState *cs;
> 
>     pause_all_vcpus();
>     cpu_synchronize_all_states();
>     CPU_FOREACH(cs) {
>         X86CPU *cpu = X86_CPU(cs);
>         CPUX86State *env = &cpu->env;
> 
>         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
>             CPUClass *k = CPU_GET_CLASS(cs);
>             uint64_t cpu_arch_id = k->get_arch_id(cs);
> 
>             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
>             continue;
>         }
> 
>         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>     }
>     resume_all_vcpus();
> }
> 
> then I see the following symptoms:
> - rather than 4 VCPUs being pegged at 100%, only one VCPU is pegged at
>   100%
> - the boot process, as observed from the OVMF log, is just as slow
>   (= crawling) as without the VCPU pausing/resuming (i.e., like with
>   cpu_synchronize_all_states() only); so ultimately the
>   pausing/resuming doesn't help.

I also tried this variant (bracketing only the sync call with pause/resume):

static void ich9_apm_broadcast_smi(void)
{
    CPUState *cs;

    pause_all_vcpus();
    cpu_synchronize_all_states();
    resume_all_vcpus();
    CPU_FOREACH(cs) {
        X86CPU *cpu = X86_CPU(cs);
        CPUX86State *env = &cpu->env;

        if (env->smbase == 0x30000 && env->eip == 0xfff0) {
            CPUClass *k = CPU_GET_CLASS(cs);
            uint64_t cpu_arch_id = k->get_arch_id(cs);

            trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
            continue;
        }

        cpu_interrupt(cs, CPU_INTERRUPT_SMI);
    }
}

This behaves identically to the version where pause/resume bracket the
entire function body (i.e., 1 VCPU pegged, super-slow boot progress).

Laszlo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]