Re: [Qemu-devel] E5-2620v2 - emulation stop error

From: Bandan Das
Subject: Re: [Qemu-devel] E5-2620v2 - emulation stop error
Date: Tue, 10 Mar 2015 22:38:57 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)

"Dr. David Alan Gilbert" <address@hidden> writes:

> * Paolo Bonzini (address@hidden) wrote:
>> On 10/03/2015 19:21, Bandan Das wrote:
>> > Paolo Bonzini <address@hidden> writes:
>> > 
>> >> On 10/03/2015 17:57, Dr. David Alan Gilbert wrote:
>> >>> I'm seeing something similar; it's very intermittent and generally
>> >>> happens right at boot of the guest. I'm running this on QEMU
>> >>> head + my postcopy world (but it's happening right at boot, before postcopy
>> >>> gets a chance), and I'm using a 3.19-ish kernel. Xeon E5-2407 in my case,
>> >>> but hey, maybe I'm seeing a different bug.
>> > 
>> > Probably a tangent, but is the QEMU trace identical to what Andrey is
>> > seeing? From a cursory look and my limited understanding, it seems his
>> > failure is a #GP when executing the video BIOS.
>> > 
>> >> Same here on a 3.16 kernel + Xeon E5 v3.
>> > 
>> > I will try to reproduce this on a v2.
>> I see several failures; mine usually have suberror 1. With a 32-VCPU
>> guest I can reproduce it roughly half of the time.
>> Paolo
> while true; do
>     (sleep 5; echo -e '\001cq\n') |
>         /opt/qemu-try-world3/bin/qemu-system-x86_64 -machine pc-i440fx-2.0,accel=kvm \
>         -m 1024 -smp 128 -nographic -device sga 2>&1 | tee /tmp/qemu.op
>     grep -q "internal error" /tmp/qemu.op && break
> done
> (and leave about 2 minutes of runs before declaring a commit good)
> bad: cd2946607b42636d6c8cf6dbf94bce0273507b17
> bad: 041ccc922ee474693a2869d4e3b59e920c739bc0
> bad: 2559db069628981bfdc90637fac5bf1b4f4e8ef5
> bad: 21f5826a04d38e19488f917e1eef22751490c769
> good: e95d24ff40c77fbfd71396834a2eb99375f8bcc4
> good: 7781a492fa5a2eff53d06b25b93f0186ad3226c9
> good: c3edd62851098e6417786193ed9e9341781fcf57
> good: c5c6d7f81a6950d8e32a3b5a0bafd37bfa5a8e88
> good: 73104fd399c6778112f64fe0d439319f24508d9a
> good: 92013cf8ca10adafec9a92deb5df993e7df22cb9
> good: 4478aa768ccefcc5b234c23d035435fd71b932f6
> good: 2.2.0
> address@hidden qemu-world3]# git bisect bad
> 21f5826a04d38e19488f917e1eef22751490c769 is the first bad commit
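
For anyone repeating the bisect, David's loop can be wrapped for "git bisect run". This is only a sketch under assumptions: it assumes an already-configured in-tree QEMU build whose binary lands at ./x86_64-softmmu/qemu-system-x86_64, and the QEMU, RUNS and LOG variables and the boot_once/classify_log/bisect_step helpers are hypothetical names, not anything QEMU provides. It follows git bisect run's exit-code convention (0 = good, 125 = skip, 1 = bad); RUNS=24 at 5+ seconds per boot roughly matches the "about 2 minutes" rule.

```shell
#!/usr/bin/env bash
# Sketch of a "git bisect run" wrapper around David's reproducer loop.
# Assumes an already-configured in-tree QEMU build. QEMU, RUNS and LOG are
# hypothetical knobs introduced here, not QEMU settings.
QEMU=${QEMU:-./x86_64-softmmu/qemu-system-x86_64}
RUNS=${RUNS:-24}          # 24 boots x >=5 s each: roughly the "2 minutes" rule
LOG=${LOG:-/tmp/qemu.op}

# Boot the 128-VCPU guest once; after 5 s send Ctrl-A c (switch to the
# monitor) followed by "q" to quit. All output goes to $LOG.
boot_once() {
    (sleep 5; printf '\001cq\n') |
        "$QEMU" -machine pc-i440fx-2.0,accel=kvm -m 1024 -smp 128 \
                -nographic -device sga >"$LOG" 2>&1
}

# Return 1 ("bad") if the log shows the KVM internal error, else 0.
classify_log() {
    if grep -q "internal error" "$1"; then
        return 1
    fi
    return 0
}

# 0 = good, 1 = bad, 125 = skip (this commit does not build).
bisect_step() {
    make -j"$(nproc)" >/dev/null 2>&1 || return 125
    local i
    for ((i = 0; i < RUNS; i++)); do
        boot_once
        classify_log "$LOG" || return 1
    done
    return 0
}
```

Usage, from the top of the QEMU tree with the script kept outside it (so checkouts don't touch it): git bisect start <bad> <good>, then git bisect run bash -c '. /path/to/bisect-check.sh && bisect_step'.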

I can reproduce this on an E5-2620 v2 with David's "while true" test.
(I mean the emulation failure, not the suberror 2 that Andrey is seeing.)
The commit that seems to have introduced this is:

commit 0673b7870063a3affbad9046fb6d385a4e734c19
Author: Kevin O'Connor <address@hidden>
Date:   Sat May 24 10:49:50 2014 -0400

    smp: Replace QEMU SMP init assembler code with C; run only in 32bit mode.
    Change the multi-processor init code to trampoline into 32bit mode on
    each of the additional processors.  Implement an atomic lock so that
    each processor performs its initialization serially.

I am not sure what in that change could cause this, though.
Also, in my testing, loading kvm_intel with "unrestricted_guest=0" avoids
the failure.
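
For anyone wanting to try the same workaround: unrestricted_guest is a read-only kvm_intel module parameter, so changing it means reloading the module with no guests running. A minimal sketch, assuming kvm_intel is built as a module and you have root; the helper names and the PARAM_DIR override (there only so the check can be pointed elsewhere) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: inspect and change kvm_intel's unrestricted_guest parameter.
# PARAM_DIR is a hypothetical override; the helper names are made up too.
PARAM_DIR=${PARAM_DIR:-/sys/module/kvm_intel/parameters}

# Prints Y or N, which is how sysfs shows bool module parameters.
unrestricted_guest_status() {
    cat "$PARAM_DIR/unrestricted_guest"
}

# Reload kvm_intel with unrestricted guest disabled. Needs root, and the
# removal fails while a guest is still running.
disable_unrestricted_guest() {
    modprobe -r kvm_intel &&
        modprobe kvm_intel unrestricted_guest=0
}
```

After disable_unrestricted_guest, unrestricted_guest_status should print N. To make it persistent, the usual route is an "options kvm_intel unrestricted_guest=0" line in a file under /etc/modprobe.d/.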

