Re: [Qemu-arm] [Qemu-devel] ARM64 STR Instruction Crash Regression in TCG


From: Richard Henderson
Subject: Re: [Qemu-arm] [Qemu-devel] ARM64 STR Instruction Crash Regression in TCG
Date: Sun, 22 Jul 2018 18:45:53 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 07/22/2018 02:31 PM, Richard Henderson wrote:
> On 07/22/2018 01:47 PM, Jason A. Donenfeld wrote:
>> Hello,
>>
>> Gcc 7.3 compiles bash's array_flush's dual assignment using:
>>
>> STP             X20, X20, [X20,#0x10]
>>
>> But gcc 8.1 compiles it as:
>>
>> STR             Q0, [X20,#0x10]
>>
>> Real processors seem okay, and qemu 2.11 seems okay. But qemu 2.12
>> results in a segfaulting process. I'm pretty sure this is a TCG bug.
>>
>> In the attached tarball, please find kernel and run.sh. Calling
>> ./run.sh will start the kernel with the bad bash executable that tries
>> to execute `config=({1..100000})` and crashes. Also included in there
>> is the actual crashing bash binary, in case you'd like to disassemble
>> a little bit.
> 
> Interesting.  The test passes on master with --enable-debug, but fails when
> qemu is compiled with optimization...
> 
> I'll dig a bit deeper.

The failing sequence is

0x0045ba44:  4e080e80  dup      v0.2d, x20
0x0045ba48:  90000340  adrp     x0, #0x4c3000
0x0045ba4c:  91098003  add      x3, x0, #0x260
0x0045ba50:  92800001  movn     x1, #0
0x0045ba54:  f9413002  ldr      x2, [x0, #0x260]
0x0045ba58:  3d800680  str      q0, [x20, #0x10]
...

OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xffffffffffffffdc              dead: 1
 movi_i32 tmp1,$0x0
 brcond_i32 tmp0,tmp1,lt,$L0                      dead: 0 1

 ---- 000000000045ba44 0000000000000000 0000000000000000
 dup_vec v128,e64,tmp2,x20
 st_vec v128,e8,tmp2,env,$0x8c0                   dead: 0

...

 ---- 000000000045ba58 0000000000000000 0000000000000000
 movi_i64 tmp4,$0x10
 add_i64 tmp3,x20,tmp4                            dead: 1 2
 ld_i64 tmp4,env,$0x8c0
 movi_i64 tmp6,$0x8
 add_i64 tmp5,tmp3,tmp6                           dead: 2
 qemu_st_i64 tmp4,tmp3,leq,0                      dead: 0 1
 ld_i64 tmp4,env,$0x8c8                           dead: 1
 qemu_st_i64 tmp4,tmp5,leq,0                      dead: 0 1
...

0x7fffcd2e678c:  vmovq    0xe0(%r14), %xmm0
0x7fffcd2e6795:  vpbroadcastq %xmm0, %xmm1
0x7fffcd2e679a:  vmovdqu  %xmm1, 0x8c0(%r14)
...
0x7fffcd2c0e78:  vmovq    %xmm0, %r12
0x7fffcd2c0e7d:  addq     $0x10, %r12


The guest x20 is loaded into xmm0 for the dup at 0x45ba44, and is reused for
the store at 0x45ba58.  However, if the load at 0x45ba54 misses the TLB, then
we will have a function call, which can clobber xmm0.

With -O0, it just so happens that the called function does not clobber xmm0;
with optimization enabled, the compiler generates different code for it, and
that code does clobber xmm0.

The fix is to properly treat the xmm registers as call-clobbered, at which
point the saved value is evicted from xmm0 naturally.  Patch posted separately.


r~


