qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v1] RISC-V: RISC-V TCG backend work in progress


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v1] RISC-V: RISC-V TCG backend work in progress
Date: Tue, 27 Mar 2018 11:36:36 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 03/27/2018 08:26 AM, Michael Clark wrote:
> IN:
> 0x0000000000011138:  800002b7          lui             t0,-2147483648
> 0x000000000001113c:  ffff8337          lui             t1,-32768
> 0x0000000000011140:  0262a433          mulhsu          s0,t0,t1
> 0x0000000000011144:  800004b7          lui             s1,-2147483648
> 0x0000000000011148:  00700193          addi            gp,zero,7
> 0x000000000001114c:  f49412e3          bne             s0,s1,-188

At last something interesting.

-2147483648 * 18446744073709518848 (0xffff_ffff_ffff_8000)
= -39614081257132098428027797504
= -7FFF_FFFF_FFFF_C000_0000_0000
=  ffff_ffff_8000_0000 0000_0400_0000_0000


> OP after optimization and liveness analysis:
>  ld_i32 tmp0,env,$0xffffffffffffffec              dead: 1
>  movi_i32 tmp1,$0x0
>  brcond_i32 tmp0,tmp1,lt,$L0                      dead: 0 1
> 
>  ---- 0000000000011138
>  movi_i64 t0  ,$0xffffffff80000000                sync: 0  dead: 0
> 
>  ---- 000000000001113c
>  movi_i64 t1  ,$0xffffffffffff8000                sync: 0  dead: 0
> 
>  ---- 0000000000011140
>  movi_i64 tmp2,$0x80000000
>  mov_i64 s0  ,tmp2                                sync: 0  dead: 0 1
> 
>  ---- 0000000000011144
>  movi_i64 s1  ,$0xffffffff80000000                sync: 0  dead: 0

So, yes, your test is correct, because here we load that high-part.  But
crucially, the multiply has produced a result *without* the sign-extension, so
the test fails.

It would have been handy to see the opcodes emitted before optimization, so
that we can verify what the riscv front-end did.  However, it's not that hard
to regenerate.

Annoyingly, the tcg optimizer hasn't been taught to fold things the same way
when the backend supports mulu2, so we *don't* fold the test away for an x86_64
host.

 mulu2_i64 tmp4,tmp5,t0  ,t1                      dead: 0 2 3
 movi_i64 tmp4,$0xffffffffffff8000
 sub_i64 tmp2,tmp5,tmp4                           dead: 1 2
 mov_i64 s0  ,tmp2                                sync: 0  dead: 0 1

And thus no doubt the test succeeds for that case.
So lemme try again with an aarch64 host.

 movi_i64 tmp2,$0x80000000
 mov_i64 s0  ,tmp2                                sync: 0  dead: 0 1

Yay, reproduction.  The input to the optimization is

 mov_i64 tmp2,t0
 mov_i64 tmp3,t1
 mul_i64 tmp6,tmp2,tmp3
 muluh_i64 tmp5,tmp2,tmp3
 mov_i64 tmp4,tmp6
 movi_i64 tmp6,$0x3f
 sar_i64 tmp4,tmp2,tmp6
 and_i64 tmp4,tmp4,tmp3
 sub_i64 tmp2,tmp5,tmp4
 mov_i64 s0  ,tmp2

Now, dead-code and forward propagate constants by hand,

 movi_i64 tmp2,$0xffffffff80000000
 movi_i64 tmp3,$0xffffffffffff8000
 muluh_i64 tmp5,tmp2,tmp3
 sub_i64 tmp2,tmp5,$0xffffffffffff8000
 mov_i64 s0  ,tmp2

Now, for the unsigned multiplication we get

18446744071562067968 * 18446744073709518848
= 340282366881323777743332701689153060864
= FFFF_FFFF_7FFF_8000 0000_4000_0000_0000

....

Oops, I see the bug right away now.

Value returned is $3 = 18446744071562035200
(gdb) n
410         if (!(def->flags & TCG_OPF_64BIT)) {
411             res = (int32_t)res;

Failure to mark muluh_i64 and mulsh_i64 as 64-bit ops, so we've discarded the
high 32 bits of the result.

I'm surprised that we haven't seen this as a problem before.  Perhaps luck in
non-optimization; we need a test case this small (but no smaller, like RISU) to
be able to produce a bad result.

So.  Two line patch to follow.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]