[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v1] RISC-V: RISC-V TCG backend work in progress
From: |
Richard Henderson |
Subject: |
Re: [Qemu-devel] [PATCH v1] RISC-V: RISC-V TCG backend work in progress |
Date: |
Tue, 27 Mar 2018 11:36:36 +0800 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 |
On 03/27/2018 08:26 AM, Michael Clark wrote:
> IN:
> 0x0000000000011138: 800002b7 lui t0,-2147483648
> 0x000000000001113c: ffff8337 lui t1,-32768
> 0x0000000000011140: 0262a433 mulhsu s0,t0,t1
> 0x0000000000011144: 800004b7 lui s1,-2147483648
> 0x0000000000011148: 00700193 addi gp,zero,7
> 0x000000000001114c: f49412e3 bne s0,s1,-188
At last something interesting.
-2147483648 * 18446744073709518848 (0xffff_ffff_ffff_8000)
= -39614081257132098428027797504
= -7FFF_FFFF_FFFF_C000_0000_0000
= ffff_ffff_8000_0000 0000_0400_0000_0000
> OP after optimization and liveness analysis:
> ld_i32 tmp0,env,$0xffffffffffffffec dead: 1
> movi_i32 tmp1,$0x0
> brcond_i32 tmp0,tmp1,lt,$L0 dead: 0 1
>
> ---- 0000000000011138
> movi_i64 t0 ,$0xffffffff80000000 sync: 0 dead: 0
>
> ---- 000000000001113c
> movi_i64 t1 ,$0xffffffffffff8000 sync: 0 dead: 0
>
> ---- 0000000000011140
> movi_i64 tmp2,$0x80000000
> mov_i64 s0 ,tmp2 sync: 0 dead: 0 1
>
> ---- 0000000000011144
> movi_i64 s1 ,$0xffffffff80000000 sync: 0 dead: 0
So, yes, your test is correct, because here we load that high-part. But
crucially, the multiply has produced a result *without* the sign-extension, so
the test fails.
It would have been handy to see the opcodes emitted before optimization, so
that we can verify what the riscv front-end did. However, it's not that hard
to regenerate.
Annoyingly, the tcg optimizer hasn't been taught to fold things the same way
when the backend supports mulu2, so we *don't* fold the test away for an x86_64
host.
mulu2_i64 tmp4,tmp5,t0 ,t1 dead: 0 2 3
movi_i64 tmp4,$0xffffffffffff8000
sub_i64 tmp2,tmp5,tmp4 dead: 1 2
mov_i64 s0 ,tmp2 sync: 0 dead: 0 1
And thus no doubt the test succeeds for that case.
So lemme try again with an aarch64 host.
movi_i64 tmp2,$0x80000000
mov_i64 s0 ,tmp2 sync: 0 dead: 0 1
Yay, reproduction. The input to the optimization is
mov_i64 tmp2,t0
mov_i64 tmp3,t1
mul_i64 tmp6,tmp2,tmp3
muluh_i64 tmp5,tmp2,tmp3
mov_i64 tmp4,tmp6
movi_i64 tmp6,$0x3f
sar_i64 tmp4,tmp2,tmp6
and_i64 tmp4,tmp4,tmp3
sub_i64 tmp2,tmp5,tmp4
mov_i64 s0 ,tmp2
Now, dead-code and forward propagate constants by hand,
movi_i64 tmp2,$0xffffffff80000000
movi_i64 tmp3,$0xffffffffffff8000
muluh_i64 tmp5,tmp2,tmp3
sub_i64 tmp2,tmp5,$0xffffffffffff8000
mov_i64 s0 ,tmp2
Now, for the unsigned multiplication we get
18446744071562067968 * 18446744073709518848
= 340282366881323777743332701689153060864
= FFFF_FFFF_7FFF_8000 0000_4000_0000_0000
....
Oops, I see the bug right away now.
Value returned is $3 = 18446744071562035200
(gdb) n
410 if (!(def->flags & TCG_OPF_64BIT)) {
411 res = (int32_t)res;
Failure to mark muluh_i64 and mulsh_i64 as 64-bit ops, so we've discarded the
high 32 bits of the result.
I'm surprised that we haven't seen this as a problem before. Perhaps luck in
non-optimization; we need a test case this small (but no smaller, like RISU) to
be able to produce a bad result.
So. Two line patch to follow.
r~