Re: [Qemu-devel] Possible ppc comparision optimisation
From: Paolo Bonzini
Subject: Re: [Qemu-devel] Possible ppc comparision optimisation
Date: Wed, 08 May 2013 10:05:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4
On 08/05/2013 00:56, Torbjorn Granlund wrote:
> The current ppc gen_op_cmp generates a long sequence of instructions,
> using a plain series of three disjoint compares.
>
> It is possible to compute the 3 result bits more cleverly. Below is a
> possible replacement gen_op_cmp. (It is tested by booting GNU/Linux
> ppc64, but not much more than that.)
>
> Surely this should be faster than the old code? OK, it is less
> readable, but cmp is pretty critical and should be made fast.
>
> Should one truncate things using tcg_gen_trunc_tl_i32 and do the add,
> xori, addi as i32 variants? (Why?)
I think that would be faster on 32-bit hosts; truncs are cheap.
> There could be a disadvantage of this compared to the old code, since
> this has a chained algebraic dependency, while the old code's many
> instructions might have been more independent.
What about these alternatives:
setcond LT, t0, arg0, arg1
setcond EQ, t1, arg0, arg1
trunc s0, t0
trunc s1, t1
shli s0, s0, 1 ; s0 = (arg0 < arg1) ? 2 : 0
subi s1, s1, 2 ; s1 = (arg0 != arg1) ? -2 : -1
sub s0, s0, s1 ; < 4 == 1 > 2
shli s0, s0, 1 ; < 8 == 2 > 4
=======
setcond LT, t0, arg0, arg1
setcond NE, t1, arg0, arg1
trunc s0, t0
trunc s1, t1
add s0, s0, s1 ; < 2 == 0 > 1
movi s1, 1
add s0, s0, s1 ; < 3 == 1 > 2
shl s1, s1, s0 ; < 8 == 2 > 4
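
Both sequences can be sanity-checked by modelling them in plain C and comparing against the expected CR field bits (LT = 8, GT = 4, EQ = 2). This is only a sketch of the arithmetic, not TCG code: the function names are made up for illustration, only the signed case is modelled (the unsigned case would use LTU instead of LT), and the SO bit is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Expected CR field bits without SO: LT = 8, GT = 4, EQ = 2. */
static uint32_t cr_reference(int64_t a, int64_t b)
{
    return a < b ? 8u : (a > b ? 4u : 2u);
}

/* Model of the first sequence (setcond LT / setcond EQ). */
static uint32_t cr_alt1(int64_t a, int64_t b)
{
    int32_t s0 = (a < b);       /* setcond LT, then trunc */
    int32_t s1 = (a == b);      /* setcond EQ, then trunc */
    s0 = s0 * 2;                /* shli s0, s0, 1: 2 if less, else 0 */
    s1 = s1 - 2;                /* subi s1, s1, 2: -1 if equal, else -2 */
    s0 = s0 - s1;               /* sub: less 4, equal 1, greater 2 */
    return (uint32_t)s0 << 1;   /* shli: less 8, equal 2, greater 4 */
}

/* Model of the second sequence (setcond LT / setcond NE). */
static uint32_t cr_alt2(int64_t a, int64_t b)
{
    uint32_t s0 = (a < b);      /* setcond LT, then trunc */
    uint32_t s1 = (a != b);     /* setcond NE, then trunc */
    s0 = s0 + s1;               /* add: less 2, equal 0, greater 1 */
    s1 = 1;                     /* movi s1, 1 */
    s0 = s0 + s1;               /* add: less 3, equal 1, greater 2 */
    return s1 << s0;            /* shl: less 8, equal 2, greater 4 */
}

/* Compare both models against the reference over a few boundary values. */
static void check_alternatives(void)
{
    static const int64_t v[] = { INT64_MIN, -2, -1, 0, 1, 2, INT64_MAX };
    const int n = (int)(sizeof v / sizeof v[0]);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            uint32_t want = cr_reference(v[i], v[j]);
            assert(cr_alt1(v[i], v[j]) == want);
            assert(cr_alt2(v[i], v[j]) == want);
        }
    }
}
```

Calling check_alternatives() from a small test program should pass without tripping an assertion. Only three outcomes are possible per comparison, so a handful of boundary values covers the arithmetic.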
Paolo
> static inline void gen_op_cmp(TCGv arg0, TCGv arg1, int s, int crf)
> {
> TCGv t0 = tcg_temp_new();
> TCGv t1 = tcg_temp_new();
> TCGv_i32 s0 = tcg_temp_new_i32();
>
> tcg_gen_trunc_tl_i32(cpu_crf[crf], cpu_so);
>
> tcg_gen_setcond_tl((s ? TCG_COND_LE: TCG_COND_LEU), t0, arg0, arg1);
> tcg_gen_setcond_tl((s ? TCG_COND_LT: TCG_COND_LTU), t1, arg0, arg1);
> tcg_gen_add_tl(t0, t0, t1);
> tcg_gen_xori_tl(t0, t0, 1);
> tcg_gen_addi_tl(t0, t0, 1);
> tcg_gen_trunc_tl_i32(s0, t0);
> tcg_gen_shli_i32(s0, s0, 1);
> tcg_gen_or_i32(cpu_crf[crf], cpu_crf[crf], s0);
>
> tcg_temp_free(t0);
> tcg_temp_free(t1);
> tcg_temp_free_i32(s0);
> }
>
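
The arithmetic in the quoted gen_op_cmp can be traced with a plain C model. This is a rough sketch, not QEMU code: model_gen_op_cmp is a hypothetical name, target_ulong is assumed to be 64 bits, the so argument stands in for cpu_so and is assumed to be 0 or 1, and the unsigned-to-signed casts rely on the usual two's-complement behaviour of common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the quoted gen_op_cmp; returns the resulting CR field. */
static uint32_t model_gen_op_cmp(uint64_t arg0, uint64_t arg1, int s, uint32_t so)
{
    uint64_t t0, t1;

    if (s) {
        /* TCG_COND_LE / TCG_COND_LT: signed comparison */
        t0 = ((int64_t)arg0 <= (int64_t)arg1);
        t1 = ((int64_t)arg0 <  (int64_t)arg1);
    } else {
        /* TCG_COND_LEU / TCG_COND_LTU: unsigned comparison */
        t0 = (arg0 <= arg1);
        t1 = (arg0 <  arg1);
    }

    t0 = t0 + t1;                       /* add:  less 2, equal 1, greater 0 */
    t0 = t0 ^ 1;                        /* xori: less 3, equal 0, greater 1 */
    t0 = t0 + 1;                        /* addi: less 4, equal 1, greater 2 */

    uint32_t s0 = (uint32_t)t0 << 1;    /* trunc + shli: less 8, equal 2, greater 4 */

    return so | s0;                     /* or in the SO bit (assumed 0 or 1) */
}
```

For example, model_gen_op_cmp(UINT64_MAX, 0, 1, 0) yields 8 because the first operand is -1 when read as signed, while the same operands with s = 0 yield 4.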