From: Richard Henderson
Subject: Re: [RFC PATCH 19/42] target/mips/tx79: Introduce PCEQ* opcodes (Parallel Compare for Equal)
Date: Mon, 15 Feb 2021 12:32:04 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0
On 2/14/21 9:58 AM, Philippe Mathieu-Daudé wrote:
> +static bool trans_parallel_compare(DisasContext *ctx, arg_rtype *a,
> +                                   TCGCond cond, unsigned wlen)
> +{
> +    TCGv_i64 c0, c1, ax, bx, t0, t1, t2;
> +
> +    if (a->rd == 0) {
> +        /* nop */
> +        return true;
> +    }
> +
> +    c0 = tcg_const_tl(0);
> +    c1 = tcg_const_tl(0xffffffff);
It is cheaper for most hosts to load -1 than a 32-bit value zero-extended to 64 bits, and since only the low wlen bits are deposited, -1 gives the same result.
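That said, you could also drop both constants and the movcond, and use setcond followed by neg, which turns the 0/1 result into 0/-1:

    tcg_gen_setcond_i64(cond, t0, t0, t1);
    tcg_gen_neg_i64(t0, t0);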
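A plain-C model of one lane makes the setcond/neg suggestion concrete. The model_* helpers below are hypothetical stand-ins that compute what tcg_gen_sextract_i64, tcg_gen_setcond_i64, tcg_gen_neg_i64 and tcg_gen_movcond_i64 would produce for a single lane; they are not QEMU functions. The sketch assumes EQ and lanes of at most 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of sextract: sign-extend the len-bit field at ofs.
 * Relies on arithmetic right shift of negative values, as QEMU does.
 */
static int64_t model_sextract(uint64_t v, unsigned ofs, unsigned len)
{
    assert(len > 0 && ofs + len <= 64);
    /* Move the field to the top, then shift it back down with sign fill. */
    return (int64_t)(v << (64 - ofs - len)) >> (64 - len);
}

/* Original sequence: movcond picks c1 = 0xffffffff or c0 = 0. */
static uint64_t model_lane_eq_movcond(uint64_t a, uint64_t b,
                                      unsigned ofs, unsigned len)
{
    int64_t t0 = model_sextract(a, ofs, len);
    int64_t t1 = model_sextract(b, ofs, len);
    return t1 == t0 ? 0xffffffffULL : 0;
}

/* Suggested sequence: setcond yields 0 or 1, neg turns it into 0 or -1. */
static uint64_t model_lane_eq_mask(uint64_t a, uint64_t b,
                                   unsigned ofs, unsigned len)
{
    int64_t t0 = model_sextract(a, ofs, len);
    int64_t t1 = model_sextract(b, ofs, len);
    uint64_t set = (t0 == t1);  /* setcond(EQ): 0 or 1 */
    return -set;                /* neg: 1 -> all ones, 0 -> 0 */
}
```

Both sequences agree in the low wlen bits, which are the only bits the deposit keeps, but setcond/neg needs no constant temporaries at all.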
> +    for (int i = 0; i < (64 / wlen); i++) {
> +        tcg_gen_sextract_i64(t0, ax, wlen * i, wlen);
> +        tcg_gen_sextract_i64(t1, bx, wlen * i, wlen);
> +        tcg_gen_movcond_i64(cond, t2, t1, t0, c1, c0);
> +        tcg_gen_deposit_i64(cpu_gpr[a->rd], cpu_gpr[a->rd], t2, wlen * i, wlen);
> +    }
For an accumulate loop like this, we'll get better code if the length of each
insert is the remaining length of the register (64 - wlen * i) rather than wlen.
That way, the first insert is 64 bits wide, which turns into a move, so the old
value of rd is never used; each later insert still overwrites everything above
it, so the final value is unchanged. Further, when the host has no deposit
instruction, an insert that reaches the top of the register can be expanded
with extract2.
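The remaining-length argument can be checked with a plain-C model of the loop. The model_* helpers are hypothetical stand-ins for what tcg_gen_deposit_i64 and tcg_gen_extract2_i64 compute, not QEMU functions; model_deposit_to_top follows the shift-then-extract2 sequence that, as far as I know, TCG's generic deposit expansion uses when ofs + len == 64. Lane equality is computed directly here to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of deposit: insert the low len bits of src at ofs. */
static uint64_t model_deposit(uint64_t dst, uint64_t src, unsigned ofs,
                              unsigned len)
{
    assert(len > 0 && ofs + len <= 64);
    uint64_t field = len == 64 ? ~0ULL : (1ULL << len) - 1;
    uint64_t mask = field << ofs;
    return (dst & ~mask) | ((src & field) << ofs);
}

/* Hypothetical model of extract2: low 64 bits of (ah:al) >> ofs. */
static uint64_t model_extract2(uint64_t al, uint64_t ah, unsigned ofs)
{
    assert(ofs < 64);
    return ofs == 0 ? al : (al >> ofs) | (ah << (64 - ofs));
}

/* Insert reaching bit 63 (len = 64 - ofs) without a deposit instruction. */
static uint64_t model_deposit_to_top(uint64_t dst, uint64_t src, unsigned ofs)
{
    assert(ofs < 64);
    unsigned len = 64 - ofs;
    if (len == 64) {
        return src;                 /* full-width insert is just a move */
    }
    /* Keep the low ofs bits of dst at the top of t1, then shift src in. */
    return model_extract2(dst << len, src, len);
}

/* All ones if the wlen-bit lanes at ofs are equal, else 0. */
static uint64_t model_lane_eq(uint64_t a, uint64_t b, unsigned ofs,
                              unsigned wlen)
{
    uint64_t m = (1ULL << wlen) - 1;   /* wlen <= 32 here */
    return -(uint64_t)(((a >> ofs) & m) == ((b >> ofs) & m));
}

enum insert_mode { INSERT_WLEN, INSERT_REMAINING, INSERT_REMAINING_EXTRACT2 };

static uint64_t model_compare_loop(uint64_t rd, uint64_t a, uint64_t b,
                                   unsigned wlen, enum insert_mode mode)
{
    assert(wlen == 8 || wlen == 16 || wlen == 32);
    for (unsigned i = 0; i < 64 / wlen; i++) {
        unsigned ofs = wlen * i;
        uint64_t t2 = model_lane_eq(a, b, ofs, wlen);
        switch (mode) {
        case INSERT_WLEN:
            rd = model_deposit(rd, t2, ofs, wlen);
            break;
        case INSERT_REMAINING:
            /* i == 0 is a 64-bit insert, i.e. a move: old rd is dead. */
            rd = model_deposit(rd, t2, ofs, 64 - ofs);
            break;
        case INSERT_REMAINING_EXTRACT2:
            rd = model_deposit_to_top(rd, t2, ofs);
            break;
        }
    }
    return rd;
}
```

All three modes produce the same rd for any initial rd value; the remaining-length modes simply make that independence visible to the optimizer from the first iteration.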
Also, while you will need this compare loop for GT, there's a cheaper way to
compute EQ, which we use in several places in QEMU.
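    void gen_pceq(TCGv_i64 d, TCGv_i64 s, TCGv_i64 t, MemOp esz)
    {
        unsigned ebits = 8 << esz;
        /* 0x7f7f... for bytes, 0x7fff7fff... for halfwords, etc. */
        TCGv_i64 low = tcg_constant_i64(dup_const(esz, MAKE_64BIT_MASK(0, ebits - 1)));
        TCGv_i64 one = tcg_constant_i64(dup_const(esz, 1));
        TCGv_i64 x = tcg_temp_new_i64();

        /* Turn s == t into x == 0. */
        tcg_gen_xor_i64(x, s, t);

        /*
         * See the exact zero-byte test at
         * https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
         * The subtract-based hasless(v,1) form is cheaper, but its borrow
         * can cross into the next element, so an element holding 1 just
         * above a zero element would also be reported as equal.
         * Here, (x & low) + low sets the msb of each element iff its low
         * bits are nonzero, without carrying out of the element; or-ing
         * in x also catches a set msb.
         */
        tcg_gen_and_i64(d, x, low);
        tcg_gen_add_i64(d, d, low);
        tcg_gen_or_i64(d, d, x);

        /*
         * Shift the msb down and invert it, then use muli to replicate
         * the one bit across the vector element.
         */
        tcg_gen_shri_i64(d, d, ebits - 1);
        tcg_gen_andc_i64(d, one, d);
        tcg_gen_muli_i64(d, d, MAKE_64BIT_MASK(0, ebits));
        tcg_temp_free_i64(x);
    }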
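The element-wise trick can be checked outside of TCG with a plain-C model. The sketch below uses hypothetical model_* helpers (not QEMU functions; model_dup stands in for dup_const) to compare the exact zero-element test, the subtract-based hasless(v,1) form, and a naive per-element loop. With x = s ^ t = 0x0100 and 8-bit elements, the hasless form reports every byte as equal although byte 1 differs: the borrow out of the zero byte 0 makes byte 1, which holds 0x01, look like zero too.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate a per-element constant across 64 bits (like dup_const). */
static uint64_t model_dup(unsigned ebits, uint64_t c)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 64; i += ebits) {
        r |= c << i;
    }
    return r;
}

/* Exact test: an element of the result is all ones iff s == t there. */
static uint64_t model_pceq(uint64_t s, uint64_t t, unsigned ebits)
{
    assert(ebits == 8 || ebits == 16 || ebits == 32);
    uint64_t elem_mask = (1ULL << ebits) - 1;
    uint64_t low = model_dup(ebits, elem_mask >> 1);  /* 0x7f7f... */
    uint64_t one = model_dup(ebits, 1);
    uint64_t x = s ^ t;                     /* equal elements become 0 */
    uint64_t d = ((x & low) + low) | x;     /* msb set iff element != 0 */
    d = one & ~(d >> (ebits - 1));          /* 1 iff element == 0 */
    return d * elem_mask;                   /* 0/1 -> 0/all ones */
}

/* Subtract-based hasless(x, 1) form; may report false positives. */
static uint64_t model_pceq_hasless(uint64_t s, uint64_t t, unsigned ebits)
{
    uint64_t elem_mask = (1ULL << ebits) - 1;
    uint64_t one = model_dup(ebits, 1);
    uint64_t x = s ^ t;
    uint64_t d = ((x - one) & ~x) >> (ebits - 1);
    return (d & one) * elem_mask;
}

/* Straightforward per-element reference. */
static uint64_t model_pceq_ref(uint64_t s, uint64_t t, unsigned ebits)
{
    uint64_t elem_mask = (1ULL << ebits) - 1, r = 0;
    for (unsigned i = 0; i < 64; i += ebits) {
        if (((s >> i) & elem_mask) == ((t >> i) & elem_mask)) {
            r |= elem_mask << i;
        }
    }
    return r;
}
```

The exact form costs one more operation than the hasless form (and, add, or versus sub, andc), which is still far cheaper than a per-element extract/compare/deposit loop.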
In both cases, I think you should pull out helper functions and then use
trans_parallel_logic.
r~