Re: [PATCH v2] target/mips/translate: Simplify PCPYH using deposit_i64()
From: Richard Henderson
Subject: Re: [PATCH v2] target/mips/translate: Simplify PCPYH using deposit_i64()
Date: Fri, 12 Feb 2021 16:56:20 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 2/12/21 4:26 PM, Philippe Mathieu-Daudé wrote:
> Simplify the PCPYH (Parallel Copy Halfword) instruction by using
> multiple calls to deposit_i64() which can be optimized by some
> TCG backends.
>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> v2: Send the Halfword version :)
> ---
> target/mips/translate.c | 36 ++++++------------------------------
> 1 file changed, 6 insertions(+), 30 deletions(-)
>
> diff --git a/target/mips/translate.c b/target/mips/translate.c
> index a5cf1742a8b..ddae26009dd 100644
> --- a/target/mips/translate.c
> +++ b/target/mips/translate.c
> @@ -24786,36 +24786,12 @@ static void gen_mmi_pcpyh(DisasContext *ctx)
>          tcg_gen_movi_i64(cpu_gpr[rd], 0);
>          tcg_gen_movi_i64(cpu_mmr[rd], 0);
>      } else {
> -        TCGv_i64 t0 = tcg_temp_new();
> -        TCGv_i64 t1 = tcg_temp_new();
> -        uint64_t mask = (1ULL << 16) - 1;
> -
> -        tcg_gen_andi_i64(t0, cpu_gpr[rt], mask);
> -        tcg_gen_movi_i64(t1, 0);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -
> -        tcg_gen_mov_i64(cpu_gpr[rd], t1);
> -
> -        tcg_gen_andi_i64(t0, cpu_mmr[rt], mask);
> -        tcg_gen_movi_i64(t1, 0);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -        tcg_gen_shli_i64(t0, t0, 16);
> -        tcg_gen_or_i64(t1, t0, t1);
> -
> -        tcg_gen_mov_i64(cpu_mmr[rd], t1);
> -
> -        tcg_temp_free(t0);
> -        tcg_temp_free(t1);
> +        for (int i = 0; i < 4; i++) {
> +            tcg_gen_deposit_i64(cpu_gpr[rd],
> +                                cpu_gpr[rd], cpu_gpr[rd], 16 * i, 16);
> +            tcg_gen_deposit_i64(cpu_mmr[rd],
> +                                cpu_mmr[rd], cpu_mmr[rd], 16 * i, 16);
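
The replacement never reads rt: both source operands of every deposit are rd.

To make 4 identical copies, make use of the previous inserts, so that two
deposits are enough (rd and rt are shorthand for the corresponding
cpu_gpr[] or cpu_mmr[] entries):

    tcg_gen_deposit_i64(rd, rt, rt, 16, 48);
    tcg_gen_deposit_i64(rd, rd, rd, 32, 32);
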
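
For readers who want to check the two-deposit sequence by hand, here is a minimal sketch in plain C that models it on ordinary 64-bit integers instead of TCG values. The helper names (model_deposit64, pcpyh_two_deposits, pcpyh_shift_or) are hypothetical and exist only for this illustration; the deposit model assumes the usual semantics of replacing a len-bit field at bit offset ofs of the first operand with the low bits of the second.

```c
#include <stdint.h>

/*
 * Plain-C model of a 64-bit deposit: replace the len-bit field of `base`
 * that starts at bit `ofs` with the low len bits of `field`.
 * Hypothetical helper for illustration only; it requires len >= 1 and
 * ofs + len <= 64.
 */
static uint64_t model_deposit64(uint64_t base, uint64_t field,
                                unsigned ofs, unsigned len)
{
    uint64_t mask = (~UINT64_C(0) >> (64 - len)) << ofs;

    return (base & ~mask) | ((field << ofs) & mask);
}

/* The two-deposit sequence from the review, applied to one 64-bit half. */
static uint64_t pcpyh_two_deposits(uint64_t rt)
{
    /* Halfwords, high to low, are now h2 h1 h0 h0. */
    uint64_t rd = model_deposit64(rt, rt, 16, 48);

    /* Copy the low 32 bits (h0 h0) into the high 32 bits: h0 h0 h0 h0. */
    return model_deposit64(rd, rd, 32, 32);
}

/* Mask/shift/or sequence modelled on the code that the patch removes. */
static uint64_t pcpyh_shift_or(uint64_t rt)
{
    uint64_t t0 = rt & ((UINT64_C(1) << 16) - 1);
    uint64_t t1 = 0;

    for (int i = 0; i < 4; i++) {
        t1 |= t0;   /* merge the halfword at its current position */
        t0 <<= 16;  /* move it up to the next halfword slot */
    }
    return t1;
}
```

With an input of 0x1111222233334444, both functions return 0x4444444444444444, which is the low halfword replicated four times. The same pair of deposits would be applied once to the cpu_gpr half and once to the cpu_mmr half.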
r~