From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v7 1/6] target/mips: Optimize ILVOD.<B|H|W|D> MSA instructions
Date: Wed, 17 Apr 2019 09:29:00 -1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 4/17/19 5:33 AM, Mateja Marjanovic wrote:
> From: Mateja Marjanovic <address@hidden>
>
> Optimize the ILVOD.<B|H|W|D> set of MSA instructions by
> operating directly on TCG registers instead of calling
> helpers.
>
> In the following table, the first column is the performance
> before this patch. The second is the performance after
> converting from helpers to TCG, but without using the
> tcg_gen_deposit function. The third uses the deposit
> function with a uint64_t constant bit mask, and the fourth
> uses the deposit function with a mask that is a TCG
> constant. The fourth is what this patch implements.
>
> Performance was measured by executing the instructions
> 10 million times on a computer with an
> Intel Core i7-3770 CPU @ 3.40GHz×8.
>
> ==================================================================
> || instruction ||     1     ||    2     ||    3     ||    4     ||
> ==================================================================
> || ilvod.b     || 117.50 ms || 24.13 ms || 24.45 ms || 23.24 ms ||
> || ilvod.h     ||  93.16 ms || 24.21 ms || 24.28 ms || 23.20 ms ||
> || ilvod.w     || 119.90 ms || 24.15 ms || 23.19 ms || 22.95 ms ||
> || ilvod.d     ||  43.01 ms || 21.17 ms || 23.07 ms || 22.59 ms ||
> ==================================================================
> 1 - before
> 2 - no-deposit-no-mask-as-tcg-constant
> 3 - with-deposit-no-mask-as-tcg-constant
> 4 - with-deposit-with-mask-as-tcg-constant (final)
>
> The deposit function is used only in ILVOD.W.
>
> No-deposit version of the ILVOD.W implementation:
>
> static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
>                                uint32_t ws, uint32_t wt)
> {
>     TCGv_i64 t1 = tcg_temp_new_i64();
>     TCGv_i64 t2 = tcg_temp_new_i64();
>     TCGv_i64 mask = tcg_const_i64(0xffffffff00000000ULL);
>
>     tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask);
>     tcg_gen_shri_i64(t1, t1, 32);
>     tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask);
>     tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
>
>     tcg_gen_and_i64(t1, msa_wr_d[wt * 2 + 1], mask);
>     tcg_gen_shri_i64(t1, t1, 32);
>     tcg_gen_and_i64(t2, msa_wr_d[ws * 2 + 1], mask);
>     tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
>
>     tcg_temp_free_i64(mask);
>     tcg_temp_free_i64(t1);
>     tcg_temp_free_i64(t2);
> }
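
For comparison with the no-deposit code quoted above, here is a minimal
plain-C sketch of what one 64-bit half of ILVOD.W computes, written once
with and/shift/or and once with a deposit. It is not the patch's code and
does not use TCG: model_deposit64, ilvod_w_half_mask, ilvod_w_half_deposit
and ilvod_w_check are hypothetical names introduced only for this
illustration, and the deposit form is just an assumption about how a
deposit could replace the masking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit deposit: replace `length` bits of
 * `value`, starting at bit `start`, with the low bits of `field`.
 * Assumes 0 < length and start + length <= 64. */
static inline uint64_t model_deposit64(uint64_t value, int start, int length,
                                       uint64_t field)
{
    uint64_t mask = (length == 64 ? ~0ULL : ((1ULL << length) - 1)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* One 64-bit half of ILVOD.W with and/shift/or, following the quoted code:
 * low word  = odd (upper) word of wt, high word = odd (upper) word of ws. */
static inline uint64_t ilvod_w_half_mask(uint64_t ws, uint64_t wt)
{
    const uint64_t mask = 0xffffffff00000000ULL;
    return ((wt & mask) >> 32) | (ws & mask);
}

/* Same result with a deposit: start from ws, which already holds its odd
 * word in bits [63:32], and deposit wt's odd word into bits [31:0]. */
static inline uint64_t ilvod_w_half_deposit(uint64_t ws, uint64_t wt)
{
    return model_deposit64(ws, 0, 32, wt >> 32);
}

/* Sanity check that both formulations agree on a sample input. */
static inline void ilvod_w_check(void)
{
    uint64_t ws = 0x1111111122222222ULL;
    uint64_t wt = 0x3333333344444444ULL;
    assert(ilvod_w_half_mask(ws, wt) == ilvod_w_half_deposit(ws, wt));
}
```

In this model the deposit form needs no explicit mask constant and one
fewer logical operation per half, which is consistent with the small
difference between columns 2 and 4 for ilvod.w in the table, though the
sketch itself says nothing about generated host code.
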
>
> Suggested-by: Aleksandar Markovic <address@hidden>
> Suggested-by: Philippe Mathieu-Daudé <address@hidden>
> Suggested-by: Richard Henderson <address@hidden>
> Signed-off-by: Mateja Marjanovic <address@hidden>
> ---
>  target/mips/helper.h     |  1 -
>  target/mips/msa_helper.c |  7 ----
>  target/mips/translate.c  | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 90 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson <address@hidden>
r~