From: Richard Henderson
Subject: Re: [PATCH v3 04/37] target/ppc: vmulh* instructions use gvec
Date: Fri, 11 Feb 2022 14:51:34 +1100
On 2/10/22 23:34, matheus.ferst@eldorado.org.br wrote:
+static void do_vx_vmulhu_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
+{
+ TCGv_vec a1, b1, mask, w, k;
+ unsigned bits;
+ bits = (vece == MO_32) ? 16 : 32;
+
+ a1 = tcg_temp_new_vec_matching(t);
+ b1 = tcg_temp_new_vec_matching(t);
+ w = tcg_temp_new_vec_matching(t);
+ k = tcg_temp_new_vec_matching(t);
+ mask = tcg_temp_new_vec_matching(t);
+
+ tcg_gen_dupi_vec(vece, mask, (vece == MO_32) ? 0xFFFF : 0xFFFFFFFF);
+ tcg_gen_and_vec(vece, a1, a, mask);
+ tcg_gen_and_vec(vece, b1, b, mask);
+ tcg_gen_mul_vec(vece, t, a1, b1);
+ tcg_gen_shri_vec(vece, k, t, bits);
+
+ tcg_gen_shri_vec(vece, a1, a, bits);
+ tcg_gen_mul_vec(vece, t, a1, b1);
+ tcg_gen_add_vec(vece, t, t, k);
+ tcg_gen_and_vec(vece, k, t, mask);
+ tcg_gen_shri_vec(vece, w, t, bits);
+
+ tcg_gen_and_vec(vece, a1, a, mask);
+ tcg_gen_shri_vec(vece, b1, b, bits);
+ tcg_gen_mul_vec(vece, t, a1, b1);
+ tcg_gen_add_vec(vece, t, t, k);
+ tcg_gen_shri_vec(vece, k, t, bits);
+
+ tcg_gen_shri_vec(vece, a1, a, bits);
+ tcg_gen_mul_vec(vece, t, a1, b1);
+ tcg_gen_add_vec(vece, t, t, w);
+ tcg_gen_add_vec(vece, t, t, k);
I don't think that you should decompose 4 high-part 32-bit multiplies into 4 32-bit multiplies plus lots of arithmetic. This is not a win. You're actually better off with pure integer arithmetic here.
You could instead widen these into 2 64-bit multiplies, plus some arithmetic. That's certainly closer to the break-even point.
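To illustrate the widening idea, here is a minimal plain-C model of a single 64-bit lane that holds two 32-bit elements. It is only a sketch: `mulhu32_pair` is a hypothetical name, not something from the patch or from QEMU, and a real expansion would express the same steps as vector ops at MO_64. Each lane needs two full-width multiplies (one for the even element, one for the odd element), a few shifts and masks, and one OR to merge the high halves.

```c
#include <stdint.h>

/*
 * Hypothetical model of one 64-bit lane containing two 32-bit elements:
 * bits 0..31 are the "even" element, bits 32..63 the "odd" element.
 * Returns a lane whose elements are the high 32 bits of each unsigned
 * 32x32 product.
 */
static uint64_t mulhu32_pair(uint64_t a, uint64_t b)
{
    const uint64_t lo_mask = 0xFFFFFFFFull;

    /* Full 64-bit product of the even elements. */
    uint64_t even = (a & lo_mask) * (b & lo_mask);

    /* Full 64-bit product of the odd elements. */
    uint64_t odd = (a >> 32) * (b >> 32);

    /* High half of 'even' moves down; high half of 'odd' stays in place. */
    return (even >> 32) | (odd & ~lo_mask);
}
```

For a 128-bit vector this would run on both 64-bit lanes at once, which matches the count of two 64-bit multiplies plus some arithmetic.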
+ {
+ .fniv = do_vx_vmulhu_vec,
+ .fno = gen_helper_VMULHUD,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
As for the two high-part 64-bit multiplies, I think that should definitely remain an integer operation.
You probably want to expand these with inline integer operations using .fni[48].
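As a reference for what such an integer expansion has to compute, here is a portable plain-C sketch of the unsigned high-part 64-bit multiply. The name `mulhu64` is hypothetical and the code is a scalar model, not TCG. An inline `.fni8` expansion would presumably get the high half directly from a double-word multiply such as `tcg_gen_mulu2_i64`, but that mapping is an assumption on my part and is not stated in the review.

```c
#include <stdint.h>

/*
 * Hypothetical scalar model: high 64 bits of the unsigned 64x64 product,
 * built from four 32x32->64 partial products.
 */
static uint64_t mulhu64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p2 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Sum of everything landing in bits 32..63; cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    /* High word: p3 plus the upper halves of p1, p2 and the carry from mid. */
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

On a 64-bit host with a native high-multiply instruction this collapses to a single operation per element, which is why the integer path is attractive for the two 64-bit elements.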
+static void do_vx_vmulhs_vec(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
Very much likewise.

r~