[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH for-9.1 09/19] target/i386: move 60-BF opcodes to new decoder
|
From: |
Richard Henderson |
|
Subject: |
Re: [PATCH for-9.1 09/19] target/i386: move 60-BF opcodes to new decoder |
|
Date: |
Wed, 10 Apr 2024 18:12:35 -1000 |
|
User-agent: |
Mozilla Thunderbird |
On 4/9/24 06:43, Paolo Bonzini wrote:
+static void gen_ARPL(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+ TCGLabel *label1 = gen_new_label();
+ TCGv rpl_adj = tcg_temp_new();
+ TCGv flags = tcg_temp_new();
+
+ gen_mov_eflags(s, flags);
+ tcg_gen_andi_tl(flags, flags, ~CC_Z);
+
+ /* Compute dest[rpl] - src[rpl], adjust if result <0. */
+ tcg_gen_andi_tl(rpl_adj, s->T0, 3);
+ tcg_gen_andi_tl(s->T1, s->T1, 3);
+ tcg_gen_sub_tl(rpl_adj, rpl_adj, s->T1);
+
+ tcg_gen_brcondi_tl(TCG_COND_LT, rpl_adj, 0, label1);
Comment is right, but branch condition is wrong.
I think this might be better as:
/* SRC = DST with SRC[RPL] */
tcg_gen_deposit_tl(s->T1, s->T0, s->T1, 0, 2);
/* Z flag set if DST < SRC */
tcg_gen_setcond_tl(TCG_COND_LTU, tmp, s->T0, s->T1);
/* Install Z */
tcg_gen_deposit_tl(flags, flags, tmp, ctz(CC_Z), 1);
/* DST with maximum RPL */
tcg_gen_umax_tl(s->T0, s->T0, s->T1);
+ case MO_32:
+#ifdef TARGET_X86_64
+ /*
+ * This could also use the same algorithm as MO_16. It produces fewer
+ * TCG ops and better code if flags are needed, but it requires a
64-bit
+ * multiply even if they are not (and thus the high part of the
multiply
+ * is dead).
+ */
Is 64-bit multiply ever slower these days?
My intuition says "slow" multiply is at least a decade out of date.
+ tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
+ tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
Avoid s->tmp*, especially in new code.
+ tcg_gen_muls2_i32(s->tmp2_i32, s->tmp3_i32,
+ s->tmp2_i32, s->tmp3_i32);
+ tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32);
+
+ cc_src_rhs = tcg_temp_new();
+ tcg_gen_extu_i32_tl(cc_src_rhs, s->tmp3_i32);
+ /* Compare the high part to the sign bit of the truncated result */
+ tcg_gen_negsetcondi_i32(TCG_COND_LT, s->tmp2_i32, s->tmp2_i32, 0);
This seems like something the optimizer should handle, but doesn't.
I'd write this as
tcg_gen_sari_i32(tmp, tmp, 31);
or
tcg_gen_sextract_i32(tmp, tmp, 31, 1);
which I know will expand to the same thing.
+ case MO_64:
+#endif
+ cc_src_rhs = tcg_temp_new();
+ tcg_gen_muls2_tl(s->T0, cc_src_rhs, s->T0, s->T1);
+ /* Compare the high part to the sign bit of the truncated result */
+ tcg_gen_negsetcondi_tl(TCG_COND_LT, s->T1, s->T0, 0);
Similarly.
r~
- [PATCH for-9.1 07/19] target/i386: extract gen_far_call/jmp, reordering temporaries, (continued)
- [PATCH for-9.1 07/19] target/i386: extract gen_far_call/jmp, reordering temporaries, Paolo Bonzini, 2024/04/09
- [PATCH for-9.1 10/19] target/i386: generalize gen_movl_seg_T0, Paolo Bonzini, 2024/04/09
- [PATCH for-9.1 08/19] target/i386: allow instructions with more than one immediate, Paolo Bonzini, 2024/04/09
- [PATCH for-9.1 13/19] target/i386: move remaining conditional operations to new decoder, Paolo Bonzini, 2024/04/09
- [PATCH for-9.1 09/19] target/i386: move 60-BF opcodes to new decoder, Paolo Bonzini, 2024/04/09
- [PATCH for-9.1 15/19] target/i386: port extensions of one-byte opcodes to new decoder, Paolo Bonzini, 2024/04/09
- [PATCH for-9.1 11/19] target/i386: move C0-FF opcodes to new decoder (except for x87), Paolo Bonzini, 2024/04/09