[PATCH v2 46/69] target/arm: Handle FPCR.AH in vector FCMLA
From: Peter Maydell
Date: Sat, 1 Feb 2025 16:39:49 +0000
From: Richard Henderson <richard.henderson@linaro.org>
The negation step in FCMLA mustn't negate a NaN when FPCR.AH
is set. Handle this by passing FPCR.AH to the helper via the
SIMD data field, and use this to select whether to do the
negation via XOR or via the muladd negate_product flag.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250129013857.135256-26-richard.henderson@linaro.org
[PMM: Expanded commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
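Illustrative note, not part of the patch: the selection logic described in the commit message can be sketched in standalone C. Everything prefixed sketch_ or SKETCH_ is hypothetical and exists only for this example. SKETCH_NEGATE_PRODUCT stands in for softfloat's float_muladd_negate_product and its value here is arbitrary. The bit layout (rot in bits 0-1, FPCR.AH in bit 2) follows the translate-a64.c hunk below, and the float32 sign-bit position is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for softfloat's float_muladd_negate_product flag;
 * the numeric value is illustrative only. */
#define SKETCH_NEGATE_PRODUCT 1u

/* Per-call negation controls, float32 layout assumed (sign bit = bit 31). */
typedef struct {
    uint32_t negx_real;  /* XOR mask for the real-part operand (AH=0) */
    uint32_t negx_imag;  /* XOR mask for the imaginary-part operand (AH=0) */
    uint32_t negf_real;  /* muladd flags for the real part (AH=1) */
    uint32_t negf_imag;  /* muladd flags for the imaginary part (AH=1) */
} SketchFcmlaNeg;

/* Pack the rotation (bits 0-1) and FPCR.AH (bit 2) into one data field. */
static uint32_t sketch_pack(uint32_t rot, bool fpcr_ah)
{
    return (rot & 3u) | ((uint32_t)fpcr_ah << 2);
}

/* Decode the packed field and choose XOR masks or muladd flags. */
static SketchFcmlaNeg sketch_select(uint32_t data)
{
    uint32_t flip = data & 1u;
    uint32_t neg_imag = (data >> 1) & 1u;
    uint32_t ah = (data >> 2) & 1u;
    uint32_t neg_real = flip ^ neg_imag;
    SketchFcmlaNeg r;

    /* With AH=0 the masks carry the negation; with AH=1 the flags do. */
    r.negx_real = (neg_real & ~ah) << 31;
    r.negx_imag = (neg_imag & ~ah) << 31;
    r.negf_real = (neg_real & ah) ? SKETCH_NEGATE_PRODUCT : 0;
    r.negf_imag = (neg_imag & ah) ? SKETCH_NEGATE_PRODUCT : 0;
    return r;
}

/* True if the float32 bit pattern encodes a NaN. */
static bool sketch_is_nan32(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}
```

For rot=1 with AH=0, negx_real is the sign-bit mask, so XORing it into a NaN operand flips that NaN's sign bit. With AH=1 both masks are zero, the operand bits pass through unchanged, and the negation is requested through the muladd flag instead, which per the commit message is what keeps a NaN input from being negated.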
target/arm/tcg/translate-a64.c | 2 +-
target/arm/tcg/vec_helper.c | 66 ++++++++++++++++++++--------------
2 files changed, 40 insertions(+), 28 deletions(-)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c209ac84228..c45a9822281 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6175,7 +6175,7 @@ static bool trans_FCMLA_v(DisasContext *s, arg_FCMLA_v *a)
gen_gvec_op4_fpst(s, a->q, a->rd, a->rn, a->rm, a->rd,
a->esz == MO_16 ? FPST_A64_F16 : FPST_A64,
- a->rot, fn[a->esz]);
+ a->rot | (s->fpcr_ah << 2), fn[a->esz]);
return true;
}
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index fc3e6587b81..630513f00b2 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -965,22 +965,26 @@ void HELPER(gvec_fcmlah)(void *vd, void *vn, void *vm, void *va,
uintptr_t opr_sz = simd_oprsz(desc);
float16 *d = vd, *n = vn, *m = vm, *a = va;
intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
- uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
- uint32_t neg_real = flip ^ neg_imag;
+ uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+ uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+ uint32_t negf_real = flip ^ negf_imag;
+ float16 negx_imag, negx_real;
uintptr_t i;
- /* Shift boolean to the sign bit so we can xor to negate. */
- neg_real <<= 15;
- neg_imag <<= 15;
+ /* With AH=0, use negx; with AH=1 use negf. */
+ negx_real = (negf_real & ~fpcr_ah) << 15;
+ negx_imag = (negf_imag & ~fpcr_ah) << 15;
+ negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+ negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
for (i = 0; i < opr_sz / 2; i += 2) {
float16 e2 = n[H2(i + flip)];
- float16 e1 = m[H2(i + flip)] ^ neg_real;
+ float16 e1 = m[H2(i + flip)] ^ negx_real;
float16 e4 = e2;
- float16 e3 = m[H2(i + 1 - flip)] ^ neg_imag;
+ float16 e3 = m[H2(i + 1 - flip)] ^ negx_imag;
- d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], 0, fpst);
- d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], 0, fpst);
+ d[H2(i)] = float16_muladd(e2, e1, a[H2(i)], negf_real, fpst);
+ d[H2(i + 1)] = float16_muladd(e4, e3, a[H2(i + 1)], negf_imag, fpst);
}
clear_tail(d, opr_sz, simd_maxsz(desc));
}
@@ -1025,22 +1029,26 @@ void HELPER(gvec_fcmlas)(void *vd, void *vn, void *vm, void *va,
uintptr_t opr_sz = simd_oprsz(desc);
float32 *d = vd, *n = vn, *m = vm, *a = va;
intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
- uint32_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
- uint32_t neg_real = flip ^ neg_imag;
+ uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+ uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+ uint32_t negf_real = flip ^ negf_imag;
+ float32 negx_imag, negx_real;
uintptr_t i;
- /* Shift boolean to the sign bit so we can xor to negate. */
- neg_real <<= 31;
- neg_imag <<= 31;
+ /* With AH=0, use negx; with AH=1 use negf. */
+ negx_real = (negf_real & ~fpcr_ah) << 31;
+ negx_imag = (negf_imag & ~fpcr_ah) << 31;
+ negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+ negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
for (i = 0; i < opr_sz / 4; i += 2) {
float32 e2 = n[H4(i + flip)];
- float32 e1 = m[H4(i + flip)] ^ neg_real;
+ float32 e1 = m[H4(i + flip)] ^ negx_real;
float32 e4 = e2;
- float32 e3 = m[H4(i + 1 - flip)] ^ neg_imag;
+ float32 e3 = m[H4(i + 1 - flip)] ^ negx_imag;
- d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], 0, fpst);
- d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], 0, fpst);
+ d[H4(i)] = float32_muladd(e2, e1, a[H4(i)], negf_real, fpst);
+ d[H4(i + 1)] = float32_muladd(e4, e3, a[H4(i + 1)], negf_imag, fpst);
}
clear_tail(d, opr_sz, simd_maxsz(desc));
}
@@ -1085,22 +1093,26 @@ void HELPER(gvec_fcmlad)(void *vd, void *vn, void *vm, void *va,
uintptr_t opr_sz = simd_oprsz(desc);
float64 *d = vd, *n = vn, *m = vm, *a = va;
intptr_t flip = extract32(desc, SIMD_DATA_SHIFT, 1);
- uint64_t neg_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
- uint64_t neg_real = flip ^ neg_imag;
+ uint32_t fpcr_ah = extract32(desc, SIMD_DATA_SHIFT + 2, 1);
+ uint32_t negf_imag = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+ uint32_t negf_real = flip ^ negf_imag;
+ float64 negx_real, negx_imag;
uintptr_t i;
- /* Shift boolean to the sign bit so we can xor to negate. */
- neg_real <<= 63;
- neg_imag <<= 63;
+ /* With AH=0, use negx; with AH=1 use negf. */
+ negx_real = (uint64_t)(negf_real & ~fpcr_ah) << 63;
+ negx_imag = (uint64_t)(negf_imag & ~fpcr_ah) << 63;
+ negf_real = (negf_real & fpcr_ah ? float_muladd_negate_product : 0);
+ negf_imag = (negf_imag & fpcr_ah ? float_muladd_negate_product : 0);
for (i = 0; i < opr_sz / 8; i += 2) {
float64 e2 = n[i + flip];
- float64 e1 = m[i + flip] ^ neg_real;
+ float64 e1 = m[i + flip] ^ negx_real;
float64 e4 = e2;
- float64 e3 = m[i + 1 - flip] ^ neg_imag;
+ float64 e3 = m[i + 1 - flip] ^ negx_imag;
- d[i] = float64_muladd(e2, e1, a[i], 0, fpst);
- d[i + 1] = float64_muladd(e4, e3, a[i + 1], 0, fpst);
+ d[i] = float64_muladd(e2, e1, a[i], negf_real, fpst);
+ d[i + 1] = float64_muladd(e4, e3, a[i + 1], negf_imag, fpst);
}
clear_tail(d, opr_sz, simd_maxsz(desc));
}
--
2.34.1