qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 13/22] target/loongarch: Add floating point arithmetic ins


From: Richard Henderson
Subject: Re: [PATCH v2 13/22] target/loongarch: Add floating point arithmetic instruction translation
Date: Thu, 22 Jul 2021 19:44:51 -1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 7/20/21 11:53 PM, Song Gao wrote:
+uint64_t helper_fp_sqrt_d(CPULoongArchState *env, uint64_t fp)
+{
+    fp = float64_sqrt(fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp;
+}
+
+uint32_t helper_fp_sqrt_s(CPULoongArchState *env, uint32_t fp)
+{
+    fp = float32_sqrt(fp, &env->active_fpu.fp_status);
+    update_fcsr0(env, GETPC());
+    return fp;
+}

I believe you will find it easier to take and return uint64_t, even for 32-bit operations. The manual says that the high bits may contain any value, so in my opinion you should not work hard to preserve the high bits, as you currently do with

+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp0, a->fd);

I think this should be as simple as

  gen_helper_fp_add_s(cpu_fpu[a->fd], cpu_env,
                      cpu_fpu[a->fj], cpu_fpu[a->fk]);

I also think that loongarch should learn from risc-v and change the architecture to "nan-box" single-precision results -- fill the high 32-bits with 1s. This is an SNaN representation for double-precision and will immediately fail when incorrectly using a single-precision value as a double-precision input.

Thankfully the current architecture is backward compatible with nan-boxing.

+/* Floating point arithmetic operation instruction translation */
+static bool trans_fadd_s(DisasContext *ctx, arg_fadd_s * a)
+{
+    TCGv_i32 fp0, fp1;
+
+    fp0 = tcg_temp_new_i32();
+    fp1 = tcg_temp_new_i32();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr32(fp0, a->fj);
+    gen_load_fpr32(fp1, a->fk);
+    gen_helper_fp_add_s(fp0, cpu_env, fp0, fp1);
+    gen_store_fpr32(fp0, a->fd);
+
+    tcg_temp_free_i32(fp0);
+    tcg_temp_free_i32(fp1);
+
+    return true;
+}

Again, you should use some helper functions to reduce the repetition.

+static bool trans_fmadd_d(DisasContext *ctx, arg_fmadd_d *a)
+{
+    TCGv_i64 fp0, fp1, fp2, fp3;
+
+    fp0 = tcg_temp_new_i64();
+    fp1 = tcg_temp_new_i64();
+    fp2 = tcg_temp_new_i64();
+    fp3 = tcg_temp_new_i64();
+
+    check_fpu_enabled(ctx);
+    gen_load_fpr64(fp0, a->fj);
+    gen_load_fpr64(fp1, a->fk);
+    gen_load_fpr64(fp2, a->fa);
+    check_fpu_enabled(ctx);

Repeating check_fpu_enabled.

+    gen_helper_fp_madd_d(fp3, cpu_env, fp0, fp1, fp2);
+    gen_store_fpr64(fp3, a->fd);

I think you might as well pass in the float_muladd_* constant to a single helper rather than having 4 different helpers.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]