From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v1 08/14] hostfloat: support float32/64 addition and subtraction
Date: Thu, 22 Mar 2018 14:41:05 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0
On 03/22/2018 01:57 PM, Emilio G. Cota wrote:
>> Is there any especially good reason you want to not put this code into the
>> normal softfloat function? Does it really make any measurable difference at
>> all to force this code to be inlined into a helper?
>
> You mean to do this? (... or see below)
>
> --- a/fpu/hostfloat.c
> +++ b/fpu/hostfloat.c
> @@ -97,7 +97,7 @@ GEN_INPUT_FLUSH(float64)
>
> #define GEN_FPU_ADDSUB(add_name, sub_name, soft_t, host_t, \
> host_abs_func, min_normal) \
> - static inline __attribute__((always_inline)) soft_t \
> + static soft_t \
> fpu_ ## soft_t ## _addsub(soft_t a, soft_t b, bool subtract, \
> float_status *s) \
> { \
>
> That slows add/sub dramatically, because addsub is not inlined into
> float32_add and float32_sub (that's an extra function call plus an
> extra branch per emulated op).
>
> For x86_64-linux-user/qemu-x86_64 tests/fp-bench -o add, the above gives
> - before: 188.06 MFlops
> - after: 117.56 MFlops
Well, not quite. I meant putting all of the new code into softfloat.c, and not
attempting to inline any of it in target/cpu/foo_helper.c.
The best written target helpers currently simply tail-call to softfloat.c.
There are others that do so after minor argument adjustment. For these, I have
been planning to rearrange things such that e.g. float32_add is called from
TCG directly, with no other function calls at all.
For targets that cannot do that, I simply cannot bring myself to care about the
final percentage points enough to want to introduce extra macros.
Another thought re all of the soft_is_normal || soft_is_zero checks that you're
performing. I think it would be nice if we could work with
float*_unpack_canonical so that we don't have to duplicate work. E.g.
/* Return true for float_class_normal or float_class_zero. */
static inline bool is_finite(FloatClass c) { return c <= float_class_zero; }

float32 float32_add(float32 a, float32 b, float_status *s)
{
    FloatClass a_cls = float32_classify(a);
    FloatClass b_cls = float32_classify(b);

    if (is_finite(a_cls) && is_finite(b_cls) && ...) {
        /* do hardfp thing */
    }

    pa = float32_unpack(a, ca, s);
    pb = float32_unpack(b, cb, s);
    pr = addsub_floats(pa, pb, s, false);
    return float32_round_pack(pr, s);
}
Where float32_classify produces Normal/Zero/Inf/NaN and might avoid duplicate
work within float32_unpack.
r~