Re: [Qemu-devel] [PATCH v1 08/14] hostfloat: support float32/64 addition and subtraction


From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v1 08/14] hostfloat: support float32/64 addition and subtraction
Date: Thu, 22 Mar 2018 14:41:05 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

On 03/22/2018 01:57 PM, Emilio G. Cota wrote:
>> Is there any especially good reason you want to not put this code into the
>> normal softfloat function?  Does it really make any measurable difference at
>> all to force this code to be inlined into a helper?
> 
> You mean to do this? (... or see below)
> 
> --- a/fpu/hostfloat.c
> +++ b/fpu/hostfloat.c
> @@ -97,7 +97,7 @@ GEN_INPUT_FLUSH(float64)
> 
>  #define GEN_FPU_ADDSUB(add_name, sub_name, soft_t, host_t,              \
>                         host_abs_func, min_normal)                       \
> -    static inline __attribute__((always_inline)) soft_t                 \
> +    static soft_t                                                       \
>      fpu_ ## soft_t ## _addsub(soft_t a, soft_t b, bool subtract,        \
>                                float_status *s)                          \
>      {                                                                   \
> 
> That slows add/sub dramatically, because addsub is not inlined into
> float32_add and float32_sub (that's an extra function call plus an
> extra branch per emulated op).
> 
> For x86_64-linux-user/qemu-x86_64 tests/fp-bench -o add, the above gives
> - before: 188.06 MFlops
> - after:  117.56 MFlops

Well, not quite.  I meant putting all of the new code into softfloat.c, and not
attempting to inline any of it in target/cpu/foo_helper.c.

The best-written target helpers currently simply tail-call into softfloat.c;
others do so after minor argument adjustment.  For these, I have been planning
to rearrange things so that e.g. float32_add is called from TCG directly, with
no other function calls at all.
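To make the tail-call shape concrete, here is a minimal sketch (helper name and
the trivial stand-in for the softfloat.c implementation are illustrative, not
from any real target):

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t float32;
typedef struct { int dummy; } float_status;

/* Stand-in for the real softfloat.c implementation; here it just does
 * host-float math on the raw bits so the example is runnable. */
static float32 float32_add(float32 a, float32 b, float_status *s)
{
    float fa, fb, fr;
    (void)s;
    memcpy(&fa, &a, sizeof(fa));
    memcpy(&fb, &b, sizeof(fb));
    fr = fa + fb;
    memcpy(&b, &fr, sizeof(fr));
    return b;
}

/* The "best-written" style of target helper: a pure tail call with no
 * argument adjustment, which a compiler can turn into a single jump,
 * and which could eventually be bypassed by calling float32_add from
 * TCG directly. */
float32 helper_fadd_s(float_status *fp_status, float32 a, float32 b)
{
    return float32_add(a, b, fp_status);
}
```

A helper that first massages its arguments (flushing flags, repacking
registers) cannot be reduced to this shape, which is where the extra call
overhead comes from.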

For targets that cannot do that, I simply cannot bring myself to care about the
final percentage points enough to want to introduce extra macros.

Another thought re all of the soft_is_normal || soft_is_zero checks that you're
performing.  I think it would be nice if we could work with
float*_unpack_canonical so that we don't have to duplicate work.  E.g.

/* Return true for float_class_normal || float_class_zero.  */
static inline bool is_finite(FloatClass c) { return c <= float_class_zero; }

float32 float32_add(float32 a, float32 b, float_status *s)
{
  FloatClass a_cls = float32_classify(a);
  FloatClass b_cls = float32_classify(b);

  if (is_finite(a_cls) && is_finite(b_cls) && ...) {
      /* do hardfp thing */
  }

  FloatParts pa = float32_unpack(a, a_cls, s);
  FloatParts pb = float32_unpack(b, b_cls, s);
  FloatParts pr = addsub_floats(pa, pb, s, false);
  return float32_round_pack(pr, s);
}

Where float32_classify produces Normal/Zero/Inf/NaN and might avoid duplicate
work within float32_unpack.
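Since float32_classify does not exist yet, here is one sketch of what a cheap
raw-bit classifier could look like.  The enum ordering is an assumption chosen
so that is_finite's "c <= float_class_zero" test works, and NaN kinds and
denormals are collapsed for brevity:

```c
#include <stdint.h>

typedef enum {
    float_class_normal,  /* subnormals lumped in here for this sketch */
    float_class_zero,
    float_class_inf,
    float_class_nan,
} FloatClass;

/* Classify an IEEE-754 binary32 value from its raw bits: all-ones
 * exponent is inf/nan depending on the fraction; all-zero exponent
 * and fraction is zero; everything else counts as normal here. */
static inline FloatClass float32_classify(uint32_t f)
{
    uint32_t exp = (f >> 23) & 0xff;
    uint32_t frac = f & 0x7fffff;

    if (exp == 0xff) {
        return frac ? float_class_nan : float_class_inf;
    }
    if (exp == 0 && frac == 0) {
        return float_class_zero;
    }
    return float_class_normal;
}
```

A real version would likely need to distinguish qNaN/sNaN and subnormals,
but the point stands: the classification is a couple of mask-and-compare
operations whose result both the hardfloat fast path and the unpack slow
path could share.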


r~


