qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub


From: Alex Bennée
Subject: Re: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
Date: Wed, 21 Oct 2020 18:46:36 +0100
User-agent: mu4e 1.5.6; emacs 28.0.50

Richard Henderson <richard.henderson@linaro.org> writes:

> Hi Alex,
>
> Here's my first adjustment to your conversion for 128-bit floats.
>
> The Idea is to use a set of macros and an include file so that we
> can re-use the same large chunk of code that performs the basic
> operations on various fraction lengths.  It's ugly, but without
> proper language support it seems to be less ugly than most.
>
> While I've just gone and added lots of stuff to int128... I have
> had another idea, half-baked because I'm tired and it's late:
>
> typedef struct {
>     FloatClass cls;
>     int exp;
>     bool sign;
>     uint64_t frac[];
> } FloatPartsBase;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac;
> } FloatParts64;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac_hi, frac_lo;
> } FloatParts128;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac[4]; /* big endian word ordering */
> } FloatParts256;
>
> This layout, with the big-endian ordering, means that storage
> can be shared between them, just by ignoring the least significant
> words of the fraction as needed.  Which may make muladd more
> understandable.

Would the big-endian formatting hamper the compiler on x86 where it can
do extra wide operations?

I am still seeing a multi MFlop drop in performance when converting the
float128_addsub to the new code. If this allows the compiler to do
better on the code I can live with it.

>
> E.g.
>
> static void muladd_floats64(FloatParts128 *r, FloatParts64 *a,
>                             FloatParts64 *b, FloatParts128 *c, ...)
> {
>     // handle nans
>     // produce 128-bit product into r
>     // handle p vs c special cases.
>     // zero-extend c to 128-bits
>     c->frac[1] = 0;
>     // perform 128-bit fractional addition
>     addsub_floats128(r, c, ...);
>     // fold 128-bit fraction to 64-bit sticky bit.
>     r->frac[0] |= r->frac[1] != 0;
> }
>
> float64 float64_muladd(float64 a, float64 b, float64 c, ...)
> {
>     FloatParts64 pa, pb;
>     FloatParts128 pc, pr;
>
>     float64_unpack_canonical(&pa.base, a, status);
>     float64_unpack_canonical(&pb.base, b, status);
>     float64_unpack_canonical(&pc.base, c, status);
>     muladd_floats64(&pr, &pa, &pb, &pc, flags, status);
>
>     return float64_round_pack_canonical(&pr.base, status);
> }
>
> Similarly, muladd_floats128 would use addsub_floats256.
>
> However, the big-endian word ordering means that Int128
> cannot be used directly; so a set of wrappers are needed.
> If added the Int128 routine just for use here, then it's
> probably easier to bypass Int128 and just code it here.

Are you talking about all our operations? Will we still need to#ifdef
CONFIG_INT128 in the softfloat code?

>
> Thoughts?
>
>
> r~
>
>
> Richard Henderson (15):
>   qemu/int128: Add int128_or
>   qemu/int128: Add int128_clz, int128_ctz
>   qemu/int128: Rename int128_rshift, int128_lshift
>   qemu/int128: Add int128_shr
>   qemu/int128: Add int128_geu
>   softfloat: Use mulu64 for mul64To128
>   softfloat: Use int128.h for some operations
>   softfloat: Tidy a * b + inf return
>   softfloat: Add float_cmask and constants
>   softfloat: Inline float_raise
>   Test split to softfloat-parts.c.inc
>   softfloat: Streamline FloatFmt
>   Test float128_addsub
>   softfloat: Use float_cmask for addsub_floats
>   softfloat: Improve subtraction of equal exponent
>
>  include/fpu/softfloat-macros.h |  89 ++--
>  include/fpu/softfloat.h        |   5 +-
>  include/qemu/int128.h          |  61 ++-
>  fpu/softfloat.c                | 802 ++++++++++-----------------------
>  softmmu/physmem.c              |   4 +-
>  target/ppc/int_helper.c        |   4 +-
>  tests/test-int128.c            |  44 +-
>  fpu/softfloat-parts.c.inc      | 339 ++++++++++++++
>  fpu/softfloat-specialize.c.inc |  45 +-
>  9 files changed, 716 insertions(+), 677 deletions(-)
>  create mode 100644 fpu/softfloat-parts.c.inc


-- 
Alex Bennée



reply via email to

[Prev in Thread] Current Thread [Next in Thread]