From: Richard Henderson
Subject: Re: [Qemu-devel] [RFC PATCH 14/30] softfloat: 16 bit helpers for shr, clz and rounding and packing
Date: Sun, 15 Oct 2017 11:02:45 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0
On 10/13/2017 09:24 AM, Alex Bennée wrote:
> Half-precision helpers for float16 maths. I didn't bother hand-coding
> the count leading zeros as we could always fall-back to host-utils if
> we needed to.
>
> Signed-off-by: Alex Bennée <address@hidden>
> ---
> fpu/softfloat-macros.h | 39 +++++++++++++++++++++++++++++++++++++++
> fpu/softfloat.c | 21 +++++++++++++++++++++
> 2 files changed, 60 insertions(+)
>
> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
> index 9cc6158cb4..73091a88a8 100644
> --- a/fpu/softfloat-macros.h
> +++ b/fpu/softfloat-macros.h
> @@ -89,6 +89,31 @@ this code that are retained.
> # define SOFTFLOAT_GNUC_PREREQ(maj, min) 0
> #endif
>
> +/*----------------------------------------------------------------------------
> +| Shifts `a' right by the number of bits given in `count'. If any nonzero
> +| bits are shifted off, they are ``jammed'' into the least significant bit of
> +| the result by setting the least significant bit to 1. The value of `count'
> +| can be arbitrarily large; in particular, if `count' is greater than 16, the
> +| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +| The result is stored in the location pointed to by `zPtr'.
> +*----------------------------------------------------------------------------*/
> +
> +static inline void shift16RightJamming(uint16_t a, int count, uint16_t *zPtr)
> +{
> + uint16_t z;
> +
> + if ( count == 0 ) {
> + z = a;
> + }
> + else if ( count < 16 ) {
> + z = ( a>>count ) | ( ( a<<( ( - count ) & 16 ) ) != 0 );
> + }
> + else {
> + z = ( a != 0 );
> + }
> + *zPtr = z;
> +
> +}
When are you going to use a shift-right-jamming (SRJ) of a uint16_t? Isn't
most of your actual arithmetic done on uint32_t?
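For illustration, a minimal sketch of shift-right-jamming on a 32-bit working value. The name srj32 is hypothetical and this is not the code in the tree; it only shows that the same operation on uint32_t needs nothing 16-bit specific.

```c
#include <stdint.h>

/*
 * Sketch only: shift `a' right by `count' bits and OR any nonzero
 * shifted-out bits into bit 0 (the "sticky" bit). `srj32' is a
 * hypothetical name used for illustration.
 */
static inline uint32_t srj32(uint32_t a, unsigned count)
{
    if (count == 0) {
        return a;
    }
    if (count < 32) {
        /* 32 - count is in 1..31 here, so the left shift is well defined. */
        return (a >> count) | ((a << (32 - count)) != 0);
    }
    /* Everything is shifted out: result is just the sticky bit. */
    return a != 0;
}
```

With this, srj32(0x4, 1) gives 2 (nothing lost), while srj32(0x5, 1) gives 3 because the dropped low bit is jammed back into bit 0.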
> +/*----------------------------------------------------------------------------
> +| Returns the number of leading 0 bits before the most-significant 1 bit of
> +| `a'. If `a' is zero, 16 is returned.
> +*----------------------------------------------------------------------------*/
> +
> +static int8_t countLeadingZeros16( uint16_t a )
> +{
> + if (a) {
> + return __builtin_clz(a);
> + } else {
> + return 16;
> + }
> +}
__builtin_clz works on "unsigned int", so a uint16_t argument is promoted and
the count starts from bit 31. You need to use clz32(a) - 16.
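A sketch of what that could look like. clz32_sketch is a stand-in written here so the example is self-contained; it assumes the host-utils clz32 returns 32 for a zero input, and it relies on the GCC/Clang __builtin_clz. The function name count_leading_zeros16 is likewise only illustrative.

```c
#include <stdint.h>

/* Stand-in for a clz32() helper: assumed to return 32 when val is zero. */
static inline int clz32_sketch(uint32_t val)
{
    return val ? __builtin_clz(val) : 32;
}

/*
 * Leading zeros of a 16-bit value. The argument is widened to 32 bits,
 * so the 16 always-zero upper bits are subtracted from the count.
 * A zero input yields 32 - 16 = 16 without a special case.
 */
static inline int count_leading_zeros16(uint16_t a)
{
    return clz32_sketch(a) - 16;
}
```

For example, 0x0400 has its top set bit at bit 10, so the 32-bit count is 21 and the 16-bit count is 5.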
> +/*----------------------------------------------------------------------------
> +| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +| and significand `zSig', and returns the proper single-precision floating-
s/single/half/
> +| point value corresponding to the abstract input. This routine is just like
> +| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
> +| Bit 15 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> +| floating-point exponent.
> +*----------------------------------------------------------------------------*/
> +
> +static float16
> + normalizeRoundAndPackFloat16(flag zSign, int zExp, uint16_t zSig,
> + float_status *status)
> +{
> + int8_t shiftCount;
> +
> + shiftCount = countLeadingZeros16( zSig ) - 1;
> + return roundAndPackFloat16(zSign, zExp - shiftCount, zSig<<shiftCount,
> + true, status);
Do I recall correctly that your lsb is between bits 7:6, like
roundAndPackFloat32? You've got 11 bits of sig. Plus 7 bits of extra, that
makes 18 bits, which doesn't fit in a uint16_t.
So, the reason that roundAndPackFloat32 uses 7 bits is that 7 + 24 == 31.
We can either use a split at (15 - 11 =) 4 bits, and still fit in a uint16_t,
or we can drop uint16_t and admit that the compiler is going to promote to int,
or uint32_t, anyway. If we do that, we have options of a split between 4 and
(31 - 11 =) 20 bits.
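The bit budget above can be written down as compile-time checks. This is only a sketch of the arithmetic: the enum names and max_round_bits are hypothetical, and the significand widths count the implicit leading bit (24 for float32, 11 for float16).

```c
#include <stdint.h>

enum {
    F32_SIG_BITS = 24,    /* float32 significand, implicit bit included */
    F16_SIG_BITS = 11,    /* float16 significand, implicit bit included */
    F32_ROUND_BITS = 7    /* extra low bits below the lsb for float32 */
};

/* float32: 24 significand bits + 7 extra bits leave bit 31 free. */
_Static_assert(F32_SIG_BITS + F32_ROUND_BITS == 31, "float32 budget");

/* float16 with the same 7 extra bits needs 18 bits: too wide for uint16_t. */
_Static_assert(F16_SIG_BITS + F32_ROUND_BITS > 16, "7 extra bits overflow 16");

/*
 * Largest number of extra bits that still leaves the top bit of the
 * container clear (hypothetical helper, for illustration only).
 */
static inline int max_round_bits(int container_bits, int sig_bits)
{
    return container_bits - 1 - sig_bits;
}
```

Here max_round_bits(16, 11) is 4 and max_round_bits(32, 11) is 20, matching the two ends of the range of possible splits.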
When we talked this week about fp->int conversion, it seemed Really Useful
that sig << exp is representable in a uint32_t. That suggests a choice at or
below (32 - 11 - 14 =) 7.
r~