[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[epsilon-devel] Unboxed tagged floating-point data in C: a (new?) effici

From: Luca Saiu
Subject: [epsilon-devel] Unboxed tagged floating-point data in C: a (new?) efficient and portable solution
Date: Sat, 13 Apr 2019 12:52:18 +0200
User-agent: Gnus (Gnus v5.13), GNU Emacs, x86_64-unknown-linux-gnu


Experimenting with the problem of representing floating-point data in
dynamically-typed context I have discovered a technique I did not know,
and which might be new.
Please tell me what you think.

Of course I plan to apply the idea to JitterLisp, but the idea should be
reusable: any VM author requiring floating-point support will run into
the same problems, and Jitter should provide an API to cover such use

Such support does not exist yet: what you can see in
is currently specific to JitterLisp, but as you can see from the
regularity of the code that header can and should be machine-generated
from a specification of data types, with the common code moved to
, at this point still unfinished.
Support for generic boxed data in the current implementation is still
tentative---That will mature as well and of course will be an option to
handle floating-point data as well.  In this proposal I am showing a way
of handing tagged but *unboxed* floats or doubles.


The idea is based on using a few low-order bits of the mantissa in a
floating-point datum to hold type information.  In any practical system
this tag should take 3 or 4 bits; in case precision is very important I
think it could be brought down to 2 bits, at the cost of making tags for
other types bigger.
Assuming the IEEE 754 representation, a double mantissa has 53 bits (of
which one implicit), and a float mantissa has 24 bits (of which one
implicit).  Notice that in IEEE 754 the mantissa occupies low-order
bits, independently from the machine endiannes.

By reserving TAG_BIT_NO bits for the tag, the precision will be
reduced.  I believe (but if you are stronger at numerical analysis than
I am, which is not unlikely, I am interested in your feedback) that this
should not affect the casual use of floats for most applications, as
long as floats are used in the contexts where they are supposed to work.

I see two main kinds of code written for standard float or double data
breaking under this change:
a) using floats as for loop indices: when the index becomes large in
   absolute value, incrementing or decrementing it may not change the
b) numeric code relying on some convergence criterion based on two
   numbers getting closer to each other than a certain threshold: the
   minimal representable difference, with tagged data, will become
Of course the disruption will be weaker with double-precision than with
single-precision number, in either case.
I believe we can live with such limitations.
b) in particular should not be a huge concern, given that such low-level
primitives belong in a library, presumably written in C.  Writing a
cosine procedure to run on a VM is feasible, but probably not worth the
effort.  An primitive written in C would, of course, receive untagged
arguments and return untagged results: the C code would only see
standard floating-point data with full precision and every mantissa bit
available, internally.

Let UTYPE be an unsigned type (in practice jitter_uint) and FTYPE a
floating-point type of the same size.

We have to support two operations:
* tag,   taking a FTYPE and returning an UTYPE;
* untag, taking an UTYPE and returning a FUTYPE;

These operations are critical and must be very fast.  Since calling C
function is relatively expensive with advanced dispatches, C functions
should only be used as a last resort.

Let TAG be the tag value used for our unboxed tagged float type; of
course TAG must fit in TAG_BIT_NO bits.  TAGMASK is a bit mask covering
the tag bits.

#define TAG_BIT_NO 4
#define TAG        ((UTYPE) 15)
#define TAGMASK    ((UTYPE) (1 << TAG_BIT_NO) - 1)

Moving values between floating-point and integer registers

It is not directly possible to apply bitwise operations on
floating-point data in C, nor in any CPU I know -- the RISC-V
specification, justifying its choice of separate integer and
floating-point register banks, cites as counterexamples, I think,
PA-RISC and AMD29K.  I know neither architecture and they are probably
not terribly important today, but in any case those would be the "easy"
architectures to support.  Let me focus on the others.
"The other" architectures have separate integer and floating-point
registers, and require at least one instruction to move data between
them -- in fact the x87 even needs to go through memory; but on the x86_64,
with %xmm, %ymm and %zmm registers, on only has to pay one instruction to
move from floating-point to integer and one instruction for the other
I am not sure about the latency of such instructions in general, but I
expect it not be to higher than an L1d access.  If this is true then
this technique is superior to boxed floating-point data even
disregarding the memory allocation overhead.

The obvious way to represent this in C is by type punning on a union:

union conv
  UTYPE u;
  FTYPE f;

To implement tagging, one writes to the f field and reads from the u
field; for untagging, the converse.  In my tests this generates good

Now, of course we have to do the actual bit manipulation.


Untagging an UTYPE value requires to mask off the lowest TAG_BIT_NO
bits. Given UTYPE u, we could say
  (u & ~ TAGMASK)
but this solution, while correct, is not ideal on MIPS and SH, which
cannot express a bitwise and with a sign-extended immediate on arbitrary
registers; they would need to load the negated mask as a literal
beforehand, taking one more instruction.
But this alternative takes only one instruction on every architecture I
have checked, including MIPS and SH:
  (u - TAG)
This is correct, as long as u actually has the floating-point tag, and
compiles to just one sum or subtraction instruction with an immediate
argument.  The immediate would even fit in 8 bits, zero- or sign-extended.
(Incidentally, I am already using this same style of untagging elsewhere
in JitterLisp.)


Tagging is less efficient than untagging in the general case: it
requires to first mask off the lowest TAG_BIT_NO bits (a subtraction
does not work in this case, as the original bits to clear may have any
configuration, not necessarily TAG), and then or-ing TAG:
  ((u & ~ TAGMASK) | TAG)
If we choose TAG in a clever way, however, we can do better.

One possibility is having TAG = 0.  In this case untagging becomes just
  (u & ~ TAGMASK)
.  However a zero tag is convenient for fixnums, which will be
presumably more common and therefore more critical than floating-point
data.  Moreover, one choice of TAG yields even better code.  Let TAG be
a TAG_BIT_NO-bit bit mask of all ones:
#define TAG        TAGMASK
In this case tagging an UTYPE datum requires to bitwise-or a value,
overwriting any 0 or 1 in the same position:
  (u | TAGMASK)
This compiles to one instruction on every machine I have tried including
RISCs, the exception being (in the worst case) SH, whose or instruction
with an immediate only admits r0 as the source-and-target register.  The
instruction immediate fits in 8 bits, again, zero- or sign-extended.

Tagging and untagging with macros, using GNU C

I am not against using GNU C extensions in Jitter, when they are
available, and providing a slower fallback solution based on C functions
for portability; if this made non-GCC compilers look worse, I would
certainly not cry.

GNU C's statement expressions are the obvious way of hiding the presence
of the conversion union, and as such they were my first choice of a
solution.  This, for example, works perfectly:

#define untag_stmtexp(a)  \
  ({                      \
    union conv _c;        \
    _c.u = (a) - TAG;     \
    _c.f;                 \

Doing the same using Standard C only

I was surprised to notice that the macro could be redefined using
standard C only, as long as we have compound literals, designated
initializers and non-constant initializers.
Those are all standard in C99.

The idea is to create an initialized union conv object within an
expression returning a field of it (different from the one we
initialized) as the result.

It is, admittedly, less pretty, but standard:

#define untag_standard(a)  \
  (((union conv)           \
    {.u = (a) - TAG}).f)

However this complexity is quite easy to hide with macros:

#define f2u(a)                  \
 (((union conv) {.f = (a)}).u)

#define u2f(a)                  \
 (((union conv) {.u = (a)}).f)

And then:

#define untag(a)         \
  (u2f (f2u (a) - TAG))

#define tag(a)                           \
  (u2f (((f2u (a) & ~ TAGMASK) | TAG)))

This is very nice to read, relies on standard C only, and appears to
generate optimal code with GCC.

I am attaching the (dirty) test program I have been experimenting with.
It compiles and works with every architecture I have played with
(x86_64, i386, mips, powerpc, riscv, sh, alpha, where applicable on
different word size and endianness).  Alpha requires -mieee not to throw
a FPE signal on NaN, but this has nothing to do with the tagging scheme.

A nice property of this encoding is that special floating-point values
do not change "classification", as per fpclassify: infinity values
remains infinite, NaNs remain NaNs.  The sign never changes, of course,
as it is stored in the highest bit and is never touched by the tag.

Maybe I should not be as proud as I am of this idea; what do people use
these C99 features for, normally?  I had never considered them as an
alternative to GNU C statements-expressions (which I have always liked);
but they do seem to apply widely in this cases, particularly in code
as heavy in macros as Jitter is.

Is this trivial?  Is it a known idea already?  Please tell me what you


Luca Saiu
* GNU epsilon:           http://www.gnu.org/software/epsilon
* My personal web site:  http://ageinghacker.net

I support everyone's freedom of mocking any opinion or belief, no
matter how deeply held, with open disrespect and the same unrelented
enthusiasm of a toddler who has just learned the word "poo".
/* Written by Luca Saiu  http://ageinghacker.net
   I, the author, hereby place this code into the public domain
   up to the extent of the applicable law. */

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <fenv.h>
#include <limits.h>
#include <assert.h>
#include <signal.h>


# define FTYPE float
# define ITYPE int
# define FTYPE double
# define ITYPE long long
# define UTYPE unsigned ITYPE

#define TAG_BIT_NO 4
#define TAG        ((UTYPE) 15)
#define TAGMASK    ((UTYPE) (1 << TAG_BIT_NO) - 1)

union conv {
  FTYPE f;
  ITYPE i;
  UTYPE u;

print_bits_u (UTYPE a)
  int i;
  for (i = sizeof (UTYPE) * CHAR_BIT - 1; i >= 0; i --)
      UTYPE mask = 1LU << ((UTYPE) i);
      if (a & mask)
        printf ("1");
        printf ("0");

print_bits_f (FTYPE a)
  union conv c0;
  c0.f = a;
  print_bits_u (c0.u);

__attribute__ ((const))
untag__ (FTYPE a)
  union conv c;
  c.f = a;
  /* This is correct. */
  //c.u &= ~ ((1 << TAG_BIT_NO) - 1);

  /* This alternative is equally correct, assuming that the argument is actually
     tagged, as it should be, and faster on MIPS and SH.

     On MIPS the immediate variant of the bitwise-anding instruction takes a
     zero-extended, rather than sign-extended, argument; therefore it cannot be
     used here where the mask would need sign-extension to fit in a 16-bit
     immediate.  GCC solves this the obvious way, by loading a literal into a
     temporary with one separate instruction before and-ing two registers into
     another register.  On the other hand, any sensible architecture, MIPS
     included, has a sum or subtract instruction taking a sign-extended literal.

     SH is even more restricted: it allows and-ing an immediate (again,
     zero-extended) only when the operand (source and destination) is r0.
     Replacing the and with an add (sign-extended immediate, no register
     restriction) solves the problem. */
  c.u -= TAG;
  return c.f;

/* This is correct in Standard C99, relying on compound literals, designated
   initializers and non-constant initializers, but not on GNU C expression
   statements. */

#define f2u(a) \
 (((union conv) {.f = (a)}).u)
#define u2f(a) \
 (((union conv) {.u = (a)}).f)

#define untag_alt(a) \
  ((union conv) \
   {.u = ((union conv) \
          {.f = (a)}.u - TAG) \

#define untag_stmtexp(a) \
  ({ \
    union conv _c; \
    _c.f = (a); \
    _c.u &= ~ TAGMASK; \
    _c.f; \

#define untag(a) \
  (u2f (f2u (a) - TAG))
#define tag(a) \
  (u2f (((f2u (a) & ~ TAGMASK) | TAG)))

__attribute__ ((const))
tag_ (FTYPE a)
  return tag (a);

__attribute__ ((const))
untag_ (FTYPE a)
  return untag_alt (a);

__attribute__ ((const))
tag__ (FTYPE a)
  union conv c;
  c.f = a;
  c.u &= ~ ((1 << TAG_BIT_NO) - 1);
  c.u |= TAG;
  return c.f;

void print_float (FTYPE a)
  printf ("%-26.15f ", a);
  print_bits_f (a);
  //feclearexcept (FE_ALL_EXCEPT);
  if (! iscanonical (a))
    printf (" NC");
  if (! isfinite (a) && ! isnan (a))
    printf (" I");
  //if (isnormal (a))
  //  printf (" n");
  if (issignaling (a))
    printf (" S");
  if (issubnormal (a))
    printf (" s");
  if (isnan (a))
    printf (" N");
  if (iszero (a))
    printf (" Z");
fflush (stdout);

__attribute__ ((noinline, noclone))
test (FTYPE a)
  //assert (untag (tag (a)) == untag (a));
  FTYPE t = untag (tag (a));
  printf ("*   "); print_float (a); printf ("\n");
  printf ("  ->"); print_float (t); printf ("\n");
  if (isfinite (a)
      && isfinite (t)
      && a != 0)
      if ((a - t) != 0)
        printf ("      (error: %f%%)\n", fabs ((a - t) * 100.0 / a));
        printf ("      (error: zero)\n");

main (void)
  printf ("FE_SNANS_ALWAYS_SIGNAL is defined.\n");
#endif // #ifdef FE_SNANS_ALWAYS_SIGNAL
  assert (sizeof (FTYPE) == sizeof (ITYPE));
  assert (sizeof (FTYPE) == sizeof (UTYPE));

  // FIXME: with POSIX threads I should use pthread_sigmask instead of 
  sigset_t signal_set;
  sigprocmask (0, NULL, & signal_set);
  printf ("SIGNAL MASK BEFORE: %lu\n", * (unsigned long *) &signal_set);
  sigaddset (& signal_set, SIGFPE);
  printf ("SIGNAL MASK AFTER:  %lu\n", * (unsigned long *) &signal_set);
  sigprocmask (SIG_BLOCK, & signal_set, NULL);

  test (0.0);
  test (- 0.0);
  test (1.0);
  test (- 1.0);
  test (M_PI);
  test (- M_PI);
  test (INFINITY);
  test (- INFINITY);
  test (NAN);
  test (SNAN);
  test (1234567890.);
  test (123456789.0);
  test (12345678.90);
  test (1234567.890);
  test (123456.7890);
  test (12345.67890);
  test (1234.567890);
  test (123.4567890);
  test (12.34567890);
  test (1.234567890);
  test (.1234567890);
  test (.01234567890);
  test (.001234567890);
  test (.0001234567890);
  test (.00001234567890);
  test (.000001234567890);
  test (.0000001234567890);
  test (.00000001234567890);
  test (.000000001234567890);
  test ((1 << 23) + (1 << TAG_BIT_NO) - 1); // worst possible relative error on 
32-bit floats
  test (u2f(-1));
  return 0;

Attachment: signature.asc
Description: PGP signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]