Re: [bug-gawk] Invalid result when converting to hex literal


From: Nelson H. F. Beebe
Subject: Re: [bug-gawk] Invalid result when converting to hex literal
Date: Tue, 20 Mar 2018 10:36:40 -0600

Arnold Robbins responds today about the handling of hexadecimal
constants in gawk:

>> See
>> https://www.gnu.org/software/gawk/manual/html_node/POSIX-Floating-Point-Problems.html#POSIX-Floating-Point-Problems

I understand Arnold's serious reservations about that feature breaking
existing code and introducing significant behavioral changes.

However, hexadecimal floating-point constants were introduced into C99
for an extremely important reason: they allow exact specification of
floating-point bit patterns that could not be reliably obtained with
either decimal input, or with C library function calls, or with
recognizable rational numbers, because there is no guarantee that I/O
base conversions, and library functions, and floating-point division,
always produce correctly-rounded results.
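
A minimal Python sketch of that point, assuming CPython on a platform
whose float is IEEE 754 binary64 (Python is used here only for
illustration; float.hex() and float.fromhex() are its standard
methods).  The hexadecimal form spells out the stored bits, so reading
it back involves no rounding:

```python
# Sketch: a hexadecimal literal pins down every bit of a binary64 value.
x = 0.1                   # decimal input: the stored value is only the nearest double
h = x.hex()               # exact hexadecimal spelling of the stored bits
y = float.fromhex(h)      # reading it back needs no rounding decision

print(h)                  # 0x1.999999999999ap-4
print(y == x)             # True
```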

For example, if you want the machine epsilon in IEEE 754 64-bit binary
format, you can write it as epsilon = 0x1p-52: that is the smallest positive
power of two such that (1 + epsilon) != 1.  In decimal, its value is

        epsilon = 2.220_446_049_250_313_080_847_263_336_181_640_625e-16 (exactly)
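
As a quick check, here is a minimal Python sketch, assuming binary64
floats and the default round-to-nearest-even mode.  It compares the
hexadecimal constant with the epsilon that the Python runtime reports,
and shows that adding it to 1 changes the sum, while adding half of it
does not:

```python
import sys

eps = float.fromhex("0x1p-52")

print(eps == sys.float_info.epsilon)   # True
print(1.0 + eps != 1.0)                # True
print(1.0 + eps / 2 == 1.0)            # True: the halfway case rounds to even, back to 1.0
```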

Similarly, the smallest normal number is obtained from minnormal =
0x1p-1022, and the smallest subnormal as minsubnormal = 0x1p-1074.
Their decimal values are approximately
        
        minnormal = 2.225_073_858_507_201_383_090_232_717_332_404_06...e-308
        minsubnormal = 4.940_656_458_412_465_441_765_687_928_682_213_72...e-324
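
The same kind of check works for these two limits.  This is a sketch
under the same binary64 assumption; sys.float_info.min is Python's name
for the smallest positive normal number:

```python
import sys

min_normal = float.fromhex("0x1p-1022")
min_subnormal = float.fromhex("0x1p-1074")

print(min_normal == sys.float_info.min)   # True
print(min_subnormal == 5e-324)            # True: 5e-324 rounds to the smallest subnormal
print(min_subnormal / 2 == 0.0)           # True: nothing smaller is representable
```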

The correctly rounded value of the famous mathematical constant PI is
x = +0x1.921fb54442d18p+1.  Its decimal value is likely more familiar:

        pi = 3.141_592_653_589_793_238_462_643_383_279_502_...
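
A short Python sketch, again assuming binary64, confirms that this
hexadecimal constant is the double that the math library uses for PI:

```python
import math

pi_hex = "0x1.921fb54442d18p+1"

print(float.fromhex(pi_hex) == math.pi)   # True
print(math.pi.hex())                      # 0x1.921fb54442d18p+1
```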

Those examples all have rather long exact decimal representations
(e.g. 714 fractional decimal digits for minnormal), except in the case
of the transcendental constant PI, for which there is no finite
representation in any integer base.
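
To see how long those exact expansions are, the following Python sketch
converts each constant to decimal without rounding and counts its
significant digits.  It relies on the documented fact that
decimal.Decimal(float) is an exact conversion; exact_decimal is a
hypothetical helper name introduced only for this illustration:

```python
from decimal import Decimal

def exact_decimal(hex_literal):
    """Return the exact decimal expansion of a hex floating-point literal."""
    return Decimal(float.fromhex(hex_literal))   # Decimal(float) does not round

constants = [
    ("epsilon", "0x1p-52"),
    ("minnormal", "0x1p-1022"),
    ("minsubnormal", "0x1p-1074"),
]

for name, literal in constants:
    digits = exact_decimal(literal).as_tuple().digits
    print(name, literal, len(digits), "significant decimal digits")
```

Printing exact_decimal("0x1p-1022") itself shows the full expansion.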

For accurate floating-point computation, it is imperative that you be
able to write constants that are exactly representable, or else are
correctly rounded to working precision.  When C99 introduced hexadecimal
floating-point constants, we in the numerical analysis community were
very happy to finally have them, because almost no earlier programming
language had offered such a facility.

The gawk manual says

   * The 'gawk' maintainer feels that supporting hexadecimal
     floating-point values, in particular, is ugly, and was never
     intended by the original designers to be part of the language.

    ...

       Recognizing these issues, but attempting to provide compatibility
    with the earlier versions of the standard, the 2008 POSIX standard added
    explicit wording to allow, but not require, that 'awk' support
    hexadecimal floating-point values and special values for "not a number"
    and infinity.
    ...

    Hexadecimal floating point is not supported (unless you also use
     '--non-decimal-data', which is _not_ recommended).

Let us try the latter suggestion with gawk-4.2.1:

        % gawk --non-decimal-data 'BEGIN {print 0x1p5}'
        1

        % gawk --non-decimal-data 'BEGIN {print 0x1.p5}'
        1

        % gawk --non-decimal-data 'BEGIN {print 0x1.0p5}'
        1

        % gawk --non-decimal-data 'BEGIN {print 0x.1p5}'
        00.1

None of those matches what the C language, and my hoc implementation,
produce:

        hoc> 0x1p5 ; 0x1.p5 ; 0x1.0p5 ; 0x.1p5
         32
         32
         32
         2

The answer 2 in the last case is correct, because a power pNNN in
hexadecimal floating-point means 2-to-the-power-NNN, NOT
16-to-the-power-NNN, so we have

        (1/16) * 2**5 = 2**1 = 2.
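
The exponent rule can be checked with a small Python sketch.
float.fromhex() follows the C99 convention; the last test value is
spelled 0x0.1p5, with an explicit leading zero, which denotes the same
number as 0x.1p5 above:

```python
# In a hexadecimal floating-point constant, pN scales the mantissa by 2**N.
print(float.fromhex("0x1p5"))     # 32.0
print(float.fromhex("0x1.0p5"))   # 32.0
print(float.fromhex("0x0.1p5"))   # 2.0

mantissa = 1 / 16                 # the hexadecimal fraction 0.1 is one sixteenth
print(mantissa * 2 ** 5)          # 2.0      (power of 2, as C99 specifies)
print(mantissa * 16 ** 5)         # 65536.0  (power of 16: a wrong reading)
```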

It seems to me that hexadecimal floating point values are still not
correctly supported in gawk, even though the manual suggests that they
SHOULD BE when the --non-decimal-data option is supplied.

At least the three major awk implementations behave consistently: they
all ignore nonnumeric suffixes without raising an error, and silently
skip to the next recognizable lexical token:

        % gawk 'BEGIN {print 12345foolish}'
        12345

        % mawk 'BEGIN {print 12345foolish}'
        12345

        % nawk 'BEGIN {print 12345foolish}'
        12345

        % gawk 'BEGIN {print 1.2345foolish}'
        1.2345

        % mawk 'BEGIN {print 1.2345foolish}'
        1.2345

        % nawk 'BEGIN {print 1.2345foolish}'
        1.2345

        % gawk 'BEGIN {print 12345foolish, 6789barking}'
        12345 6789

        % mawk 'BEGIN {print 12345foolish, 6789barking}'
        12345 6789

        % nawk 'BEGIN {print 12345foolish, 6789barking}'
        12345 6789

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: address@hidden  -
- 155 S 1400 E RM 233                       address@hidden  address@hidden -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------


