Re: [bug-gawk] Gawk and NaN values

From:	Nelson H. F. Beebe
Subject:	Re: [bug-gawk] Gawk and NaN values
Date:	Wed, 31 Aug 2011 12:52:32 -0600 (MDT)

Hermann Peifer <address@hidden> notes the following behavior:

$ echo "1 -nan" | awk '{ print $1, $1 / $2 }' | awk '{print $1 / $2}'
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted

[where awk == gawk].

Here is what nawk and mawk do on GNU/Linux x86_64:

% echo "1 -nan" | nawk '{ print $1, $1 / $2 }' | nawk '{print $1 / $2}'
nan

% echo "1 -nan" | mawk '{ print $1, $1 / $2 }' | mawk '{print $1 / $2}'
nan

The Solaris SPARC /bin/awk (the original awk, before it
became nawk) produces

% echo "1 -nan" | awk '{ print $1, $1 / $2 }' | awk '{print $1 / $2}'
-nan

On Solaris, the other two produce -nan (nawk) and -NaN (mawk).

On the subject of NaNs, I have these comments.

(1) The IEEE 754-1985 and 754-2008 Standards leave the sign
    of a NaN implementation dependent (x86 and x86_64 hardware
    produce negative NaNs, whereas most other platforms
    produce positive NaNs: the sign has NO EFFECT on the
    value's interpretation as a NaN).

(2) NaNs are defined to be either quiet or signaling,
    but for historical reasons, x86 has only one kind, and that
    flaw propagated into the Java and C# languages, and their
    virtual machines.

(3) The simple test

            if (x != x) print "x is a NaN"

    should work in ALL programming languages (after
    adjusting for syntax differences), but it is surprising
    how many compilers botch it.  Modern code should
    therefore use a test function:

            if (isnan(x)) print "x is a NaN"
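
Both tests in item (3) can be sketched in C roughly as follows.
The function names is_nan_by_compare and is_nan_by_macro are
illustrative assumptions, not part of any library.  The volatile
qualifier is one way to discourage a compiler from folding the
comparison away; it offers no protection under options such as
-ffast-math that abandon IEEE 754 semantics altogether.

```c
#include <math.h>
#include <stdbool.h>

/* Portable NaN test that relies only on the IEEE 754 rule that a
   NaN compares unequal to everything, including itself. */
static bool is_nan_by_compare(double x)
{
    /* volatile forces a real load and compare, so the compiler
       is less likely to simplify v != v to false. */
    volatile double v = x;
    return v != v;
}

/* The same test using the C99 classification macro. */
static bool is_nan_by_macro(double x)
{
    return isnan(x) != 0;
}
```

Either function should report a NaN for NAN, -NAN, or
copysign(NAN, -1.0), consistent with item (1): the sign bit
plays no part in the classification.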

The C99 Standard says this about NaN input conversion via
strtod(), strtof(), and strtold():

>> ...
>> The expected form of the subject sequence is an optional
>> plus or minus sign, then one of the following:
>>     -- a nonempty sequence of decimal digits optionally
>>        containing a decimal-point character, then an
>>        optional exponent part as defined in 6.4.4.2;
>>     -- a 0x or 0X, then a nonempty sequence of hexadecimal
>>        digits optionally containing a decimal-point
>>        character, then an optional binary exponent part as
>>        defined in 6.4.4.2;
>>     -- INF or INFINITY, ignoring case
>>     -- NAN or NAN(n-char-sequence_opt), ignoring case in the
>>        NAN part, where:
>>                n-char-sequence:
>>                       digit
>>                       nondigit
>>                       n-char-sequence digit
>>                       n-char-sequence nondigit
>>     The subject sequence is defined as the longest initial
>>     subsequence of the input string, starting with the first
>>     non-white-space character, that is of the expected
>>     form. The subject sequence contains no characters if the
>>     input string is not of the expected form.
>> ...
>> A character sequence NAN or NAN(n-char-sequence_opt), is
>> interpreted as a quiet NaN, if supported in the return type,
>> else like a subject sequence part that does not have the
>> expected form; the meaning of the n-char sequences is
>> implementation-defined.
>> ...

Notice that the sign of an input NaN is always optional, and
scripting language implementations should certainly not
behave differently, because most are themselves implemented
in the C language.
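
The optional sign is easy to check against a C library.  The
following is a minimal sketch, assuming a C99-conforming
strtod() and a floating-point type that supports quiet NaNs;
parses_as_nan is a hypothetical helper name chosen for this
illustration.

```c
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Return true when strtod() consumes the whole string and the
   result is a NaN.  A leading sign is accepted because strtod()
   treats it as part of the subject sequence. */
static bool parses_as_nan(const char *s)
{
    char *end;
    double x = strtod(s, &end);

    /* end == s means nothing was converted; a nonzero *end means
       trailing characters were left over (e.g. "nano"). */
    return end != s && *end == '\0' && isnan(x);
}
```

With such a library, "nan", "-nan", "+NaN", and "-NAN" should
all be accepted, whereas "nano" is rejected here only because of
the leftover trailing character.  How a form like "nan(0x1234)"
is handled depends on the library's treatment of the
n-char-sequence.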

The n-char-sequence is conventionally used to hold a
hexadecimal representation of the significand payload:

        nan("0xdeadbeef")
        qnan("0xbeefcafe")
        snan("0xfeed_face_cafe_babe")

The 0x prefix is optional.
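
To see what a given library actually stores, the significand
bits can be extracted from the result.  This is only a sketch:
it assumes that double is the 64-bit IEEE 754 binary64 format,
with the quiet/signaling flag in the top significand bit and the
payload in the remaining 51 bits, and nan_payload_bits is a
hypothetical helper name.  C99 provides nan() in <math.h>, but
the meaning of its argument is implementation-defined, so no
particular payload value should be expected.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Return the low 51 bits of the significand field of x.
   Assumes double is IEEE 754 binary64 (64 bits wide). */
static uint64_t nan_payload_bits(double x)
{
    uint64_t bits;

    /* memcpy avoids the aliasing problems of a pointer cast. */
    memcpy(&bits, &x, sizeof bits);
    return bits & UINT64_C(0x0007FFFFFFFFFFFF);
}
```

Printing nan_payload_bits(nan("0xdeadbeef")) in hexadecimal
shows whether the payload was obeyed on a given system.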

It is a good idea to recognize input

        snan
        snan(n-char-seq)
        qnan
        qnan(n-char-seq)

as well, and obey the payloads, if possible.  A few systems
produce NaNQ and NaNS to distinguish between the quiet and
signaling forms, but QNaN and SNaN match the English names
better.
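
A recognizer for these spellings might look like the sketch
below.  Everything in it (the nan_kind enumeration, match_word,
and classify_nan_token) is hypothetical and is not taken from
gawk or any other awk implementation.  It only classifies the
token; it does not build the NaN or interpret the payload, and
it does not handle the NaNQ and NaNS spellings.

```c
#include <ctype.h>
#include <stddef.h>

enum nan_kind { NOT_A_NAN, QUIET_NAN, SIGNALING_NAN };

/* Case-insensitive prefix match against a lowercase word.
   Returns the number of characters matched, or 0 on failure. */
static size_t match_word(const char *s, const char *word)
{
    size_t i = 0;

    while (word[i] != '\0') {
        if (tolower((unsigned char)s[i]) != word[i])
            return 0;
        i++;
    }
    return i;
}

/* Classify a complete token such as "-nan", "QNaN(0xbeefcafe)",
   or "snan".  A plain "nan" is treated as quiet, as in C99. */
static enum nan_kind classify_nan_token(const char *s)
{
    enum nan_kind kind;
    size_t n;

    if (*s == '+' || *s == '-')        /* the sign is optional */
        s++;

    if ((n = match_word(s, "snan")) != 0)
        kind = SIGNALING_NAN;
    else if ((n = match_word(s, "qnan")) != 0 ||
             (n = match_word(s, "nan")) != 0)
        kind = QUIET_NAN;
    else
        return NOT_A_NAN;
    s += n;

    if (*s == '(') {                   /* optional (n-char-seq) */
        s++;
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        if (*s != ')')
            return NOT_A_NAN;
        s++;
    }
    return *s == '\0' ? kind : NOT_A_NAN;
}
```

For example, classify_nan_token("snan(0xfeed_face_cafe_babe)")
yields SIGNALING_NAN, and classify_nan_token("nano") yields
NOT_A_NAN.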

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: address@hidden  -
- 155 S 1400 E RM 233                       address@hidden  address@hidden -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
