Re: [bug-gawk] Gawk and NaN values
From: Nelson H. F. Beebe
Subject: Re: [bug-gawk] Gawk and NaN values
Date: Wed, 31 Aug 2011 12:52:32 -0600 (MDT)
Hermann Peifer <address@hidden> notes the following behavior:
$ echo "1 -nan" | awk '{ print $1, $1 / $2 }' | awk '{print $1 / $2}'
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
[where awk == gawk].
Here is what nawk and mawk do on GNU/Linux x86_64:
% echo "1 -nan" | nawk '{ print $1, $1 / $2 }' | nawk '{print $1 / $2}'
nan
% echo "1 -nan" | mawk '{ print $1, $1 / $2 }' | mawk '{print $1 / $2}'
nan
The Solaris SPARC /bin/awk (the original awk, before it
became nawk) produces
% echo "1 -nan" | awk '{ print $1, $1 / $2 }' | awk '{print $1 / $2}'
-nan
On Solaris, the other two produce -nan (nawk) and -NaN (mawk).
On the subject of NaNs, I have these comments.
(1) The IEEE 754 1985 and 2008 Standards leave the sign of a
NaN implementation dependent: x86 and x86_64 hardware
produce negative NaNs, whereas most other platforms
produce positive NaNs. The sign has NO EFFECT on the
value's interpretation as a NaN.
(2) NaNs are defined to be either quiet or signaling,
but for historical reasons, x86 has only one kind, and that
flaw propagated into the Java and C# languages, and their
virtual machines.
(3) The simple test
if (x != x) print "x is a NaN"
should work in ALL programming languages (after
adjusting for syntax differences), but it is surprising
how many compilers botch it. Modern code should
therefore use a test function:
if (isnan(x)) print "x is a NaN"
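As a rough illustration of comments (1) and (3), here is a minimal C99
sketch of both tests. The function names are made up for this example,
and it assumes a compiler that follows IEEE 754 comparison semantics
(for instance, no -ffast-math style options).

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Self-comparison test: a NaN compares unequal to everything,
   including itself.  The volatile copy discourages the compiler
   from folding the comparison away. */
bool is_nan_by_compare(double x)
{
    volatile double v = x;
    return v != v;
}

/* The same test through the C99 classification macro. */
bool is_nan_by_macro(double x)
{
    return isnan(x) != 0;
}

/* Both tests should agree, whatever the sign bit of the NaN is. */
void check_nan_tests(void)
{
    double pos = copysign(NAN, +1.0);   /* NaN with sign bit clear */
    double neg = copysign(NAN, -1.0);   /* NaN with sign bit set   */

    assert(is_nan_by_compare(pos) && is_nan_by_compare(neg));
    assert(is_nan_by_macro(pos) && is_nan_by_macro(neg));
    assert(!is_nan_by_compare(1.0) && !is_nan_by_macro(INFINITY));
}
```

Calling check_nan_tests() from a test driver should pass silently on a
conforming implementation; a failed assertion would point to one of
the botched compilers mentioned in (3).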
The C99 Standard says this about NaN input conversion via
strtod(), strtof(), and strtold():
>> ...
>> The expected form of the subject sequence is an optional
>> plus or minus sign, then one of the following:
>> -- a nonempty sequence of decimal digits optionally
>> containing a decimal-point character, then an
>> optional exponent part as defined in 6.4.4.2;
>> -- a 0x or 0X, then a nonempty sequence of hexadecimal
>> digits optionally containing a decimal-point
>> character, then an optional binary exponent part as
>> defined in 6.4.4.2;
>> -- INF or INFINITY, ignoring case
>> -- NAN or NAN(n-char-sequence_opt), ignoring case in the NAN
>> part, where:
>> n-char-sequence:
>> digit
>> nondigit
>> n-char-sequence digit
>> n-char-sequence nondigit
>> The subject sequence is defined as the longest initial
>> subsequence of the input string, starting with the first
>> non-white-space character, that is of the expected
>> form. The subject sequence contains no characters if the
>> input string is not of the expected form.
>> ...
>> A character sequence NAN or NAN(n-char-sequence_opt), is
>> interpreted as a quiet NaN, if supported in the return type,
>> else like a subject sequence part that does not have the
>> expected form; the meaning of the n-char sequences is
>> implementation-defined.
>> ...
Notice that the sign of an input NaN is always optional, and
scripting language implementations should certainly not
behave differently, because most are themselves implemented
in the C language.
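The quoted rules can be checked with a few lines of C. This is only a
sketch: parses_as_nan() is a hypothetical helper, not part of any awk
implementation, and how a particular C library treats the
NAN(n-char-sequence) form is implementation-defined.

```c
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Return true if the entire string converts to a NaN via strtod(). */
bool parses_as_nan(const char *text)
{
    char *end;
    double value = strtod(text, &end);

    /* end == text means nothing was converted; a nonzero *end
       means trailing characters were left unconsumed. */
    return end != text && *end == '\0' && isnan(value);
}
```

With a C99-conforming strtod(), the inputs "nan", "-nan", and "+NaN"
should all be accepted, while "1.5" and "nanx" should be rejected.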
The n-char-sequence is conventionally used to hold a
hexadecimal representation of the significand payload:
nan("0xdeadbeef")
qnan("0xbeefcafe")
snan("0xfeed_face_cafe_babe")
The 0x prefix is optional.
It is a good idea to recognize input
snan
snan(n-char-seq)
qnan
qnan(n-char-seq)
as well, and obey the payloads, if possible. A few systems
produce NaNQ and NaNS to distinguish between the quiet and
signaling forms, but QNaN and SNaN match the English names
better.
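One possible way to recognize these spellings is sketched below in C.
Everything here (nan_kind, nan_token, classify_nan) is hypothetical and
invented for illustration. The sketch only classifies the text and
extracts the payload string; it does not try to build a floating-point
value with that payload, and it does not handle the NaNQ and NaNS
spellings.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { NAN_NONE, NAN_QUIET, NAN_SIGNALING } nan_kind;

typedef struct {
    nan_kind kind;      /* NAN_NONE if the text is not a NaN spelling */
    bool negative;      /* true if a leading '-' was present */
    char payload[64];   /* n-char-sequence text, empty if absent */
} nan_token;

/* Case-insensitive match of a lowercase prefix; returns its length or 0. */
static size_t match_prefix(const char *s, const char *prefix)
{
    size_t n = strlen(prefix);
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != prefix[i])
            return 0;
    return n;
}

/* Classify text of the form [+-](nan|qnan|snan)[(n-char-sequence)]. */
nan_token classify_nan(const char *s)
{
    nan_token t = { NAN_NONE, false, "" };
    bool negative = false;
    nan_kind kind;
    size_t n;

    if (*s == '+' || *s == '-')
        negative = (*s++ == '-');

    if ((n = match_prefix(s, "snan")) != 0)
        kind = NAN_SIGNALING;
    else if ((n = match_prefix(s, "qnan")) != 0 ||
             (n = match_prefix(s, "nan")) != 0)
        kind = NAN_QUIET;       /* a plain "nan" is taken as quiet */
    else
        return t;
    s += n;

    if (*s == '(') {
        const char *close = strchr(s, ')');
        size_t len;

        if (close == NULL || close[1] != '\0')
            return t;           /* unterminated, or trailing text */
        len = (size_t)(close - s - 1);
        if (len >= sizeof t.payload)
            return t;           /* payload too long for this sketch */
        for (size_t i = 0; i < len; i++)
            if (!isalnum((unsigned char)s[1 + i]) && s[1 + i] != '_')
                return t;       /* only digits, letters, underscore */
        memcpy(t.payload, s + 1, len);
        t.payload[len] = '\0';
    } else if (*s != '\0') {
        return t;               /* trailing text after the keyword */
    }

    t.kind = kind;
    t.negative = negative;
    return t;
}
```

For example, classify_nan("SNaN(0xfeed_face)") yields a signaling
token with the payload text "0xfeed_face", while classify_nan("nano")
yields NAN_NONE.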
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: address@hidden -
- 155 S 1400 E RM 233 address@hidden address@hidden -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------