Re: gawk infinity issues

From: Andrew J. Schorr
Subject: Re: gawk infinity issues
Date: Fri, 6 Jan 2006 20:01:23 -0500
User-agent: Mutt/1.4.1i

On Fri, Jan 06, 2006 at 04:06:36PM -0800, Paul Eggert wrote:
> The point is that if gawk now
> converts the string "INF" to an infinite number, rather than to zero
> as was historical practice, then the behavior of gawk has changed, and
> that might break some scripts.

How do you establish what is "historical practice"?  For example,
on Solaris 8 sparc, with bundled nawk, I see this:

   $ nawk 'BEGIN {x = "inf"; print x+0}'
   $ nawk 'BEGIN {x = "-inf"; print x+0}'
   $ nawk 'BEGIN {x = "nan"; print x+0}'
   $ nawk 'BEGIN {x = "-nan"; print x+0}'

And with gawk 3.1.5 I see this:

   $ ./gawk 'BEGIN {x = "inf"; print x+0}'
   $ ./gawk 'BEGIN {x = "-inf"; print x+0}'
   $ ./gawk 'BEGIN {x = "nan"; print x+0}'
   $ ./gawk 'BEGIN {x = "-nan"; print x+0}'

Linux gawk, by contrast, converts all of these to 0, because it uses
gawk_strtod, which does not understand Inf or NaN.

> Until they get resolved, it might be best to leave gawk alone.

I respectfully disagree.  It seems to me that gawk's current behavior is not
well defined; it looks like an artifact of how force_number is coded, and I
don't think that code anticipated the possibility that the string could
contain NaN or Inf.  Hence the somewhat random behavior (does it strike anybody
as sensible that on Solaris 8, "-Inf" is treated as a numeric value, but "Inf"
is converted to 0?).  You make it sound as if gawk has historically and
consistently converted these strings to a numeric value of 0, but as you can
see above, that is not the case.

Furthermore, if we allow gawk to perform calculations that give NaN or Inf
results, it seems rather shoddy to me not to be able to convert those values
back and forth between string and numeric representations.  The IEEE 754
standard defines NaN and Inf values for a reason (they are useful), and it
limits gawk's utility if it cannot handle these values in a reasonable and
consistent fashion.

I think that there are basically two sound choices here: 1. do not support Inf
and NaN, and convert all such strings to 0 when encountered in a numeric
expression; or 2. recognize Inf and NaN as the strtod spec does.  Current gawk
implements neither of these approaches, so I claim it is flawed.  I would vote
strongly for #2, but could understand that others might prefer #1.  If there
is disagreement, though, I might suggest a command-line switch (comparable to
--non-decimal-data) to control this behavior.

