Re: [bug-gawk] Interesting floating point behavior


From: Nelson H. F. Beebe
Subject: Re: [bug-gawk] Interesting floating point behavior
Date: Fri, 20 Jan 2012 10:20:20 -0700 (MST)

Robert Kennedy <address@hidden> reports puzzlement over the
awk computation "a=$1; b=a*10000; c=b%100" that produces these values:

>> 0.69     6900    100

Here is what is happening:

In 128-bit binary IEEE 754 arithmetic:

        hoc128> a = 0.69
        hoc128> b = a * 10000
        hoc128> c = b % 100
        hoc128> println a, b, c
         0.69  6_899.999_999_999_999_999_999_999_999_999_998_42  99

In 128-bit decimal IEEE 754 arithmetic:

        hocd128> a = 0.69
        hocd128> b = a * 10000
        hocd128> c = b % 100
        hocd128> println a, b, c
         0.69  6_900 0

Because most decimal fractions, like 0.69, are not exactly
representable in binary arithmetic, you often see the string-of-9s
phenomenon when you do the inexact round-trip decimal -> binary ->
decimal.

This has nothing to do with gawk: it is a fact of life that arises
from inexact base conversion.
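
For readers without hoc at hand, the same experiment can be sketched in
Python, whose float type is the 64-bit binary IEEE 754 format that awk
implementations normally use, and whose standard decimal module provides
decimal arithmetic.  This is only an illustration, not what gawk does
internally; the variable names are chosen here, and "%.6g" is awk's default
output format, which is why the tiny error is hidden in b but surfaces in c.

```python
from decimal import Decimal

# Binary (64-bit double) version of: a=$1; b=a*10000; c=b%100
a = 0.69
b = a * 10000
c = b % 100

# The value actually stored for 0.69 is not 0.69; Decimal(a) shows it exactly.
print("stored value of 0.69:", Decimal(a))

# awk prints numbers with "%.6g" by default, which rounds the string of 9s away.
print("binary :", a, "%.6g" % b, "%.6g" % c)

# Decimal version: 0.69 is exactly representable, so no error creeps in.
da = Decimal("0.69")
db = da * 10000
dc = db % 100
print("decimal:", da, db, dc)
```

In the binary case b lies just below 6900, so the remainder lies just below
100 and is displayed as 100 after rounding to six significant digits.  In
the decimal case the remainder is exactly zero.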

A famous example used to illustrate the need for decimal arithmetic is
sales tax computation: 5% tax on a purchase of $0.70.  In decimal
arithmetic, the answer is 1.05 * 0.70 = $0.735, and the tax man's
rounding says you owe $0.74.

In binary arithmetic, 0.70 is not exactly representable, no matter
what your precision is, and the computation produces
0.734_999_999_999_999_99, which rounds down to 0.73, cheating the tax
authorities of $0.01.

They DO care about this, and in most jurisdictions, such arithmetic
MUST be done in decimal.
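
The sales tax example can be reproduced with a short Python sketch.  It
assumes 64-bit binary doubles for the binary case and uses the standard
decimal module for the decimal case; the helper name to_cents is made up
for this illustration, and ROUND_HALF_UP stands in for the tax man's
rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def to_cents(amount):
    """Round a Decimal amount to whole cents, ties going up."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)

# Decimal arithmetic: the product is exactly 0.735, a true tie.
exact = Decimal("0.70") * Decimal("1.05")

# Binary arithmetic: compute in doubles, then convert the result exactly
# to Decimal so that its true value can be inspected and rounded.
binary = Decimal(0.70 * 1.05)

print("decimal product:", exact, "->", to_cents(exact))
print("binary product :", binary, "->", to_cents(binary))
```

The binary product falls slightly below 0.735, so it is no longer a tie and
rounds down, whatever tie-breaking rule is requested.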

While a difference of a penny is insignificant when you buy a Ferrari,
it can add up to millions of dollars annually in businesses that have
large numbers of small transactions, like telephone companies and
grocery stores.

IEEE 754-2008, the revision of IEEE 754-1985, includes decimal
arithmetic, and additional rounding rules demanded by tax laws (e.g.,
round-ties-upward: 0.735 -> 0.74).  So far, only IBM z-Series and IBM
PowerPC 7 have full support of the 2008 standard.
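
The effect of the tie-breaking rule can be sketched with Python's decimal
module.  This is an illustration only, not an implementation of IEEE
754-2008: Python's ROUND_HALF_UP rounds ties away from zero, which agrees
with round-ties-upward for the positive amounts used here, and the helper
name round_cents is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

def round_cents(amount, mode):
    """Round a decimal string to cents under the given rounding mode."""
    return Decimal(amount).quantize(CENT, rounding=mode)

# Both amounts are exact ties between two neighbouring cent values.
for amount in ("0.725", "0.735"):
    up = round_cents(amount, ROUND_HALF_UP)
    even = round_cents(amount, ROUND_HALF_EVEN)
    print(amount, " ties-up:", up, " ties-to-even:", even)
```

The two rules agree on 0.735 but differ on 0.725, which is why a tax
computation has to name its rounding rule as well as its number base.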

I have versions of mawk and nawk that use decimal arithmetic instead
of binary arithmetic: for them, Robert's experiment produces this
output:

        echo -e "Input\t*10000\t%100"; \
        for i in 0.67 0.68 0.69 0.70; do \
            echo $i | dmawk '{a=$1; b=a*10000; c=b%100; print a,"\t",b,"\t",c}'; \
        done

        Input   *10000  %100
        0.67     6700    0
        0.68     6800    0
        0.69     6900    0
        0.70     7000    0

They are built with the 128-bit decimal format, which supplies exactly
34 decimal digits.  Here is a computation of the machine epsilon,
which should be 10**(-34 + 1):

        % dmawk -f macheps.awk 
        machine epsilon = 1e-33 = 10**-33
        machine epsilon = 1e-33 = 10**-33
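
The contents of macheps.awk are not shown here, so the following Python
sketch is only one common way to find the machine epsilon, under the
assumption that a 34-digit decimal context with round-half-even is an
adequate stand-in for the 128-bit decimal format.  The function name
decimal_macheps is made up for this example.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

def decimal_macheps(digits=34):
    """Return the smallest power of ten eps with 1 + eps != 1 at this precision."""
    ctx = Context(prec=digits, rounding=ROUND_HALF_EVEN)
    one = Decimal(1)
    ten = Decimal(10)
    eps = one
    # Keep shrinking eps by a factor of ten while the next smaller
    # candidate still changes 1 when added to it.
    while ctx.add(one, ctx.divide(eps, ten)) != one:
        eps = ctx.divide(eps, ten)
    return eps

print("machine epsilon =", decimal_macheps())
```

With 34 digits, 1 + 10**-33 still fits in the significand, while
1 + 10**-34 needs 35 digits and rounds back to 1, so the loop stops at
10**-33, matching the 10**(-34 + 1) prediction.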

Versions of gcc with support for decimal arithmetic, and binary
packages with hoc, dmawk, dnawk, and dlua are available here:

        http://www.math.utah.edu/pub/mathcw/

My large book on that library is essentially done, with some minor
tweaks in progress before going to the publisher.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: address@hidden  -
- 155 S 1400 E RM 233                       address@hidden  address@hidden -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
