Re: asserting the equality of double values

From: Nelson H. F. Beebe
Subject: Re: asserting the equality of double values
Date: Sat, 15 Sep 2007 13:20:22 -0600 (MDT)

Kamaraju S Kusumanchi asks today:

>> ...
>>       x = ldexp (1.0, DBL_MANT_DIG) - 1.0;
>>       assert (x == floor (x));      /* should be an integer already */
>> here ldexp and floor both return double values. Is it guaranteed that
>> asserting the equality of two double values will always work? When learning
>> C, I remember reading somewhere not to rely on such comparisons.
>> ...

floor() returns a double that contains a whole number: the largest
whole number that does not exceed its argument.  Since a double is a
64-bit quantity on most modern machines, and in the IEEE 754
floating-point system, the 64-bit format contains a 53-bit significand,
that whole number can be too big to store in a 32-bit int in C, which
is why floor() returns a double rather than an int.

The comparison x == floor(x) is perfectly okay.  Floating-point
comparisons are NOT fuzzy: they are exact.  If x contained a whole
number on entry, then floor(x) is bit-for-bit identical to it.

Regrettably, there is a lot of misinformation about floating-point
arithmetic promulgated by book authors and programmers who lack
sufficient understanding of the subject.

If you do something like

        x = 1.0;
        y = 3.0;
        if (3.0 * (x/y) == 1.0) ...

you cannot expect the equality to be true everywhere, because 1/3 is
not exactly representable in a finite number of bits in a binary
arithmetic system.  This particular example evaluates to true (1) with
IEEE 754 arithmetic in default rounding, but not with other rounding
modes.

However, the point in the sample code at the start of this message is
quite different: it uses ldexp(u,v) to construct a number u * 2**v,
and since u = 1.0 here, the result is 2**v, an EXACTLY-REPRESENTABLE
number.  Since DBL_MANT_DIG = 53, the result is the number 2**53 =
9_007_199_254_740_992.  Subtracting one produces a significand of all
1-bits:  2**53 - 1 = +0x1.fffffffffffffp+52, and that is the
next-to-largest member of the unbroken run 0, 1, 2, ..., 2**53 of
exactly representable whole numbers in this arithmetic system (beyond
2**53, only every second whole number is representable).  Its floor is
identical, so "x == floor(x)" is true.

On some historical floating-point designs, this code might not work
like it does in IEEE 754 arithmetic, because of different rounding
behavior.

- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: address@hidden  -
- 155 S 1400 E RM 233                       address@hidden  address@hidden -
- Salt Lake City, UT 84112-0090, USA    URL: -
