On 12/03/2015 12:12 AM, Richard Fateman
This is a great example! comes in handy when discussing precision
and accuracy in FP calculations. I have often seen people who do
numerical simulations believing that double-precision solves the
accuracy problems. I have written the following to some of my
friends; please forgive the parts that are obvious to people on this
list who do this stuff for living:
From S. Rump's investigations..
a "polynomial" that evaluates the same in IEEE single or double
precision, about 1.1726 but is wrong in both
Over the years I discussed with all of you the pitfalls
of precision and accuracy in floating point calculations. I just
ran into a wonderful illustration how tricky this business is, so
I am sharing it with you.
As you know, each of several available FP formats is precise to a
certain number of significant digits: six for 32-bit single
precision, fifteen for 64-bit double precision, etc. What this
means is that, of course, a result obtained in single precision FP
arithmetic cannot be more accurate that the format's precision,
i.e. 0.001%---but often the errors accumulate and the overall
accuracy is further lost due to rounding, faulty normalization and
other effects. If we didn't have this accuracy loss, we could do
our calculations with an actually existing half-precision FP
numbers that have 3 significant digits of precision, and are very
space-efficient (16 bits) and fast. Unfortunately, even though
most of our measurements would be accurately represented within
this 0.1% precision, the loss of accuracy in most calculations
makes this format impractical.
Normally, we deal with the accuracy loss by accepting the price of
calculating in higher precision: it's virtually impossible to get
inaccurate result when we start from the double precision's
0.00000000000001%, right? Unfortunately, it turns out that it IS
possible, even if rarely---and we can correct for that by doing
something similar to Richardson's extrapolation ( https://en.wikipedia.org/wiki/Richardson_extrapolation
): obtain the result in single precision, and then in double
precision. If the result is the same, we might be assured that
we're not losing accuracy. If they are NOT the same, we could
conclude that the double precision is the right one, or, if we're
in an exacting mood, we could actually estimate the result for
Unfortunately, math is a harsh mistress and this doesn't always
work. Here's an example of a simple polynomial (you don't need any
muti-hour iterative calculations) where the accuracy loss is so
severe that we're getting a wrong result for both single and
Doubly unfortunately, the wrong result is THE SAME for both, so it
turns out that you can't even rely on comparing multiple
here's where to evaluate it, and here's the right answer...
float(rp(77617, 33096)); evaluates to -0.82739605994682
But changing the floating point precision gives different answers...
for fpprec:10 step 5 thru 50 do print (rp(77617.0b0, 33096.0b0));
google will find you more info about Rump's polynomial..
using interval arithmetic with double-floats tells you that the correct
in [- 2.7363628282260744E+22, 2.458240415520526E+22]
which is why it might make sense to use more precision.