
## [lmi] Two kinds of precision loss [Was: Is bourn_cast demonstrably correct?]

From: Greg Chicares
Subject: [lmi] Two kinds of precision loss [Was: Is bourn_cast demonstrably correct?]
Date: Tue, 21 Mar 2017 22:34:35 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.6.0

```
On 2017-03-20 23:56, Greg Chicares wrote:
[...]
> there are two problems in mapping between floating and integral types:
>
>   (1) loss of range: e.g., DBL_MAX --> int, where static_cast produces
> undefined behavior [4.9/1] and definitely gives a wrong answer, because
> there really is no right answer;
>
>   (2) loss of precision: e.g., M_PI --> int, or ULLONG_MAX --> float,
> where static_cast gives a well-defined result [4.9/2] that is as good
> an approximation as it can be, plus or minus one ulp;

That's ambiguous now that I re-read it: do the last two lines apply to
both examples, or only the second example? Let me restate it so that I

There are three problems in mapping between floating and integral types:

(1) Loss of range, e.g., DBL_MAX --> int: UB; definitely not wanted.

(2) Truncation, e.g., M_PI --> int: we agree that we don't want this.
Another term could be quantization, and a coarse quantization it is:
any integral type, no matter how wide, can only give "3" as the answer.

(3) Loss of precision, e.g., ULLONG_MAX --> float: here, static_cast
gives a well-defined result [4.9/2] that is as good an approximation
as it can be, plus or minus one ulp; a sufficiently wide hypothetical
"long long double" type could be exact. Is this okay for bourn_cast?

Let me share a curious example that arose when I tried to write a unit
test for this exact case. I'm using
float = 32-bit IEEE 754
unsigned long long = 64-bit integer

snprintf() results:

18446744073709551615 == 2^64 - 1 = ULLONG_MAX
18446744073709551616 == static_cast<float>(ULLONG_MAX)
                   ^ different only in the last digit shown

Cast either to the type of the other, and they compare equal. I think
I'll wait until April Fools' Day and report this here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

But seriously...we agree that bourn_cast should throw in case (1)
above, and should throw also in case (2); but in case (3), should it
return float(ULLONG_MAX)?

Initially at least, I think the answer should be "yes". Otherwise,
a double cannot be converted to float unless its last (53-24)
mantissa bits are all zero, which has a 1 / 536870912 probability
assuming a uniform distribution. OTOH, I guess I'm saying I like
the Carpenter better than the Walrus because instead of truncating
he represented as much precision as he could in his 32 bits, and
that argument does seem subjective.

```
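The three cases in the message above can be sketched in C++ roughly as follows. This is only an illustration, not lmi's bourn_cast: every name here (`conversion`, `classify_double_to_int`, `ullong_max_becomes_two_to_the_64`, `double_to_float_is_exact`, `low_bit_patterns`) is hypothetical, and the code assumes the platform described in the message (32-bit IEEE 754 float, 64-bit unsigned long long) plus a 32-bit int and the default round-to-nearest mode.

```cpp
#include <climits>
#include <cmath>
#include <limits>

// Hypothetical names throughout; this is not lmi's bourn_cast.
enum class conversion {exact, out_of_range, truncating};

// Classify a double --> int conversion without performing it, so that
// the undefined behavior of case (1) is never triggered.
inline conversion classify_double_to_int(double d)
{
    // 2^31 for a 32-bit int; a power of two, so exactly representable.
    double const limit = std::ldexp(1.0, std::numeric_limits<int>::digits);
    // Conservative: values in (-limit - 1, -limit) would truncate to
    // INT_MIN, but are reported as out of range here. NaN fails both
    // comparisons, so it is reported as out of range too.
    if(!(-limit <= d && d < limit)) return conversion::out_of_range; // case (1)
    if(std::trunc(d) != d)          return conversion::truncating;   // case (2)
    return conversion::exact;
}

// Case (3): true if static_cast<float>(ULLONG_MAX) is exactly 2^64.
// Assumes 32-bit IEEE 754 float, 64-bit unsigned long long, and
// round-to-nearest. The reverse cast of 2^64 to unsigned long long is
// deliberately not attempted: that value is out of range, i.e., case (1).
inline bool ullong_max_becomes_two_to_the_64()
{
    float const f = static_cast<float>(ULLONG_MAX);
    return std::ldexp(1.0f, 64) == f;
}

// True if d survives a round trip through float unchanged. For values
// in float's normal range, that means the low (53 - 24) significand
// bits of d are all zero. Assumes d is finite and within float's range.
inline bool double_to_float_is_exact(double d)
{
    return static_cast<double>(static_cast<float>(d)) == d;
}

// Number of distinct patterns of those low bits: 2^(53 - 24) = 2^29.
constexpr unsigned long long low_bit_patterns =
    1ULL << (std::numeric_limits<double>::digits - std::numeric_limits<float>::digits);
static_assert(536870912ULL == low_bit_patterns, "2^29");
```

Under those assumptions, `classify_double_to_int` reports 3.0 as exact, 3.14159 as truncating, and the largest finite double as out of range, while `double_to_float_is_exact(0.5)` holds and `double_to_float_is_exact(0.1)` does not. A cast that throws in cases (1) and (2) but accepts case (3) would act on the first two results and return the rounded float for the third.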