Re: Accurate reading of floating point numbers

Hi Bruno,

On Mon, 26 Aug 2024 at 12:37, Bruno Haible <bruno@clisp.org> wrote:

Hi Marc,

> Gnulib
> offers strtod replacements, but these refer to conversion routines of the
> host systems and may lack the accuracy described in Clinger's well-known
> paper [1] IIUC.
>
> Wouldn't it make sense to include an accurate conversion algorithm in
> Gnulib like [2] with the small improvement from [3]?

If you have time to do so, and would like to contribute decent unit tests
accompanying the code, please do so!

Should I find the time and be able to write it, I will do so, but see below.

> Maybe code from glibc can be reused.

The problem with the glibc code is that they have specialized code for each
format:
- IEEE 754 single-precision,
- IEEE 754 double-precision,
- 'long double' with LDBL_MANT_DIG == 106 (a.k.a. "double double"),
- 'long double' with LDBL_MANT_DIG == 113 (a.k.a. "quad precision").
That makes for a lot of code and a lot of required testing, and is not
future-proof (regarding new floating-point types).

If possible, in Gnulib, we would prefer a single implementation, even if
it is a bit slow.
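A single implementation would have to be parametrized by the characteristics of each floating-point type instead of being specialized per format. A minimal sketch of how those parameters can be taken from the standard `<float.h>` macros, assuming a hypothetical `struct float_format` descriptor (not an existing Gnulib API):

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical descriptor of a binary floating-point format, filled in
   from the <float.h> macros, so that one generic conversion routine could
   serve every type instead of one specialized routine per format.  */
struct float_format
{
  const char *name;
  int mant_dig;  /* significand bits, including the leading bit */
  int min_exp;   /* *_MIN_EXP: smallest normalized value is 2^(min_exp - 1) */
  int max_exp;   /* *_MAX_EXP: largest finite value is below 2^max_exp */
};

static const struct float_format float_formats[] =
{
  { "float",       FLT_MANT_DIG,  FLT_MIN_EXP,  FLT_MAX_EXP  },
  { "double",      DBL_MANT_DIG,  DBL_MIN_EXP,  DBL_MAX_EXP  },
  { "long double", LDBL_MANT_DIG, LDBL_MIN_EXP, LDBL_MAX_EXP },
};

/* The smallest positive subnormal value of format F is
   2^(min_exp - mant_dig).  Return that binary exponent; a generic
   algorithm needs it to round correctly near the bottom of the range.  */
static int
float_format_min_subnormal_exp (const struct float_format *f)
{
  assert (f != NULL);
  return f->min_exp - f->mant_dig;
}
```

For IEEE double precision this gives -1021 - 53 = -1074. Caveat: for "double double", LDBL_MANT_DIG == 106 does not describe a fixed-width binary format, so such a descriptor would presumably not capture that case faithfully.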

I am not sure whether it makes sense to implement a more generic algorithm.  When the mantissa width (and the range of denormalized numbers) is known, the algorithms can avoid big-integer arithmetic most of the time.  A generic algorithm that does not check for such special cases would probably use big integers on all code paths and could not give an upper bound on memory usage.  If one is willing to depend on GNU MPFR, a correct generic implementation seems to be readily available by combining mpfr_strtofr with, say, mpfr_get_d.
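As an illustration of the kind of special case meant here, a sketch of Clinger's fast path for IEEE double precision. It assumes a binary64 `double`, round-to-nearest mode and no excess precision (`FLT_EVAL_METHOD == 0`); the function name `clinger_fast_path` is hypothetical:

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fast path for binary64: M * 10^E is computed with a single correctly
   rounded multiplication or division when
     - M <= 2^53, so that (double) M is exact, and
     - |E| <= 22, so that 10^|E| is exact (10^22 = 2^22 * 5^22, 5^22 < 2^53).
   With both operands exact, IEEE 754 guarantees that the one operation is
   correctly rounded.  Returns false when the fast path does not apply; the
   caller must then fall back to exact big-integer arithmetic.
   Assumes round-to-nearest mode (not checked here).  */
static bool
clinger_fast_path (uint64_t m, int e, double *result)
{
  static const double pow10[] =
  {
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
  };

  assert (result != NULL);
  /* Refuse non-binary64 doubles and platforms with excess precision
     (e.g. x87), where double rounding could occur.  */
  if (DBL_MANT_DIG != 53 || FLT_EVAL_METHOD != 0)
    return false;
  if (m > ((uint64_t) 1 << 53) || e < -22 || e > 22)
    return false;
  if (e >= 0)
    *result = (double) m * pow10[e];
  else
    *result = (double) m / pow10[-e];
  return true;
}
```

For example, "123.45" would be handled as m = 12345, e = -2, i.e. 12345.0 / 100.0. Inputs outside this range (long significands, large exponents, results near the subnormal range) are exactly the cases that need big integers.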

If the glibc code can be reused without too much complication, so that procedures like (c_)strtoaf, (c_)strtoad, and (c_)strtoald can be provided, that would be better than nothing.

> Let me also mention what I feel is an inconvenience of the standard C
> library functions: they mix parsing/composing the strings with the actual
> conversion routines.

That's because the "actual conversion routines" have a big-integer type
as input or output, and standard C does not have these types.

The integer parts could still be represented as strings of decimal digits.
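A minimal sketch of such a separation, assuming a hypothetical `struct decimal` that holds the sign, the integer mantissa as a string of decimal digits, and a decimal exponent. Only the parsing half is shown; the conversion routine that would consume the struct is left out:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decomposed decimal number:
   value = (negative ? -1 : 1) * digits * 10^exponent,
   where DIGITS has no leading zeros (empty string for zero).  */
struct decimal
{
  bool negative;
  char *digits;   /* malloc'ed, NUL-terminated; the caller frees it */
  long exponent;
};

/* Parse an optional sign, digits with an optional '.', and an optional
   exponent part ("e" followed by an optionally signed integer).
   Returns the number of characters consumed, or 0 on a syntax error or
   allocation failure.  Exponent overflow is not checked in this sketch.  */
static size_t
decimal_parse (const char *s, struct decimal *d)
{
  assert (s != NULL && d != NULL);
  const char *p = s;
  char *out = malloc (strlen (s) + 1);  /* room for every digit */
  size_t len = 0;
  long frac_digits = 0;
  bool seen_digit = false;

  if (out == NULL)
    return 0;
  d->negative = (*p == '-');
  if (*p == '-' || *p == '+')
    p++;
  for (bool in_frac = false; ; p++)
    {
      if (isdigit ((unsigned char) *p))
        {
          seen_digit = true;
          if (len > 0 || *p != '0')   /* drop leading zeros */
            out[len++] = *p;
          if (in_frac)
            frac_digits++;            /* each fractional digit scales by 1/10 */
        }
      else if (*p == '.' && !in_frac)
        in_frac = true;
      else
        break;
    }
  if (!seen_digit)
    {
      free (out);
      return 0;
    }
  out[len] = '\0';
  d->digits = out;
  d->exponent = -frac_digits;
  /* Consume the exponent only if at least one digit follows. */
  if ((*p == 'e' || *p == 'E')
      && (isdigit ((unsigned char) p[1])
          || ((p[1] == '+' || p[1] == '-') && isdigit ((unsigned char) p[2]))))
    {
      char *end;
      d->exponent += strtol (p + 1, &end, 10);
      p = end;
    }
  return (size_t) (p - s);
}
```

For example, "-123.45e6" decomposes into negative = true, digits = "12345", exponent = 4.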

Marc

> Routines that take the sign, the (integer) mantissa,
> and the exponent separately (or return them separately) are more
> fundamental.

Yes, and you find such routines in GMP or possibly in minigmp.
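On the binary side, standard C already allows such a split for `double` via `frexp` and `ldexp`. A minimal sketch with the hypothetical names `double_decompose` and `double_compose`, assuming `DBL_MANT_DIG <= 64` so that the integer significand fits in a `uint64_t`:

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Split a finite double X into sign, integer significand and binary
   exponent, so that |X| == significand * 2^exponent exactly.  */
static void
double_decompose (double x, bool *negative, uint64_t *significand,
                  int *exponent)
{
  int e;
  assert (isfinite (x));
  *negative = signbit (x) != 0;
  /* |x| = m * 2^e with 0.5 <= m < 1, or m == 0 when x is zero.  */
  double m = frexp (fabs (x), &e);
  /* Scaling m by 2^DBL_MANT_DIG yields an integer exactly, for normal
     and subnormal values alike.  */
  *significand = (uint64_t) ldexp (m, DBL_MANT_DIG);
  *exponent = e - DBL_MANT_DIG;
}

/* Inverse of double_decompose, for significands that fit in a double.  */
static double
double_compose (bool negative, uint64_t significand, int exponent)
{
  double m = ldexp ((double) significand, exponent);
  return negative ? -m : m;
}
```

For example, with DBL_MANT_DIG == 53, -6.5 decomposes into negative = true, significand = 13 * 2^49 and exponent = -50.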

For the "composing" part, Gnulib has such routines in vasnprintf.c,
lines 443..1650.

Bruno

From:	Marc Nieper-Wißkirchen
Subject:	Re: Accurate reading of floating point numbers
Date:	Mon, 26 Aug 2024 16:49:45 +0200