From: John W. Eaton
Subject: Re: floating point precision control
Date: Thu, 14 Jul 2016 17:12:43 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0
On 07/14/2016 10:03 AM, siko1056 wrote:
Mike Miller wrote:
[...] should we just enforce double extended (long double) precision on all x86 systems all the time [...]
I agree with Michael Godfrey and jwe. Looking at Matlab, a "double" is clearly specified as IEEE-754 binary64 and "single" as binary32 [1]. Octave does not explicitly refer to IEEE-754 (maybe we should do so!), but speaks of "double precision" and not "extended precision" (80 bits).
As far as I know, the only place where we specifically attempt to use floating point types wider than 64 bits is in the code that deals with operations on 64-bit integers. And the only reason to do that is so that we can properly support the (weird) saturation semantics required for Matlab compatibility.
(A side note: If I remember correctly, providing arithmetic operations on int64 types was supported in Octave before it was supported in Matlab.)
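To illustrate why a wider type is attractive for int64 work, here is a minimal C++ sketch. It is not Octave's actual code; the function names are hypothetical, and the rounding rule (ties away from zero, NaN mapped to 0) is my assumption about the Matlab-style conversion. It shows that a 64-bit double cannot hold every int64 value, how one might check at compile time whether long double is wide enough on a given platform, and what a saturating conversion could look like.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// True if v survives a round trip through a 64-bit double.
// The volatile store forces rounding to binary64 even on targets
// that keep intermediates in extended precision.
// Only meant for values whose double image still fits in int64.
bool survives_double(std::int64_t v)
{
  volatile double d = static_cast<double>(v);
  return static_cast<std::int64_t>(d) == v;
}

// An int64 needs 63 significand bits for its magnitude. x87 extended
// precision provides 64; a long double that is just a double gives 53.
constexpr bool long_double_holds_int64 =
  std::numeric_limits<long double>::digits >= 63;

// Hypothetical saturating conversion: round to nearest (ties away
// from zero, an assumption), map NaN to 0, clamp to the int64 range.
std::int64_t saturate_to_int64(long double x)
{
  if (std::isnan(x))
    return 0;

  // 2^63 is a power of two, so it is exact in any binary format.
  const long double two63 = 9223372036854775808.0L;
  const long double r = std::round(x);

  if (r >= two63)
    return std::numeric_limits<std::int64_t>::max();
  if (r <= -two63)
    return std::numeric_limits<std::int64_t>::min();

  return static_cast<std::int64_t>(r);
}
```

With this sketch, 2^53 survives the round trip through double but 2^53 + 1 does not, which is the precision gap that extended precision is meant to cover for int64 operands.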
For actual floating point operations, we don't explicitly use extended precision (by that, I mean trying to declare variables with long double or similar types). So I think the question is whether we should allow any computations on floating point types to use extended precision internally.
I think the problem is, and was, that Octave uses "long double" for mixed computations, as introduced in 2008 by Jaroslav Hajek [4]. "long double" is not really standardized [5] and requires checking for 10 or 16 bytes to produce reliable results. Wouldn't it be more appropriate to cast both parameters to the standardized (u)int64_t and implement our slightly different integer arithmetic in C++?
I haven't looked at that code in detail in a long time. If you have some idea of a better way to handle operations for int64 types in a way that is compatible with Matlab's saturation semantics for integer operations, then please propose a patch.
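As a starting point for such a proposal, the integer-only case can be written without any floating point type at all. The following is a minimal sketch, not a patch against Octave's sources; sat_add is a hypothetical name. It covers only int64 + int64; the mixed int64/double operations, where long double is currently used, would still need separate treatment.

```cpp
#include <cstdint>
#include <limits>

// Saturating addition of two int64 values using only integer
// arithmetic. The overflow checks are arranged so that no
// intermediate expression can itself overflow.
std::int64_t sat_add(std::int64_t a, std::int64_t b)
{
  const std::int64_t max = std::numeric_limits<std::int64_t>::max();
  const std::int64_t min = std::numeric_limits<std::int64_t>::min();

  if (b > 0 && a > max - b)
    return max;  // would overflow upward: clamp to intmax
  if (b < 0 && a < min - b)
    return min;  // would overflow downward: clamp to intmin

  return a + b;
}
```

Adding 1 to the largest int64 value returns that same value instead of wrapping around, which is the saturation behavior Matlab compatibility requires.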
Another burden I see is that Octave might prevent the compiler from using newer instruction sets like AVX [6], which only pack binary64 values. And do all our library dependencies (LAPACK, SuiteSparse, ...) support our desire for extended precision, too?
I don't think we need to worry about that for what we are trying to use extended precision doubles for.
jwe