octave-maintainers

Re: floating point precision control


From: John W. Eaton
Subject: Re: floating point precision control
Date: Thu, 14 Jul 2016 17:12:43 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0

On 07/14/2016 10:03 AM, siko1056 wrote:
> Mike Miller wrote:
>> [...] should we just enforce double extended (long double) precision on
>> all x86 systems all the time [...]
>
> I agree with Michael Godfrey and jwe. Looking at Matlab, a "double" is
> clearly specified as IEEE-754 binary64 and a "single" as binary32 [1]. Octave
> does not explicitly refer to IEEE-754 (maybe we should!), but it speaks
> of "double precision", not "extended precision" (80 bits).

As far as I know, the only place where we attempt to specifically use floating-point types wider than 64 bits is in the code that deals with operations on 64-bit integers. And the only reason to do that is so that we can properly support the (weird) saturation semantics required for Matlab compatibility.
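For two int64 operands, saturation can be implemented with integer arithmetic alone by checking for overflow before adding (signed overflow is undefined behavior in C, so the check must come first). A minimal C sketch under that assumption; the name `sat_add_i64` is hypothetical and not taken from Octave's sources:

```c
#include <stdint.h>

/* Hypothetical saturating int64 addition: results that would leave the
   int64 range are clamped to INT64_MAX or INT64_MIN instead of wrapping.
   The checks are done before adding because signed overflow is UB. */
int64_t sat_add_i64(int64_t a, int64_t b)
{
  if (b > 0 && a > INT64_MAX - b)
    return INT64_MAX;   /* positive overflow: clamp to the maximum */
  if (b < 0 && a < INT64_MIN - b)
    return INT64_MIN;   /* negative overflow: clamp to the minimum */
  return a + b;         /* in range: exact result */
}
```

The harder cases are mixed int64/double operations, where the double operand may have a fractional part or a magnitude that int64 cannot hold.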

(A side note: If I remember correctly, providing arithmetic operations on int64 types was supported in Octave before it was supported in Matlab.)

For actual floating-point operations, we don't explicitly use extended precision (by that, I mean we don't declare variables with long double or similar types). So I think the question is whether we should allow any floating-point computations to use extended precision internally.
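Whether double expressions are evaluated in extended precision internally depends on the compiler and target, and C99 exposes this as `FLT_EVAL_METHOD` in `<float.h>`. A minimal sketch (the function name `report_fp_model` is hypothetical) that prints the evaluation method and what `long double` actually is on the current platform:

```c
#include <float.h>
#include <stdio.h>

/* Print how this compiler/target evaluates floating-point expressions.
   FLT_EVAL_METHOD (C99):
     0  evaluate in the type's own range and precision (e.g. SSE2 on x86_64)
     1  evaluate float and double operations as double
     2  evaluate everything as long double (e.g. x87 code on 32-bit x86)
    -1  indeterminable
   LDBL_MANT_DIG identifies the long double format:
     53 -> same as double, 64 -> x87 80-bit extended,
    106 -> IBM double-double, 113 -> IEEE binary128. */
void report_fp_model(void)
{
  printf("FLT_EVAL_METHOD     = %d\n", (int)FLT_EVAL_METHOD);
  printf("DBL_MANT_DIG        = %d\n", DBL_MANT_DIG);
  printf("LDBL_MANT_DIG       = %d\n", LDBL_MANT_DIG);
  /* the 80-bit x87 format is usually padded to 12 or 16 bytes */
  printf("sizeof(long double) = %zu bytes\n", sizeof(long double));
}
```

On x86_64 with the default SSE2 code generation, `FLT_EVAL_METHOD` is typically 0; 32-bit x87 builds typically report 2. Because of the padding, `LDBL_MANT_DIG` is a more reliable test of the format than `sizeof(long double)`.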

> I think the problem is, and was, that Octave uses "long double" for mixed-type
> computations, as introduced in 2008 by Jaroslav Hajek [4]. "long double" is not
> really standardized [5] and requires checking for 10 or 16 bytes to produce
> reliable results. Wouldn't it be more appropriate to cast both operands to the
> standardized (u)int64_t types and implement our slightly different integer
> arithmetic in C++?

I haven't looked at that code in detail in a long time. If you have some idea of a better way to handle operations for int64 types in a way that is compatible with Matlab's saturation semantics for integer operations, then please propose a patch.
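To make the difficulty concrete, here is a hypothetical C sketch (both function names are made up for illustration) of two naive ways to compute int64 + double. Converting the int64 operand to binary64 can round any value above 2^53, and casting the double operand to int64 first truncates its fractional part, whereas Matlab rounds the result of `int64(5) + 0.6` to 6:

```c
#include <stdint.h>

/* Naive approach 1 (hypothetical): convert the int64 operand to double.
   binary64 has a 53-bit significand, so integers above 2^53 may round.
   Assigning to volatile doubles forces each intermediate to be stored
   as a true binary64 value rather than kept in an extended register. */
int64_t add_via_double(int64_t a, double b)
{
  volatile double da = (double)a;   /* first rounding: int64 -> binary64 */
  volatile double sum = da + b;     /* second rounding: the addition */
  return (int64_t)sum;              /* assumes sum fits in int64 */
}

/* Naive approach 2 (hypothetical): convert the double operand to int64.
   The conversion truncates toward zero, so the fractional part of b is
   dropped before the addition instead of the exact sum being rounded. */
int64_t add_via_int64_cast(int64_t a, double b)
{
  return a + (int64_t)b;            /* assumes no overflow, for brevity */
}
```

For example, `add_via_double(2^53 + 1, 1.0)` yields 2^53 instead of 2^53 + 2, and `add_via_int64_cast(5, 0.6)` yields 5 instead of 6. A type with at least a 64-bit significand, such as x87 `long double`, represents every int64 value exactly, so the int64 operand is not rounded before the addition; where `long double` is just `double` (as with MSVC), that guarantee is gone.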

> Another burden I see for the compiler is that Octave might prevent the
> compiler from using newer instruction sets like AVX [6], which operate on
> packed binary64 values, not on 80-bit extended ones. And do all our library
> dependencies (LAPACK, SuiteSparse, ...) support our desire for extended
> precision, too?

I don't think we need to worry about that for what we are trying to use extended precision doubles for.

jwe



