[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lmi] rounding unit test under Linux/x86-64 (was: trivial patch to fix f
[lmi] rounding unit test under Linux/x86-64 (was: trivial patch to fix fenv test compilation under Linux)
Thu, 25 Mar 2010 22:21:19 +0100
On Wed, 24 Mar 2010 11:39:39 +0000 Greg Chicares <address@hidden> wrote:
GC> > all tests compile under Linux again now. Unfortunately they still
GC> > don't pass, even this one fails with
GC> > **** test failed: '-3' == '-2'
GC> > [file fenv_lmi_test.cpp, line 211]
GC> > **** test failed: '1' == '2'
GC> > [file fenv_lmi_test.cpp, line 213]
GC> > (note that the line numbers are offset by the above patch compared to the
GC> > svn version) but I ran out of time to investigate it today.
GC> Here's the applicable part of the code:
GC> fenv_rounding (fe_downward);
GC> BOOST_TEST_EQUAL(fe_downward , fenv_rounding());
GC> #if defined LMI_COMPILER_PROVIDES_RINT
GC> BOOST_TEST_EQUAL(-3, rint(-2.5));
GC> BOOST_TEST_EQUAL(-2, rint(-1.5));
GC> BOOST_TEST_EQUAL( 1, rint( 1.5));
GC> BOOST_TEST_EQUAL( 2, rint( 2.5));
GC> #endif // defined LMI_COMPILER_PROVIDES_RINT
GC> and it's interesting that two of the tests pass but two fail.
GC> The failures seem to be here:
GC> BOOST_TEST_EQUAL(-3, rint(-2.5)); // '-2' observed
GC> BOOST_TEST_EQUAL( 1, rint( 1.5)); // '2' observed
GC> and that pattern seems to suggest a round-to-nearest mode,
GC> so I'd suspect a problem in fenv_rounding() because it doesn't
GC> seem possible that glibc implements rint() incorrectly for
GC> such obvious testcases...unless they're not trying to be
GC> compatible with C99?
Actually the problem is not in fenv_rounding() but is due to the fact that
the FPU control word is not being used at all. Under Linux/x86-64 platform
math operations are implemented using SSE2 as it's faster and always
available for all 64 bit CPUs, quoting -mfpmath gcc option documentation
Generate floating point arithmetics for selected unit unit. The
choices for unit are:
Use the standard 387 floating point coprocessor present
majority of chips and emulated otherwise.
This is the default choice for i386 compiler.
Use scalar floating point instructions present in the SSE
The resulting code should be considerably faster in the
majority of cases and avoid the numerical instability
problems of 387 code, but may break some existing code that
expects temporaries to be 80bit.
This is the default choice for the x86-64 compiler.
But in fact this is mostly what finally allowed me to find the problem (I
looked for a long time in wrong places...) and is probably irrelevant to
what really goes on as rint() implementation in GNU libc is not compiled by
gcc anyhow but written directly in assembler. I'm not sure about where
exactly is the implementation of rint() found in glibc sources (I tried to
find it there but got lost...) but looking at the disassembly of rint() on
my system in gdb I do see SSE instructions being used and not the x87 ones.
And, quoting from "Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Volume 1" (http://developer.intel.com/Assets/PDF/manual/253665.pdf)
188.8.131.52 Rounding Control (RC) Fields
In the Intel 64 and IA-32 architectures, the rounding mode is
controlled by a 2-bit rounding-control (RC) field (Table 4-8 shows
the encoding of this field). The RC field is implemented in two
* x87 FPU control register (bits 10 and 11)
* The MXCSR register (bits 13 and 14)
Although these two RC fields perform the same function, they
control rounding for different execution environments within the
processor. The RC field in the x87 FPU control register controls
rounding for computations performed with the x87 FPU instructions;
the RC field in the MXCSR register controls rounding for SIMD
floating point computations performed with the SSE/SSE2
So SSE instructions are completely unaffected by the changes to the FPU
control word. Setting MXCSR correct does make the test work (I tested this
manually under gdb) and looking at sysdeps/x86_64/fpu/fesetround.c I see
that it changes both places at once, just as I'd expect:
fesetround (int round)
unsigned short int cw;
if ((round & ~0xc00) != 0)
/* ROUND is no valid rounding mode. */
/* First set the x87 FPU. */
asm ("fnstcw %0" : "=m" (*&cw));
cw &= ~0xc00;
cw |= round;
asm ("fldcw %0" : : "m" (*&cw));
/* And now the MSCSR register for SSE, the precision is at different
positions in the different units, we need to shift it 3 bits. */
asm ("stmxcsr %0" : "=m" (*&mxcsr));
mxcsr &= ~ 0x6000;
mxcsr |= round << 3;
asm ("ldmxcsr %0" : : "m" (*&mxcsr));
Of course, we could do the same ourselves in LMI-specific code. But I
really think that if we're interested in fixing this at all, we should just
use fesetround() i.e. fix the
// SOMEDAY !! Revisit suppressed __STDC_IEC_559__ support. See:
comment from fenv_lmi.hpp.
For me a more interesting question is whether we actually want to fix it.
AFAIK currently LMI is neither used nor compiled under 64 bit platforms at
all so it's probably not a priority. OTOH it's not totally impossible that
one day you may wish to enabled SSE support even in 32 bit builds, i.e. use
-mfpmath=sse2 option, for performance reasons and it would be a nasty
surprise to (re)discover this breakage then. And unfortunately I don't know
of any way to test, during compilation, if SSE instructions are being used
so we can't even generate a #error or something.
Please let me know if you'd like me to enable LMI_IEC_559 and retest with