Re: [lmi] MinGW-w64 anomaly?


From: Vadim Zeitlin
Subject: Re: [lmi] MinGW-w64 anomaly?
Date: Thu, 22 Dec 2016 00:43:29 +0100

On Wed, 21 Dec 2016 22:49:50 +0000 Greg Chicares <address@hidden> wrote:

GC> > I think a better way to implement fenv_validate() would be by performing
GC> > some computation(s) with known good answer(s) and comparing that their
GC> > results match the expected ones.
GC> 
GC> If fenv_t (and therefore std::fegetenv()) always reliably includes
GC> all the contents of the x87 control word

 I'm not sure if we can count on this but I could check.

GC> Testing some actual computation or computations instead seems like
GC> a strange and roundabout way of performing the same test.

 Really? It seems like a much more direct way to me: after all, we're not
interested in preserving the x87 control word, we just want to have the
correct results. And for code compiled to use SSE instead of x87
instructions these are not at all the same thing, which is the source of
the problem. If we checked the computation results directly, we wouldn't
have this problem in the first place, and if we switch to SSE-specific
instructions now instead, we'll just have the same problem again when
porting to ARM or even to the next generation of Intel CPUs which provide
some better way of doing floating point arithmetic (128 bit doubles in
hardware?).

GC> Granted, it has a certain charm--kind of like autoconf in that it
GC> probes the actual capabilities.

 Exactly!

GC> But designing this would be tricky, and proving it to be equivalent to
GC> testing the x87 control word would be a challenge. I don't think it's
GC> worth the effort.

 How much effort would it really take? Testing the rounding mode is
trivial, AFAICS we just need 2 tests using std::rint() for -0.5 and 0.5 to
ensure that FE_TONEAREST is in effect. And for the precision, we could just
find something that would be 0 with 64 bit doubles but non-0 with 80 bits.
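
 A minimal sketch of what such probes could look like follows. All the
function names are hypothetical, not existing lmi code. Two assumptions are
worth stating loudly: it uses -1.5 and 1.5 rather than -0.5 and 0.5, because
rounding the latter pair toward zero gives the same answers as rounding to
nearest (ties to even), and the precision probe assumes that long double is
the 80-bit x87 type.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: true if std::rint() behaves as under FE_TONEAREST.
// The volatile objects keep the compiler from folding the calls at compile
// time or moving them across calls that change the rounding mode.
bool rounding_is_to_nearest()
{
    volatile double const pos =  1.5;
    volatile double const neg = -1.5;
    volatile double const r_pos = std::rint(pos);
    volatile double const r_neg = std::rint(neg);
    // To nearest: 2, -2. Upward: 2, -1. Downward: 1, -2. Toward zero: 1, -1.
    return 2.0 == r_pos && -2.0 == r_neg;
}

// Hypothetical helper: true if long double arithmetic currently keeps more
// than 53 significand bits. ASSUMPTION: long double is the 80-bit x87 type;
// where it is merely an alias for double this always returns false.
bool extended_precision_in_effect()
{
    volatile long double const one  = 1.0L;
    volatile long double const tiny = std::ldexp(1.0L, -60); // 2^-60
    // 1 + 2^-60 needs 61 significand bits, so the sum rounds back to 1
    // (and the difference is 0) if only 53 bits are kept.
    volatile long double const diff = (one + tiny) - one;
    return 0.0L != diff;
}

// Example use: a debug-build check where the environment should be intact.
void assert_rounding_intact()
{
    assert(rounding_is_to_nearest());
}
```

 Whether these two probes are equivalent to comparing the whole control
word is exactly the open question, so this is only an illustration of the
idea, not a proof.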


GC> > GC> Okay, so if I revert that, conditional on LMI_X86, that will undo
GC> > GC> any damage? Or should I use something like
GC> > GC>   #if defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)
GC> > GC> instead? Or does C++11 offer a standard way of doing this?
GC> > 
GC> >  There is no standard way of checking for this to the best of my knowledge
GC> > (it would be really surprising if there were, seeing how this is entirely
GC> > architecture-specific) and so you would indeed need to test for __SSE__ for
GC> > gcc and, if you're so inclined, for _M_IX86_FP > 0 for MSVC (where this is
GC> > more complicated because this one is only defined if _M_IX86 is defined).
GC> 
GC> Okay, I believe that's what the snippet above does.

 Oops, sorry, I've somehow glossed over the second part of it. However I
still don't think it's quite correct as the condition above is false for 64
bit platforms (_M_X64 defined, but not _M_IX86) or, in fact, any other
architecture. So I'd write it as

        defined _MSC_VER && (!defined _M_IX86 || _M_IX86_FP)


GC> Because the C99 committee chose to omit the precision bits. I'd say
GC> their mistake is somewhere between ghastly and unconscionable, but
GC> I'm trying to suppress my emotions.

 I think it's understandable that they didn't want to standardize
functionality available for only a single processor on the market (or are
there any other ones like it?), even if it's the most dominant one.


GC> Because if two users run lmi with the same input, we want them to
GC> get the same output. But msvc decided to poison the control word
GC> even for programs that do no floating point calculations, as part
GC> of the startup code for any 'exe' and the initialization code for
GC> any 'dll', and they decided not to virtualize the x87 control
GC> word across task switches,

 I like blaming Microsoft as much as any other person, but I don't think
the last part is correct for any version of MSW from this millennium.

GC> so any program started or any dll initialized while lmi is running
GC> could change lmi's results: that's why this code exists.

 And, of course, it's impossible for any compiler/CRT to prevent a DLL
loaded into the process from changing any process-wide parameters, such as
the x87 (or, indeed, SSE) control word. But I think badly behaved shell
extensions doing this are much rarer nowadays than before. Out of
curiosity, when was the last time you received a report about a floating
point environment failure from one of lmi's users?


GC> Then we do this once, in 'config.hpp':
GC> 
GC>   #if defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)
GC>   #   define LMI_SSE
GC>   #endif //  defined __SSE__ || (defined _M_IX86_FP && _M_IX86_FP)
GC> 
GC> and wherever we now use x87-specific instructions, we conditionalize
GC> them like this:
GC> 
GC>   #if !defined LMI_SSE
GC>       asm volatile("fstcw %0" : : "m" (control_word));
GC>   #else // SSE
GC>       std::fe...something();
GC>   #endif

 I don't like explicitly testing for SSE. I really, really don't understand
why you would consciously lock yourself into the choice between x87 and SSE
instead of making the much more natural choice between x87 and a
standard-conforming implementation.

 I propose to have LMI_USE_X87 which would be set like this:

        #if defined __GNUC__
            #if defined LMI_X86 && !defined __SSE__
                #define LMI_USE_X87
            #endif
        #elif defined _MSC_VER
            #if defined _M_IX86 && _M_IX86_FP == 0
                #define LMI_USE_X87
            #endif
        #endif

and then use it in the following way:

        #ifdef LMI_USE_X87
            asm volatile("fxxx");
        #else
            std::fexxx();
        #endif
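
 To make this less abstract, here is a hedged sketch of one such function,
a rounding mode getter. The name current_rounding_mode() is hypothetical,
and __i386__ stands in for LMI_X86 only so that the snippet is
self-contained. The inline assembly branch is additionally guarded by
__GNUC__ because gcc asm syntax is not accepted by MSVC.

```cpp
#include <cassert>
#include <cfenv>

// Detection as proposed above, except that __i386__ replaces LMI_X86
// (an ASSUMPTION made to keep this sketch self-contained).
#if defined __GNUC__
#   if defined __i386__ && !defined __SSE__
#       define LMI_USE_X87
#   endif
#elif defined _MSC_VER
#   if defined _M_IX86 && _M_IX86_FP == 0
#       define LMI_USE_X87
#   endif
#endif

// Hypothetical accessor returning one of the standard FE_* rounding macros.
int current_rounding_mode()
{
#if defined LMI_USE_X87 && defined __GNUC__
    unsigned short cw = 0;
    asm volatile("fstcw %0" : "=m" (cw)); // store the x87 control word
    switch(cw & 0x0c00)                   // rounding control, bits 10-11
        {
        case 0x0000: return FE_TONEAREST;
        case 0x0400: return FE_DOWNWARD;
        case 0x0800: return FE_UPWARD;
        default:
            assert(0x0c00 == (cw & 0x0c00));
            return FE_TOWARDZERO;
        }
#else
    // Every other build, including MSVC, uses the standard function.
    return std::fegetround();
#endif
}
```

 Callers see the same FE_* values in both cases, so nothing outside this
function needs to know which branch was compiled.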

GC> >  To be precise, I suggest:
GC> > 
GC> > 1. Add an alternative implementation of all fenv_xxx() functions except for
GC> >    fenv_precision() for which this is impossible, but which is not really
GC> >    used anywhere anyhow, using only the standard functions, with a
GC> 
GC> It is used for the crucial purposes of setting its initial value and
GC> maintaining that setting as an invariant.

 Yes, but only as part of fenv_initialize() and fenv_validate(). These
functions would continue to use it when LMI_USE_X87 is defined, but
otherwise they would need to either check only fenv_t (which should, of
course, be sufficient for any non-x87 builds in practice as fenv_t contains
both the rounding and error handling information and precision can't be
changed for them anyhow) or implement my suggestion of checking that actual
calculation results match the expected ones.
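
 For the non-x87 case, the standard-only variants could be as small as the
sketch below. The _sketch names are hypothetical and are not the existing
lmi functions. It also assumes that checking the rounding mode is enough,
since the standard offers no portable way to query which exceptions are
masked.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical standard-only counterpart of fenv_initialize(): restore the
// default environment (round to nearest, exception flags cleared).
void fenv_initialize_sketch()
{
    int const failed = std::fesetenv(FE_DFL_ENV);
    assert(0 == failed);
    static_cast<void>(failed); // avoid an unused warning under NDEBUG
}

// Hypothetical standard-only counterpart of fenv_validate(). Precision is
// not checked because non-x87 builds cannot change it anyhow.
bool fenv_validate_sketch()
{
    return FE_TONEAREST == std::fegetround();
}
```

 Calling fenv_initialize_sketch() and then fenv_validate_sketch() should
return true, and it should return false after something calls, for example,
std::fesetround(FE_DOWNWARD).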

GC> Now that we've arrived at this point, let's step back and reconsider
GC> what we're trying to accomplish, and how best to accomplish it. We
GC> see that C99's fenv section crucially lacks precision control, which
GC> is a prerequisite for making lmi results reproducible, which is an
GC> imperative. I think we can also conclude that we already have a full
GC> replacement of C99's fenv for x86/x87, which could be extended to
GC> include x86_64 as well in either of two ways:
GC> 
GC> (A) use C99's fenv for x86_64, and lmi's for x86
GC> 
GC> (B) extend lmi's implementation, e.g.:
GC> 
GC> + #if defined LMI_X86
GC>       asm volatile("fstcw %0" : : "m" (control_word));
GC> + #elif defined LMI_SSE
GC> +     asm volatile("ldmxcsr ... ...
GC> + #else // !defined LMI_X86 && !defined LMI_SSE
GC> + #   error Unknown platform
GC> + #endif // !defined LMI_X86 && !defined LMI_SSE
GC> 
GC> and likewise for this:
GC>     asm volatile("fldcw %0" : : "m" (control_word));
GC> 
GC> (A) might work automatically on ARM CPUs, though I'm not sure we
GC> need to care about that. Otherwise, (B) seems much simpler than (A).

 Sorry, but I don't follow at all. Since when is writing inline assembly
simpler than using a standard function!? It's obviously more difficult (the
gcc asm statement has its own DSL that has nothing to do with standard C++
and must be learnt separately), less readable (are all C++ programmers
now supposed to know assembly, including rarely used instructions such as
those?) and not portable, neither between architectures nor compilers
(gcc asm obviously doesn't work for MSVC nor probably any other compiler
with the exception of clang). I really see no advantage whatsoever to
resorting to inline assembly here, but tons of disadvantages.


GC> >  If you agree with the above, could you please tell me which tests should
GC> > be used for the checks in (2)?
GC> 
GC> If we choose (B), then I don't think that question even arises.

 Sorry again, but how so? The question is completely orthogonal. It would be
very useful to ensure that nothing got broken after making any non-trivial
changes to the code, and we still want to compare the performance of x87 and
SSE builds regardless of whether we choose (A) or (B). Let me reproduce the
"checks in (2)" for reference here:

GC> > 2. Compare the results and performance of the build using the standard
GC> >    functions (but still using x87 instructions!) with the current version.


 I sincerely hope we're not going to choose the (B) route. It might solve
the problem of failing tests in 64 bit builds, but it will make the code
even more complex and less standard-conforming and portable than it is now,
which is certainly exactly the opposite of my intentions.

 Regards,
VZ

