pspp-dev

Re: t-test gets NaN instead of 0 for significance on x86-64


From: John Darrington
Subject: Re: t-test gets NaN instead of 0 for significance on x86-64
Date: Thu, 3 Sep 2009 02:14:05 +0800
User-agent: Mutt/1.5.18 (2008-05-17)

On Tue, Sep 01, 2009 at 09:15:49PM -0700, Ben Pfaff wrote:
     Some time ago, Matej Cepl <address@hidden> reported that
     t-test-alpha3.sh in 0.6.2-pre5 failed on x86-64 with GCC 4.4:
     
     > PASS: tests/bugs/t-test-alpha3.sh
     > 19c19
     > < #Pair 0|A & B#3|      1.000| NaN#
     > ---
     > > #Pair 0|A & B#3|      1.000|.000#
     > compare output
     > FAILED
     > FAIL: tests/bugs/t-test-paired.sh
     
     This evening, I've run the same test on an x86-64 machine
     (bellini.debian.org) with GCC 4.3.2 and I reproduce this test
     failure.  I also get the same failure with 0.6.1, although I have
     to run the test by hand there since this test was new in
     0.6.2-pre5.
     
     After some fussing, I tracked the source of the NaN to this
     calculation in pscbox() in src/language/stats/t-test.q:
     
           double correlation_t =
             pairs[i].correlation * sqrt (df) /
             sqrt (1 - pow2 (pairs[i].correlation));
     
     In this particular test case, pairs[i].correlation is almost
     exactly 1.0, such that 1 - pow2 (pairs[i].correlation) comes out
     just slightly negative, making the square root yield NaN.
     
     John, do you have a suggestion for the correct fix?  I don't know
     enough about the math here to say.

So the cause of the problem is that correlation^2 has a value greater than
unity.  This is of course not mathematically possible, because correlation is
defined to lie in the range [-1, +1].  So it must be due to numerical
instability in the calculation of correlation.  This is not particularly
surprising, because the correlation is calculated according to the classical
one-pass algorithm, which, as we've discussed before, is somewhat unstable.


In the long term, I think that all linear correlations should be calculated
using a common routine, for example the one in src/math/covariance-matrix.c
(which is currently also unstable, but at least the instability would be in
one place).

As a short-term solution, the best I can suggest is that we clamp
pow2(correlation) to 1.0.

J'

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.

