Re: Windows PSPPIRE/PSPP is pretty wonky

From: John Darrington
Subject: Re: Windows PSPPIRE/PSPP is pretty wonky
Date: Wed, 31 Dec 2014 10:28:10 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Dec 30, 2014 at 04:58:48PM -0600, Alan Mead wrote:
     John and Harry,
     PSPPIRE.exe 0.8.4-g5ce6b1 (and the associated PSPP) is pretty wonky.
     Paste doesn't work correctly (I believe that's a known issue?). To paste
     syntax into the syntax window, I had to right-click and choose paste;
     neither Control-v nor Edit > Paste would paste the text.
     And Run > Current line wasn't working (but now I cannot replicate it,
     I'm guessing it has to do with where the cursor was rather than which
     line of the syntax window was "shaded"). What was most annoying is that
     by "not working" I mean that the output window would flash but there was
     no additional output in the window and no warning or error.

I don't know why this should be.  Perhaps Harry can shed some light on it.
     And then look at the boxplot I got when I ran Robin's syntax on the
     physio data (attached "boxplot.png")... I don't use examine and I don't
     know what " /STATISTICS = EXTREME (3)" is meant to do, but I know what a
     boxplot is and there shouldn't be values like 9999999 between 1200 and
     200 on the y-axis.

[ EXTREME (3) reports the largest and smallest three values of the variable. ]
Regarding the 99999 issue, I certainly don't get that on GNU/Linux.  My guess
is that Windows has rounding issues and is miscalculating 300 as
299.99999999999999 (the left-hand side is off the page).

Like you say, Windows is somewhat wonky.  That is one reason why I don't
regularly use it.  Note that whilst we try to support PSPP under Windows (and
Harry has done an excellent job making his binaries available), the
recommended platform is GNU or GNU/Linux.

     Regarding the actual algorithm, the boxplot I get from SPSS is attached
     as "boxplot2.png".  I think it's a lot more reasonable (albeit uglier).
     The main difference is the SPSS boxplot had short whiskers while PSPP's
     boxplot whiskers seem to include the entire range of the data
     (including the outlier). In the physio dataset, apparently there are
     some outliers like 30 mm for a human height.  That's the kind of thing
     that boxplots are supposed to help you find.  Maybe that's a bug in PSPP
     that the whisker length is just wrong?  Otherwise I think it would make
     more sense to limit the whiskers to some reasonable value like 1.5 times
     the inter-quartile range (or to the highest and lowest values that are
     within 1.5 times the inter-quartile range).

Here is what SPSS has to say about boxplots:

        The boundaries of the box are Tukey's hinges. The length of the box is
        the interquartile range based on Tukey's hinges. That is, IQR = Q_3 - Q_1
        STEP = 1.5 IQR
        A case is an outlier if
        Q_3 + STEP < y < Q_3 + 2 * STEP
        Q_1 - 2 * STEP < y < Q_1 - STEP

        A case is an extreme if
        y >= Q_3 + 2 * STEP
        y <= Q_1 - 2 * STEP
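
As a rough illustration of the rules quoted above, here is a minimal Python
sketch.  It is not PSPP's or SPSS's code.  The function names and the sample
data are made up for the example, and it assumes the common definition of
Tukey's hinges (the medians of the lower and upper halves of the sorted data,
with the overall median included in both halves when the count is odd).

```python
from statistics import median


def tukey_hinges(values):
    """Return (Q_1, Q_3) as Tukey's hinges (assumed definition, see above)."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # includes the overall median when n is odd
    return median(xs[:half]), median(xs[n - half:])


def classify(values):
    """Split cases into outliers and extremes using STEP = 1.5 * IQR."""
    q1, q3 = tukey_hinges(values)
    step = 1.5 * (q3 - q1)
    result = {"outliers": [], "extremes": []}
    for y in values:
        if y >= q3 + 2 * step or y <= q1 - 2 * step:
            result["extremes"].append(y)
        elif y > q3 + step or y < q1 - step:
            result["outliers"].append(y)
    return result


if __name__ == "__main__":
    # Hypothetical heights with a few implausible values mixed in.
    heights = [30, 160, 165, 170, 172, 175, 178, 180, 185, 210, 250]
    print(tukey_hinges(heights))
    print(classify(heights))
```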

Note that it doesn't actually say where the whiskers should be.  However, it
seems that PSPP is placing the lower whisker at the lowest value y of the
dataset for which
 y > Q_1 - STEP
and the upper whisker at the highest value y for which
 y < Q_3 + STEP

I vaguely remember reading this recommendation in the literature.
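
For comparison, the conventional Tukey-style placement puts each whisker at
the most extreme data value that is still within one STEP of the box.  The
Python sketch below shows that rule only.  It is not taken from the PSPP
sources, the function name is invented, and Q_1 and Q_3 are passed in rather
than computed.  Values lying exactly on a fence are treated as inside, since
the outlier test above is a strict inequality.

```python
def whisker_ends(values, q1, q3):
    """Return (lower, upper) whisker positions for the given hinges."""
    step = 1.5 * (q3 - q1)
    lower_fence = q1 - step
    upper_fence = q3 + step
    # Keep only the cases that are neither outliers nor extremes.
    inside = [y for y in values if lower_fence <= y <= upper_fence]
    if not inside:
        # No case lies within the fences: collapse the whiskers onto the box.
        return q1, q3
    return min(inside), max(inside)


if __name__ == "__main__":
    # Hypothetical heights; the hinges are assumed to be already known.
    heights = [30, 160, 165, 170, 172, 175, 178, 180, 185, 210, 250]
    print(whisker_ends(heights, 167.5, 182.5))
```

With whiskers placed this way, a stray value such as 30 is drawn as a separate
point instead of stretching the whisker across the whole range of the data.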

If someone can reference any better recommendations, then we can consider
implementing them instead.


PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See any PGP keyserver for public key.
