pspp-dev

How competent are the PSPP developers?


From: John Darrington
Subject: How competent are the PSPP developers?
Date: Fri, 12 Nov 2004 11:47:59 +0800
User-agent: Mutt/1.3.28i

I thought it was about time somebody started using PSPP for something real.

The obvious thing was using it to determine the level of effort and competence
of the PSPP developers.  I got a history from cvs using:

  cvs history -c -a > cvs-hist


Then I wrote a few lines of PSPP syntax:


DATA LIST FIXED 
        FILE='cvs-hist'
        /RTYPE 1-1 (A)
        WHEND 3-12 (SDATE)
        WHENT 14-18 (TIME)
        WHO 26-33 (A)
        WHAT 40-70 (A)
        .

VALUE LABELS /RTYPE 'A' 'Added' 'M' 'Modified' 'R' 'Removed'.
VARIABLE LABEL WHAT 'Filename'.
VARIABLE LABEL WHO 'Developer'.
VARIABLE LABEL WHEND 'Date'.
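
The column ranges in DATA LIST are 1-based and inclusive. As a cross-check
outside PSPP, here is a minimal Python sketch of the same slicing. The sample
line is made up to fit those column positions (it is not copied from the real
cvs-hist), and parse_history_line is a hypothetical helper name.

```python
# Column ranges copied from the DATA LIST command above (1-based, inclusive).
COLUMNS = {
    "RTYPE": (1, 1),
    "WHEND": (3, 12),
    "WHENT": (14, 18),
    "WHO": (26, 33),
    "WHAT": (40, 70),
}


def parse_history_line(line):
    """Slice one fixed-width line into a dict of stripped string fields."""
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in COLUMNS.items()
    }


# Hypothetical line, padded so that each field lands in its column range.
sample = "M 2004-11-12 03:47 +0000 " + "jmd".ljust(8) + " 1.12 " + "ChangeLog"
record = parse_history_line(sample)
print(record)
```

Stripping the fields here plays the same role as the RTRIM call used later
in the PSPP program.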


First, I wanted to know who had been most active, so I ran:

SPLIT FILE BY WHO.
FREQUENCIES /RTYPE.
SPLIT FILE OFF.


Clearly, blp is the most active developer, with jmd not far behind.
However, a major part of jmd's effort simply involves deleting files.
He's deleted more files than he's added, so his total contribution is
probably negative!!


But this doesn't indicate how competent the respective developers are.
So I devised a statistic to measure it as follows:

I took the difference between successive modifications to a file, noting
the developer who did the prior modification.  My reasoning is that if the
developer did it right in the first place, then that file won't have to be 
modified for a long time.   Thus, the most competent developers will have a 
very long time between successive commits on their files.


COMPUTE T= XDATE.JDAY(WHEND).

COMPUTE WHAT=RTRIM(WHAT).


SORT CASES BY WHAT, T (D).
SPLIT FILE BY WHAT.

* Number of days between file modification
COMPUTE DIFF = LAG(T) - T.
COMPUTE DIFF = DIFF / 3600 / 24.

VARIABLE LABEL DIFF 'Time between modification'.

LIST.


So DIFF gives me a variable which is the number of days between successive
modifications on a file. 
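
The same statistic can be sketched outside PSPP. The Python below is only an
illustration under assumptions: the records are invented, the function name
gaps_by_prior_developer is hypothetical, and gaps of zero days are dropped
to mirror the SELECT IF (DIFF > 0) step in the full program at the end of
this message.

```python
from collections import defaultdict
from datetime import date


def gaps_by_prior_developer(records):
    """Map each developer to the day gaps that followed their changes.

    records is an iterable of (filename, developer, date) tuples. Each gap
    is credited to whoever made the EARLIER of two successive changes.
    """
    by_file = defaultdict(list)
    for filename, developer, when in records:
        by_file[filename].append((when, developer))

    gaps = defaultdict(list)
    for history in by_file.values():
        history.sort()  # oldest change first
        for (t0, prior_dev), (t1, _next_dev) in zip(history, history[1:]):
            days = (t1 - t0).days
            if days > 0:  # same-day changes are dropped
                gaps[prior_dev].append(days)
    return dict(gaps)


# Invented records, for illustration only.
records = [
    ("a.c", "blp", date(2004, 1, 1)),
    ("a.c", "jmd", date(2004, 1, 11)),
    ("a.c", "blp", date(2004, 2, 10)),
    ("b.c", "jmd", date(2004, 3, 1)),
]
result = gaps_by_prior_developer(records)
print(result)
```

A file with a single recorded change (b.c here) contributes no gap, and the
last change to any file is never scored, since nothing has replaced it yet.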

Like a good statistician, I want to make sure it's normally distributed, so
I do: 

EXAMINE DIFF 
        /STATISTICS = DESCRIPTIVES
        /PLOT = NPPLOT
        .

It's clearly not normal, but like a good politician I'll ignore the statistics 
when it suits me to do so, and only publish the ones that are favourable to my
agenda.


EXAMINE DIFF BY WHO
        /STATISTICS = DESCRIPTIVES
        /NOTOTAL
        .


Well, of the four developers, the most competent is pjk, whose work has to be 
re-done on average 58 days later.   In second place is mkiefte, with a score of 
32 days. Work by blp needs attention 28 days later, and jmd is the most 
incompetent developer: his work needs fixing 26 days later.


Now I want to know if these differences are significant.  I use the ONEWAY 
command to do this.

ONEWAY DIFF BY WHO
        /STATISTICS = HOMOGENEITY
        /CONTRASTS = -3, 1, 1, 1
        /CONTRASTS = 0, 1, -1, 0
        .


I ran two planned contrast tests.  The first tests whether mkiefte's result is 
significantly different from the others (since I noticed it's somewhat larger).
The second tests whether there is a significant difference in competence 
between the two most active developers (jmd and blp).
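
For readers unfamiliar with planned contrasts, here is a minimal Python sketch
of the equal-variances calculation. It is an illustration with made-up
numbers, not PSPP's implementation, and contrast_t is a hypothetical name.
The coefficients must sum to zero and are matched to the groups by position,
so it is worth checking which developer each position refers to.

```python
from math import sqrt


def contrast_t(groups, coeffs):
    """Return (estimate, std_error, t, df) for one planned contrast.

    groups is a list of samples, one per factor level; coeffs holds one
    coefficient per group. Equal variances are assumed (pooled error).
    """
    if len(groups) != len(coeffs):
        raise ValueError("need one coefficient per group")
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")

    means = [sum(g) / len(g) for g in groups]
    estimate = sum(c * m for c, m in zip(coeffs, means))

    # Pooled within-group variance (the ANOVA mean square error).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df = sum(len(g) for g in groups) - len(groups)
    mse = ss_within / df

    std_error = sqrt(mse * sum(c * c / len(g) for c, g in zip(coeffs, groups)))
    return estimate, std_error, estimate / std_error, df


# Made-up data: four groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
estimate, std_error, t, df = contrast_t(groups, [-3, 1, 1, 1])
print(estimate, std_error, t, df)
```

The t value is then compared against a t distribution with df degrees of
freedom to get the significance level.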

The overall result is not significant at the 0.05 level, so we can say that in 
general, all developers are equally (in)competent.

Now for the contrasts. The homogeneity of variance test is not significant, so 
we use the `Assume equal variances' results:

For test 1, there is a significant contrast at 0.05, so mkiefte is significantly 
more competent than the other developers.

Test 2 is not significant, so blp and jmd do not seem to be any more competent 
than each other.

*****************************************************************************

Anyway, this exercise uncovered a few bugs in PSPP, some of which I've plugged.
The others are:

1.  In the first FREQUENCIES table, only the "Removed" label is displayed.
    For some reason the "Modified" and "Added" labels are not displayed.

2.  I should be able to replace the lines:
    
    SPLIT FILE BY WHO.
    FREQUENCIES /RTYPE.
    SPLIT FILE OFF.

    with

    TEMPORARY.
    SPLIT FILE BY WHO.
    FREQUENCIES /RTYPE.

    When I tried it, it segfaulted.

3.  The LIST command produces 50 pages of rather uninteresting numbers,
    so I commented it out.  Strangely, when I do this, all the following 
    commands are ignored.

4.  The XDATE.JDAY function doesn't seem to behave as the manual explains it.
    The manual says it gives a number between 1 and 366 indicating the number 
    of days from the start of the year.  In fact it seems to give the number of
    seconds since some arbitrary epoch (which happened to be what I wanted in
    this instance).
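
On item 4, the difference matters for this analysis: a true day-of-year value
would give nonsense whenever two modifications fall in different years,
whereas seconds since a fixed epoch can simply be divided by 3600 and 24.
The Python sketch below illustrates this with made-up dates. The epoch used
(midnight, 14 October 1582) is an assumption about how date values are
stored; any fixed epoch gives the same difference.

```python
from datetime import date, datetime

# Assumed epoch for stored date values; the choice does not affect differences.
EPOCH = datetime(1582, 10, 14)


def seconds_since_epoch(d):
    """Seconds from EPOCH to midnight at the start of date d."""
    return (datetime(d.year, d.month, d.day) - EPOCH).total_seconds()


def day_of_year(d):
    """Day number within the year, 1 to 366, as the manual describes."""
    return d.timetuple().tm_yday


# Made-up pair of modification dates that straddles a year boundary.
earlier, later = date(2003, 12, 30), date(2004, 1, 2)

gap_from_seconds = (
    seconds_since_epoch(later) - seconds_since_epoch(earlier)
) / 3600 / 24
gap_from_jday = day_of_year(later) - day_of_year(earlier)

print(gap_from_seconds)  # 3.0
print(gap_from_jday)     # -362
```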


Full text of the program below:

TITLE 'Level of Developer Contribution to PSPP'.

* cvs-hist generated by cvs history -c -a

DATA LIST FIXED 
        FILE='cvs-hist'
        /RTYPE 1-1 (A)
        WHEND 3-12 (SDATE)
        WHENT 14-18 (TIME)
        WHO 26-33 (A)
        WHAT 40-70 (A)
        .

VALUE LABELS /RTYPE 'A' 'Added' 'M' 'Modified' 'R' 'Removed'.
VARIABLE LABEL WHAT 'Filename'.
VARIABLE LABEL WHO 'Developer'.
VARIABLE LABEL WHEND 'Date'.

SPLIT FILE BY WHO.
FREQUENCIES /RTYPE.
SPLIT FILE OFF.


COMPUTE T= XDATE.JDAY(WHEND).

COMPUTE WHAT=RTRIM(WHAT).


SORT CASES BY WHAT, T (D).
SPLIT FILE BY WHAT.

* Number of days between file modification
COMPUTE DIFF = LAG(T) - T.
COMPUTE DIFF = DIFF / 3600 / 24.

VARIABLE LABEL DIFF 'Time between modification'.

LIST.

SPLIT FILE OFF.

SELECT IF (DIFF > 0).

EXAMINE DIFF 
        /STATISTICS = DESCRIPTIVES
        /PLOT = NPPLOT
        .

EXAMINE DIFF BY WHO
        /STATISTICS = DESCRIPTIVES
        /NOTOTAL
        .

ONEWAY DIFF BY WHO
        /STATISTICS = HOMOGENEITY
        /CONTRASTS = -3, 1, 1, 1
        /CONTRASTS = 0, 1, -1, 0
        .

EXECUTE.

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://wwwkeys.pgp.net or any PGP keyserver for public key.

