pspp-dev

Re: ppsp and python


From: John Darrington
Subject: Re: ppsp and python
Date: Tue, 23 Oct 2012 08:45:05 +0200
User-agent: Mutt/1.5.20 (2009-06-14)

On Mon, Oct 22, 2012 at 10:40:35AM -0500, Daniel Elliott wrote:
     
     I am not knowledgeable about the "SPSS way" of doing things with
     respect to creating new functions but I figured that,  with neural
     nets for example, I could provide the training parameters available
     and you could tell me what format the input and output data should be.
     From there, writing the interface wouldn't be too painful.
     
     My assumption is that PSPP users are more focused on analyzing results
     from a returned model than they are interested in the minutiae of
     implementation detail.  From this perspective, I think that the best
     way to use a neural network from PSPP would be k-fold
     cross-validation or bootstrap cross-validation, which are described in
     chapter 6 of Empirical Methods for Artificial Intelligence by Paul
     Cohen.  This would shield the user from as many of the issues in model
     selection as possible.  It would be good if the users could specify
     stuff like the number of layers and the number of nodes in each layer
     and the type of activation functions to use or some subset of these
     items.  Sadly, the approach to machine learning algorithms is pretty
     undisciplined.
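
For readers unfamiliar with the k-fold scheme mentioned in the quoted text, here is a minimal sketch of how the train/test partitions are formed. It is only an illustration in Python: the function name k_fold_indices is hypothetical, nothing here is part of PSPP, and it assumes the cases have already been shuffled if their order matters.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only. The n cases are split
    into k contiguous folds whose sizes differ by at most one; each
    fold serves as the test set exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first (n % k) folds get one extra case each.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        yield train, test
        start = stop


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print("train:", train, "test:", test)
```

A model would be fitted on each train list and scored on the matching test list, and the k scores averaged, so the user sees one summary figure rather than the details of model selection.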

There are several goals for PSPP.  One is to provide a free replacement for
SPSS, just as LibreOffice does for MS Word and Gnumeric does for Excel.  A
significant proportion of our users are students doing undergraduate stats
courses.  These users need a) a user interface which resembles that of SPSS,
and b) results which resemble those of SPSS, both in terms of presentation
and values.

Now, SPSS has several NN options.  For example, there is an MLP command.  If we
were to implement an MLP command, the user interface should therefore resemble
that of SPSS, although the implementation need not.  Alternatively, one could
provide a PSPP "extension" which does not claim to be SPSS compatible, so long
as that is clear in the documentation.

A second class of users is professional statisticians who process HUGE
amounts of data: datasets with hundreds of millions of observations.  The
routines used in PSPP take great pains to cope with such datasets.  I
mention this because it can be a non-trivial task to convert an existing
routine to cope with data on that scale, especially if the implementation
dynamically allocates memory to store all of its data.
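
To make the point concrete, here is a minimal sketch of the kind of single-pass routine that copes with very large inputs.  It is written in Python purely for illustration (PSPP itself is written in C), it is not taken from the PSPP sources, and the function name streaming_mean_variance is hypothetical.  It uses Welford's update so that only three numbers are kept in memory, however many observations are read.

```python
import math
from typing import Iterable, Tuple


def streaming_mean_variance(values: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance) in a single pass.

    Illustration only: uses Welford's update, so memory use is
    constant regardless of how many values the iterable yields.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    # The sample variance is undefined for fewer than two observations.
    variance = m2 / (n - 1) if n > 1 else math.nan
    return n, mean, variance


if __name__ == "__main__":
    # A generator stands in for a case reader: values are never held in a list.
    cases = (float(i % 7) for i in range(1_000_000))
    print(streaming_mean_variance(cases))
```

A routine that first reads every observation into an array and then computes on it would need restructuring along these lines before it could handle datasets of that size.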

     Again, I am very far from being a competent statistician, but would
     enjoy the opportunity to provide some tools to PSPP.  My abilities are
     primarily in things like logistic regression, mixtures of Gaussians,
     PCA, and neural networks for classification and prediction.  I also do
     reinforcement learning but I doubt that is of any use.

PCA is already supported.  See the FACTOR command.  We also have k-means
clustering.  Coincidentally, I am already working on logistic regression
and hope to complete it very shortly.  We don't yet have any neural net
routines, nor do we have hierarchical clustering.  So we could certainly
use some contributions there.

I suggest you have a look at how some of the existing algorithms are
implemented, and perhaps post some code to show how you think your
contributions could fit.

Thanks for your interest.

Regards

John

     
-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.
