pspp-dev

Re: regression and glm with big data


From: John Darrington
Subject: Re: regression and glm with big data
Date: Wed, 15 Aug 2007 13:10:42 +0800
User-agent: Mutt/1.5.13 (2006-08-11)

There's a similar problem with percentiles (used by frequencies and
examine).

I suggest we leave these until after the release.

J'

On Tue, Aug 14, 2007 at 10:45:58PM -0400, Jason Stover wrote:
     Right now, linreg.c, regression.q and glm.q won't handle large data
     sets very well. The problem is that the regression and (currently
     fetal) glm procedures store the entire data set in memory, then pass
     the data to pspp_linreg (), which finds the least squares estimates.
     
     Storing the entire data set in memory isn't necessary; it is just
     easier to code. PSPP could handle much bigger data sets if, in the
     casereader_read loop, it computed two matrix products from the data in
     a single pass, then passed those much smaller results to
     pspp_linreg ().
     
     But there may be tasks for which pspp_linreg () should accept all the
     data as a single matrix, so it should probably be able to do that,
     too.
     
     My question is: Should I do this now, or wait until after the release?
     It will probably change a lot of code in linreg.c, and could introduce
     several bugs. The benefit would be to make any procedure that needs
     regression able to run with very large data sets.
     
     -Jason
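
A minimal sketch of the single-pass idea described above, assuming the
two matrix products are X'X and X'y (the message does not name them).
Every name here (xtx_accum, accum_init, accum_add, accum_solve, the
fixed size P) is hypothetical and not part of PSPP; in PSPP the
accumulation step would presumably sit inside the casereader_read loop.

```c
#include <stddef.h>

#define P 2   /* Assumed number of coefficients: intercept + 1 predictor. */

/* Running sums.  Memory use depends only on P, not on the number of cases. */
struct xtx_accum
  {
    double xtx[P][P];   /* Accumulates X'X. */
    double xty[P];      /* Accumulates X'y. */
    size_t n;           /* Number of cases seen so far. */
  };

static void
accum_init (struct xtx_accum *a)
{
  size_t i, j;
  for (i = 0; i < P; i++)
    {
      a->xty[i] = 0.0;
      for (j = 0; j < P; j++)
        a->xtx[i][j] = 0.0;
    }
  a->n = 0;
}

/* Fold one case (predictor row X, response Y) into the sums. */
static void
accum_add (struct xtx_accum *a, const double x[P], double y)
{
  size_t i, j;
  for (i = 0; i < P; i++)
    {
      a->xty[i] += x[i] * y;
      for (j = 0; j < P; j++)
        a->xtx[i][j] += x[i] * x[j];
    }
  a->n++;
}

static double
abs_d (double v)
{
  return v < 0.0 ? -v : v;
}

/* Solve (X'X) b = X'y by Gaussian elimination with partial pivoting.
   Returns 0 on success, -1 if X'X is (numerically) singular. */
static int
accum_solve (const struct xtx_accum *a, double b[P])
{
  double m[P][P + 1];   /* Augmented matrix [X'X | X'y]. */
  size_t r, c, col;

  for (r = 0; r < P; r++)
    {
      for (c = 0; c < P; c++)
        m[r][c] = a->xtx[r][c];
      m[r][P] = a->xty[r];
    }

  for (col = 0; col < P; col++)
    {
      size_t piv = col;
      for (r = col + 1; r < P; r++)
        if (abs_d (m[r][col]) > abs_d (m[piv][col]))
          piv = r;
      if (abs_d (m[piv][col]) < 1e-12)
        return -1;
      for (c = 0; c <= P; c++)
        {
          double tmp = m[col][c];
          m[col][c] = m[piv][c];
          m[piv][c] = tmp;
        }
      for (r = col + 1; r < P; r++)
        {
          double f = m[r][col] / m[col][col];
          for (c = col; c <= P; c++)
            m[r][c] -= f * m[col][c];
        }
    }

  /* Back substitution. */
  for (r = P; r-- > 0;)
    {
      double s = m[r][P];
      for (c = r + 1; c < P; c++)
        s -= m[r][c] * b[c];
      b[r] = s / m[r][r];
    }
  return 0;
}
```

A caller would run accum_init () once, accum_add () once per case, and
accum_solve () at the end. Solving the normal equations directly like
this can lose accuracy when the predictors are nearly collinear, so this
is only an illustration of the memory saving, not a recommended solver.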
     
     

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.

