pspp-dev

Re: musings on performance


From: Jason Stover
Subject: Re: musings on performance
Date: Tue, 9 May 2006 17:36:16 -0400
User-agent: Mutt/1.5.10i

On Tue, May 09, 2006 at 06:58:12AM -0700, Ben Pfaff wrote:
> > Also, there are opportunities to cache things that procedures use.
> > Eg: most parametric procedures make use of the data's covariance
> > matrix.  If we can let that persist between procedures, that will
> > avoid a lot of calculations being repeated ;  just so long as we
> > invalidate that cache when appropriate.
> 
> Yes, I forgot to put that in my list.  It's probably parallel to
> item #2.

This is something I want to take up soon. I have a rough plan below;
please let me know how it sounds. SPSS currently does little of what I
suggest below, but these features would make PSPP a good
model-building tool.

I would like to make PSPP able to:

1. Save models for later use within PSPP. 'Later uses' include
combining saved models into other models and assessing them by
comparing many models, mostly by checking their performance on
'scratch' data. 'Later uses' might also include fitting other models
that could reuse some of the sufficient statistics (like sample means
and covariance matrices). Saving models would not take much work if I
can use the pool allocator to do so.

2. Export models in some external formats so they can be used by another
program later. The first format I was thinking of was compilable C. I
suppose other formats like XML ought to be supported too, since SPSS
can export some models as XML. Right now, REGRESSION has some ugly
functions that let it write little C programs. I'd like to clean that
code up and move it to a place where other procedures could use it.
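
To make the 'compilable C' idea in item 2 concrete, here is a minimal
sketch of an exporter for a fitted linear model. It is not the existing
REGRESSION code: struct linear_model and export_linear_model_as_c are
hypothetical names invented for illustration, and the sketch assumes
every coefficient is finite (an infinity or NaN would not print as
valid C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for a fitted linear model; not a PSPP type. */
struct linear_model
  {
    double intercept;
    const double *coefs;        /* One coefficient per predictor. */
    size_t n_coefs;
  };

/* Writes to OUT a standalone C function named NAME that evaluates
   MODEL on an array of predictor values.  Assumes all coefficients
   are finite.  Returns 0 on success, -1 on a write error. */
int
export_linear_model_as_c (const struct linear_model *model,
                          const char *name, FILE *out)
{
  size_t i;

  assert (model != NULL && name != NULL && out != NULL);

  fprintf (out, "double\n%s (const double *x)\n{\n", name);
  /* 17 significant digits are enough to round-trip a double. */
  fprintf (out, "  double y = %.17g;\n", model->intercept);
  for (i = 0; i < model->n_coefs; i++)
    fprintf (out, "  y += %.17g * x[%lu];\n",
             model->coefs[i], (unsigned long) i);
  fprintf (out, "  return y;\n}\n");

  return ferror (out) ? -1 : 0;
}
```

Called with the name "predict_y", an intercept of 1, and coefficients
2 and -0.5, this would emit a function predict_y that computes
1 + 2 * x[0] - 0.5 * x[1]. Keeping the writer separate from any one
procedure is what would let other procedures reuse it.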

To learn how to do numbers 1 and 2, I should write a modeling procedure
that fits a model quite different from that fit by REGRESSION, but one
whose purpose is, like regression, to find a function f(input) that
predicts some output. I was thinking of a neural network. Another
possibility is a regression tree. I don't want this next procedure to
resemble linear regression too closely, lest I inadvertently write
model-shuffling procedures closely tailored to manipulation of one
particular type of model.
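
One way to keep the model-shuffling code independent of the kind of
model is to have it see only 'a function f(input) that predicts some
output'. The sketch below shows that idea under stated assumptions:
struct model, struct linear_aux, linear_predict and model_mse are all
hypothetical names, not existing PSPP code, and mean squared error is
just one possible measure of performance on 'scratch' data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model-agnostic interface; not an existing PSPP type. */
struct model
  {
    /* Returns the prediction for the N_INPUTS values in X.  AUX holds
       whatever this particular kind of model needs. */
    double (*predict) (const void *aux, const double *x, size_t n_inputs);
    const void *aux;
    size_t n_inputs;
  };

/* One possible kind of model: a linear function of the inputs. */
struct linear_aux
  {
    double intercept;
    const double *coefs;        /* One coefficient per input. */
  };

double
linear_predict (const void *aux_, const double *x, size_t n_inputs)
{
  const struct linear_aux *aux = aux_;
  double y = aux->intercept;
  size_t i;

  for (i = 0; i < n_inputs; i++)
    y += aux->coefs[i] * x[i];
  return y;
}

/* Returns the mean squared prediction error of M over N_CASES cases.
   X holds the inputs row by row (M->n_inputs values per case) and Y
   holds the observed outputs.  Works for any kind of model. */
double
model_mse (const struct model *m, const double *x, const double *y,
           size_t n_cases)
{
  double sse = 0.0;
  size_t i;

  assert (m != NULL && n_cases > 0);
  for (i = 0; i < n_cases; i++)
    {
      double err = y[i] - m->predict (m->aux, x + i * m->n_inputs,
                                      m->n_inputs);
      sse += err * err;
    }
  return sse / n_cases;
}
```

A neural network or regression tree would supply its own predict
function and aux data, and model_mse would not need to change. That is
the property the second procedure is meant to test.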

Saving models requires a standard syntax, usable by any procedure that
fits a model to data, that tells PSPP to save that model. I think the
SAVE subcommand is a good candidate, as in this possibility:

    REGRESSION /VARIABLES y x1 x2 /DEPENDENT y /SAVE model=m1

...but maybe something else would be better?

And to combine, assess and export models, PSPP would need at least
one additional procedure that takes saved models and does something
with them. What should we call such a procedure? ('MODEL'?)

-Jason



