Mon, 16 Apr 2007 11:10:43 +0800
If we were to follow approach 2, am I right in thinking that the
'interaction' data structure could be as large as the number of
cases in the casefile? If so, I think approach 2 would become
unworkable for very large casefiles.
On the other hand, approach 1 sounds attractive, but there are things
that need to be considered:
a) Interaction variables would have to be a special class of
variable, one that is not normally displayed, written to system
files, etc. So a new enum dict_class entry in variable.h would be
required.
b) I'm not sure how existing code would deal with these
'invisible' variables. For example, many procedures iterate
through all the variables, so dict_get_var_cnt might have to
take a parameter that says whether the caller is interested in
'interaction' variables or not.
c) Presumably it's not just the dictionary that needs modifying.
When you add a new interaction, you also need to add values for
the new variable to the casefile? That involves running a procedure.
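
To make point b) concrete, here is a rough sketch of a variable count that takes a class mask. Everything in it (toy_dict, toy_var, TOY_ORDINARY, TOY_INTERACTION, toy_dict_count_vars) is a made-up stand-in for illustration, not the existing dictionary code, and the real change to dict_get_var_cnt could look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variable classes.  TOY_INTERACTION stands in for the
   proposed new class; none of these names come from PSPP. */
enum toy_class
  {
    TOY_ORDINARY = 1 << 0,      /* Normal, user-visible variable. */
    TOY_INTERACTION = 1 << 1    /* Hidden interaction variable. */
  };

/* Hypothetical variable: just a name and a class. */
struct toy_var
  {
    const char *name;
    enum toy_class cls;
  };

/* Hypothetical dictionary: a flat array of variables. */
struct toy_dict
  {
    const struct toy_var *vars;
    size_t n_vars;
  };

/* Returns the number of variables in DICT whose class is included
   in the bit mask CLASSES. */
size_t
toy_dict_count_vars (const struct toy_dict *dict, unsigned int classes)
{
  size_t count = 0;
  size_t i;

  assert (dict != NULL);
  for (i = 0; i < dict->n_vars; i++)
    if (dict->vars[i].cls & classes)
      count++;
  return count;
}
```

With something like this, a procedure that should not see interactions would pass TOY_ORDINARY only, while glm would pass TOY_ORDINARY | TOY_INTERACTION.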
What I did for RANK was to create a temporary variable with a name
that is illegal in pspp syntax, and to delete it afterwards.
On Sun, Apr 15, 2007 at 03:06:17PM -0400, Jason Stover wrote:
To have a glm procedure, pspp needs a data structure to handle
interactions. An interaction can be thought of as another variable
which is a function of two or more variables, usually categorical,
Variable 1   Variable 2   Interaction
A            B            AB
E            B            EB
A            C            AC
E            C            EC
...etc. The interaction term could be created in one of two ways:
Either 1) create a new variable in the dictionary that corresponds to
the interaction, or 2) create a new 'interaction' data structure
that contains all necessary mappings between existing variables and
the value of the interaction.
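
Under either approach, the value of the interaction for one case is just a function of the values of the variables involved. A minimal sketch, assuming string-valued categorical variables and plain concatenation as in the table above (the name interaction_label is hypothetical and not part of pspp):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes the interaction of categorical values A and B, formed by
   plain concatenation, into OUT, which has room for OUT_SIZE bytes
   including the terminating null.
   Returns 1 on success, 0 if the result did not fit. */
int
interaction_label (const char *a, const char *b,
                   char *out, size_t out_size)
{
  int n;

  assert (a != NULL && b != NULL && out != NULL && out_size > 0);
  n = snprintf (out, out_size, "%s%s", a, b);
  return n >= 0 && (size_t) n < out_size;
}
```

Plain concatenation is ambiguous for multi-character values ("AB" with "C" gives the same result as "A" with "BC"), so real code would probably need a separator or a different encoding.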
Approach 1 would add a variable to the dictionary, but would not
create any more observations in the data set. It would make coding
procedures that use interactions easier than approach 2, because the
procedure would not need much special code to handle interactions. It
would also avoid the need for any more obscure
string-values-to-binary-vector code like that in category.[ch].
Approach 1 would still require some code to create the interaction,
though it may not require a specialized "interaction" data structure
that is available for use by all procedures.
Approach 2 doesn't require adding anything to the dictionary, but it
does mean that any procedures that need to use interactions would have
to create those interactions themselves. These interactions would
therefore be lost after the procedure exits, meaning that any other
procedure that needs interactions would have to recreate
them. Approach 2 also means writing more code that partly duplicates
the code already in category.[ch].
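
For comparison, one possible shape for the procedure-local structure of approach 2 is a table of the distinct value combinations seen so far. This is only a sketch under my own assumptions: toy_pair, toy_interaction and toy_interaction_index are hypothetical names, values are plain C strings, and lookup is a linear search for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One distinct combination of values of two categorical variables. */
struct toy_pair
  {
    const char *a;
    const char *b;
  };

/* Procedure-local table of the distinct combinations seen so far.
   Initialize to { NULL, 0, 0 } and free PAIRS when done. */
struct toy_interaction
  {
    struct toy_pair *pairs;
    size_t n_pairs;
    size_t allocated;
  };

/* Returns the index of combination (A, B) in IA, adding it if it has
   not been seen before.  Returns (size_t) -1 if memory runs out.
   The strings are not copied, so they must outlive IA. */
size_t
toy_interaction_index (struct toy_interaction *ia,
                       const char *a, const char *b)
{
  size_t i;

  assert (ia != NULL && a != NULL && b != NULL);
  for (i = 0; i < ia->n_pairs; i++)
    if (!strcmp (ia->pairs[i].a, a) && !strcmp (ia->pairs[i].b, b))
      return i;

  if (ia->n_pairs == ia->allocated)
    {
      size_t new_cap = ia->allocated ? 2 * ia->allocated : 4;
      struct toy_pair *p = realloc (ia->pairs, new_cap * sizeof *p);
      if (p == NULL)
        return (size_t) -1;
      ia->pairs = p;
      ia->allocated = new_cap;
    }
  ia->pairs[ia->n_pairs].a = a;
  ia->pairs[ia->n_pairs].b = b;
  return ia->n_pairs++;
}
```

In this particular sketch the table holds one entry per distinct combination, so it grows with the number of combinations observed rather than with every case, but it has to be rebuilt by each procedure that needs it.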
I favor approach number 1, but before I fiddle with the
dictionary, I thought I should ask.
pspp-dev mailing list
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.
- interactions, Jason Stover, 2007/04/15
- Re: interactions, John Darrington