pspp-dev

Re: interactions


From: John Darrington
Subject: Re: interactions
Date: Mon, 16 Apr 2007 11:10:43 +0800
User-agent: Mutt/1.5.9i

If we were to follow approach 2, am I right in thinking that the 
'interaction' data structure could be as large as the number of 
cases in the casefile?  If so, I think that approach 2 would become
unworkable for very large casefiles.
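
To make the size concern concrete, here is a rough sketch in C.  It
assumes the structure holds one entry per distinct combination of
category values; the thread does not specify this.  The function name
max_interaction_entries is hypothetical and is not part of PSPP.  Under
that assumption the entry count is bounded by the product of the level
counts, and also by the number of cases, which it reaches when the
variables have many levels.

```c
#include <stddef.h>

/* Hypothetical helper, not PSPP code.  Returns an upper bound on the
   number of distinct interaction values: at most one per case, and at
   most the product of the per-variable level counts.  Assumes every
   variable has at least one level whenever N_CASES is nonzero. */
size_t
max_interaction_entries (size_t n_cases, const size_t *n_levels,
                         size_t n_vars)
{
  size_t bound = 1;
  for (size_t i = 0; i < n_vars; i++)
    {
      if (n_levels[i] == 0)
        return 0;
      /* If the product would exceed N_CASES, the case count is the
         tighter bound.  Testing by division also avoids overflow. */
      if (bound > n_cases / n_levels[i])
        return n_cases;
      bound *= n_levels[i];
    }
  return bound;
}
```

Two variables with two levels each give at most 4 entries however many
cases there are, but two variables with 1000 levels each in a 500-case
file can need all 500.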

On the other hand, approach 1 sounds attractive, but there are things
that need to be considered:

 a) Interaction variables would have to be a special class of
 variable, one that would not normally be displayed, written to
 system files, etc.  So a new entry in enum dict_class in variable.h
 would be required.

 b) I'm not sure how existing code would deal with these
 'invisible' variables.  For example, many procedures might iterate
 through all the variables.  So dict_get_var_cnt might have to
 take a parameter saying whether the caller is interested in
 'interaction' variables or not.

 c) Presumably it's not just the dictionary that needs modifying.
 When you add a new interaction, you also need to add values for the
 new variable to the casefile?  That involves running a procedure.
 What I did for RANK was to create a temporary variable whose name
 is illegal in pspp syntax, and delete it afterwards.
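
Points a) and b) could be sketched roughly as follows.  This is a toy,
self-contained model and not PSPP's actual dictionary code: every name
here (toy_dict_class, TOY_DC_INTERACTION, toy_dict_count_vars) is
invented for illustration, and the bit-mask parameter is just one
possible way for a caller to say which classes it wants.

```c
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT PSPP's real
   declarations. */
enum toy_dict_class
  {
    TOY_DC_ORDINARY = 1 << 0,     /* Normal, user-visible variable. */
    TOY_DC_INTERACTION = 1 << 1   /* Hypothetical hidden interaction. */
  };

struct toy_var
  {
    const char *name;
    enum toy_dict_class dict_class;
  };

struct toy_dict
  {
    const struct toy_var *vars;
    size_t n_vars;
  };

/* Counts the variables in D whose class is included in the bit mask
   CLASSES, so callers choose whether interactions are counted. */
size_t
toy_dict_count_vars (const struct toy_dict *d, unsigned int classes)
{
  size_t n = 0;
  for (size_t i = 0; i < d->n_vars; i++)
    if (d->vars[i].dict_class & classes)
      n++;
  return n;
}
```

A procedure that should not see interactions would pass only
TOY_DC_ORDINARY; something like glm would pass
TOY_DC_ORDINARY | TOY_DC_INTERACTION.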
 
J'


On Sun, Apr 15, 2007 at 03:06:17PM -0400, Jason Stover wrote:
     To have a glm procedure, pspp needs a data structure to handle
     interactions. An interaction can be thought of as another variable
     which is a function of two or more variables, usually categorical,
     like this:
     
          Variable 1            Variable 2        Interaction
              A                    B                 AB
              E                    B                 EB
              A                    C                 AC
              E                    C                 EC
         
     
     ...etc. The interaction term could be created in one of two ways:
     Either 1) create a new variable in the dictionary that corresponds to
     the interaction, or 2) create a new 'interaction' data structure
     that contains all necessary mappings between existing variables and
     the value of the interaction.
     
     Approach 1 would add a variable to the dictionary, but would not
     create any more observations in the data set. It would make coding any
     procedures that use interactions easier than approach 2, because the
     procedures would need little special code to handle interactions. It
     would also avoid the need for any more obscure
     string-values-to-binary-vector code like that in category.[ch].
     Approach 1 would still require some code to create the interaction,
     though it may not require a specialized "interaction" data structure
     to be available for use by all procedures.
     
     Approach 2 doesn't require adding anything to the dictionary, but it
     does mean that any procedures that need to use interactions would have
     to create those interactions themselves. These interactions would
     therefore be lost after the procedure exits, meaning that any other
     procedure that needs interactions would have to recreate
     them. Approach 2 also means writing more code that partly duplicates
     the code already in category.[ch].
     
     I favor approach number 1, but before I fiddle with the
     dictionary, I thought I should ask.
     
     -Jason
     
     
     _______________________________________________
     pspp-dev mailing list
     address@hidden
     http://lists.gnu.org/mailman/listinfo/pspp-dev
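
The Interaction column in the quoted table pairs the category values of
the two variables for each case.  A minimal sketch of that mapping in C,
assuming the values are short strings and that the interaction value is
simply their concatenation, as the table suggests; interaction_label is
a hypothetical helper, not an existing pspp function.

```c
#include <stdio.h>

/* Hypothetical helper, not PSPP code.  Writes the interaction label for
   category labels A and B into BUF, which holds SIZE bytes; e.g. "A"
   and "B" give "AB".  Returns 0 on success, -1 if BUF is too small. */
int
interaction_label (const char *a, const char *b, char *buf, size_t size)
{
  int n = snprintf (buf, size, "%s%s", a, b);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

Plain concatenation is ambiguous once labels are longer than one
character ("AB" with "C" and "A" with "BC" collide), so real code would
more likely key on the pair of values than on a joined string.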

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.



