pspp-dev

Re: interactions


From: Jason Stover
Subject: Re: interactions
Date: Mon, 16 Apr 2007 13:53:54 -0400
User-agent: Mutt/1.5.10i

On Mon, Apr 16, 2007 at 11:10:43AM +0800, John Darrington wrote:
> If we were to follow approach 2, am I right in thinking that the 
> 'interaction' data structure could be as large as the number of 
> cases in the casefile?

No. It would have either a hash of possible values (all unique), or a
small function to get back and forth between a union value and a
binary vector.
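
A minimal sketch of the second option, the "small function" that goes back
and forth between an interaction value and a binary vector. Everything named
here (struct interaction, interaction_index, and so on) is hypothetical and
is not existing PSPP code. It also assumes each categorical value has already
been coded as a level index 0..n-1, so it does not touch union value at all:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FACTORS 4   /* Assumed limit, for this sketch only. */

/* Hypothetical structure (not PSPP's): an interaction among N_FACTORS
   categorical variables, where factor I has N_LEVELS[I] distinct values. */
struct interaction
  {
    size_t n_factors;
    size_t n_levels[MAX_FACTORS];
  };

/* Returns the number of possible interaction values, which is the
   product of the level counts and does not depend on the number of cases. */
size_t
interaction_n_values (const struct interaction *ia)
{
  size_t n = 1;
  for (size_t i = 0; i < ia->n_factors; i++)
    n *= ia->n_levels[i];
  return n;
}

/* Maps one level index per factor to a single interaction index,
   treating LEVELS as the digits of a mixed-radix number. */
size_t
interaction_index (const struct interaction *ia, const size_t *levels)
{
  size_t idx = 0;
  for (size_t i = 0; i < ia->n_factors; i++)
    {
      assert (levels[i] < ia->n_levels[i]);
      idx = idx * ia->n_levels[i] + levels[i];
    }
  return idx;
}

/* Inverse of interaction_index: recovers the per-factor levels from IDX. */
void
interaction_levels (const struct interaction *ia, size_t idx, size_t *levels)
{
  assert (idx < interaction_n_values (ia));
  for (size_t i = ia->n_factors; i-- > 0; )
    {
      levels[i] = idx % ia->n_levels[i];
      idx /= ia->n_levels[i];
    }
}

/* Fills VEC, which must hold interaction_n_values (IA) elements, with a
   binary vector that has a single 1 in the slot for LEVELS. */
void
interaction_to_binary (const struct interaction *ia, const size_t *levels,
                       double *vec)
{
  size_t n = interaction_n_values (ia);
  for (size_t i = 0; i < n; i++)
    vec[i] = 0.0;
  vec[interaction_index (ia, levels)] = 1.0;
}
```

With Variable 1 in {A, E} and Variable 2 in {B, C}, coded A=0, E=1, B=0, C=1,
the pair (E, B) maps to index 2, and the structure itself holds only the two
level counts regardless of how many cases there are.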

> On the other hand, approach 1 sounds attractive, but there are things
> that need to be considered:
> 
>  a) They'd have to be a special class of variable, which would not
>  normally be displayed, written to system files etc.  So a new 
>  enum dict_class  entry in variable.h  would be required. 
> 
>  b) I'm not sure how  existing code would deal with these
>  'invisible' variables.  For example many procedures might iterate 
>   through all the variables.  So dict_get_var_cnt might have to
>   take a parameter so that we'd know if we were interested in
>  'interaction' variables or not.

These statements make me think approach 2 is the way, especially your comment b)
above. 

>  c) Presumably it's not just the dictionary that needs modifying.
>  When you add a new interaction, you also need to add values for the
>  variables into the casefile?  That involves running a procedure.  
>  What I did for RANK was to create a temporary variable, which was an
>  illegal name in pspp syntax, and delete it afterwards.

No, no extra data need to be written to the casefile. But given a) and b)
above, I think approach number 2 would be the least painful.
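
To illustrate why nothing needs to be written to the casefile, here is a
rough sketch of a procedure that builds the table of unique interaction
values while it passes over the cases. The names (struct cell_table,
cell_table_add_case) are made up for illustration, the fixed-size arrays are
an assumption to keep it short, and a linear search stands in for the hash:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CELLS 64    /* Assumed bound on distinct combinations. */
#define LABEL_LEN 16    /* Assumed bound on a category label's length. */

/* One distinct combination of two categorical values. */
struct cell
  {
    char a[LABEL_LEN];
    char b[LABEL_LEN];
    size_t count;       /* Number of cases that fell in this cell. */
  };

/* Hypothetical per-procedure table of the unique interaction values.
   Its size is bounded by the number of distinct combinations, not by
   the number of cases. */
struct cell_table
  {
    size_t n_cells;
    struct cell cells[MAX_CELLS];
  };

/* Records one case with values A and B and returns the index of its
   interaction cell, adding a new cell the first time the pair is seen.
   Nothing is stored per case. */
size_t
cell_table_add_case (struct cell_table *t, const char *a, const char *b)
{
  for (size_t i = 0; i < t->n_cells; i++)
    if (!strcmp (t->cells[i].a, a) && !strcmp (t->cells[i].b, b))
      {
        t->cells[i].count++;
        return i;
      }

  assert (t->n_cells < MAX_CELLS);
  assert (strlen (a) < LABEL_LEN && strlen (b) < LABEL_LEN);

  struct cell *c = &t->cells[t->n_cells];
  strcpy (c->a, a);
  strcpy (c->b, b);
  c->count = 1;
  return t->n_cells++;
}
```

The table lives only as long as the procedure that built it, which is the
cost of approach 2 noted in the original message: another procedure that
needs the same interaction has to rebuild it.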

-Jason

> 
> On Sun, Apr 15, 2007 at 03:06:17PM -0400, Jason Stover wrote:
>      To have a glm procedure, pspp needs a data structure to handle
>      interactions. An interaction can be thought of as another variable
>      which is a function of two or more variables, usually categorical,
>      like this:
>      
>           Variable 1          Variable 2        Interaction
>               A                  B                 AB
>               E                  B                 EB
>               A                  C                 AC
>               E                  C                 EC
>        
>      
>      ...etc. The interaction term could be created in one of two ways:
>      Either 1) create a new variable in the dictionary that corresponds to
>      the interaction, or 2) create a new 'interaction' data structure
>      that contains all necessary mappings between existing variables and
>      the value of the interaction.
>      
>      Approach 1 would add a variable to the dictionary, but would not
>      create any more observations in the data set. It would make coding any
>      procedures that use interactions easier than approach 2, because the
>      procedure would not need much special code to handle interactions.
>      It would also avoid the need for
>      any more obscure string-values-to-binary-vector code like that in
>      category.[ch]. Approach 1 would still require the creation of some
>      code to create the interaction, though it may not require the creation
>      of a specialized "interaction" data structure to be available for use
>      by all procedures.
>      
>      Approach 2 doesn't require adding anything to the dictionary, but it
>      does mean that any procedures that need to use interactions would have
>      to create those interactions themselves. These interactions would
>      therefore be lost after the procedure exits, meaning that any other
>      procedure that needs interactions would have to recreate
>      them. Approach 2 also means writing more code that partly duplicates
>      the code already in category.[ch].
>      
>      I favor approach number 1, but before I fiddle with the
>      dictionary, I thought I should ask.
>      
>      -Jason
>      
>      
> 
> -- 
> PGP Public key ID: 1024D/2DE827B3 
> fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
> See http://pgp.mit.edu or any PGP keyserver for public key.
> 
> 





