Mon, 16 Apr 2007 21:02:48 -0400
On Tue, Apr 17, 2007 at 08:28:18AM +0800, John Darrington wrote:
> On Mon, Apr 16, 2007 at 01:53:54PM -0400, Jason Stover wrote:
> > On Mon, Apr 16, 2007 at 11:10:43AM +0800, John Darrington wrote:
> > > If we were to follow approach 2, am I right in thinking that the
> > > 'interaction' data structure could be as large as the number of
> > > cases in the casefile?
> > No. It would have either a hash of possible values (all unique), or a
> > small function to get back and forth between a union value and a
> > binary vector.
> So, given an interaction involving N variables, from a datafile with M
> observations, what is the upper bound on the size of this hash?
That depends on the number of distinct values of the variables. If
you have 2 categorical variables, one with n possible values and the
other with m, the hash would need n*m-1 entries. With k variables and
n1, n2,...,nk distinct possible values, the number of entries would be
n1*n2*...*nk - 1. Only in unusual circumstances would k be larger
than 3, and almost never larger than 4, at least when people use
interactions the way they "should". But some users could specify many
more interactions, making that hash very large.
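
To get a feel for how fast that count grows, here is a rough sketch in
Python (not PSPP code). The function name interaction_hash_entries is
made up for illustration; it only evaluates the n1*n2*...*nk - 1
formula above for a given list of distinct-value counts.

```python
from math import prod


def interaction_hash_entries(level_counts):
    """Return n1*n2*...*nk - 1 for the given distinct-value counts.

    level_counts holds one positive integer per categorical variable
    in the interaction (hypothetical helper, for illustration only).
    """
    if not level_counts:
        raise ValueError("need at least one variable")
    if any(n < 1 for n in level_counts):
        raise ValueError("each variable needs at least one distinct value")
    return prod(level_counts) - 1


if __name__ == "__main__":
    # Two variables with 3 and 4 distinct values.
    print(interaction_hash_entries([3, 4]))
    # Four variables with 10 distinct values each.
    print(interaction_hash_entries([10, 10, 10, 10]))
```

With two or three variables the count stays small, but it is
multiplicative in every variable added to the interaction.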
If the variables are numeric, then the interaction is just their
product. If one is numeric and one categorical, then the interaction
is the scalar product: the numeric value multiplied by the binary
vector that encodes the categorical value.
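
A small sketch of those two cases, again in Python rather than PSPP
code. The helper names and the encoding are assumptions made for
illustration: the categorical value is turned into a binary vector with
one slot per listed level, which may not match the encoding PSPP ends
up using.

```python
def indicator(value, levels):
    """Binary vector with a 1.0 in the slot matching value (assumed encoding)."""
    return [1.0 if value == level else 0.0 for level in levels]


def numeric_by_numeric(x, y):
    """Interaction of two numeric values: their product."""
    return x * y


def numeric_by_categorical(x, value, levels):
    """Interaction of a numeric value with a categorical one:
    the numeric value times each element of the binary vector."""
    return [x * bit for bit in indicator(value, levels)]


if __name__ == "__main__":
    print(numeric_by_numeric(2.0, 3.5))
    print(numeric_by_categorical(2.5, "b", ["a", "b", "c"]))
```

In the mixed case the result is a vector that is zero everywhere
except in the slot for the observed category, where it carries the
numeric value.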