Sat, 7 Jun 2008 15:11:28 -0400
> 2008/6/7 John Darrington <address@hidden>:
> > On Thu, Jun 05, 2008 at 05:05:38PM -0400, Jason Stover wrote:
> > The big problem there is the "accounting" problem of mapping values of a
> > qualitative variable back and forth to vectors with binary entries.
> > I don't understand why this is a "big" problem, but perhaps I'm being
> > naive. Would it be possible to have a brief specification of the
> > problem?
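To make the "accounting" concrete, here is a minimal sketch (Python for brevity, not PSPP's C code) of mapping the values of a qualitative variable to coded vectors and back. The names `encode_levels` and `decode_row` are hypothetical, and using the last sorted level as the reference level is an assumption. `"treatment"` produces the binary vectors described above; `"sigma"` produces sigma-restricted (effect) coding, where the reference level is coded -1 in every column so each column sums to zero over the levels.

```python
from typing import Dict, List, Sequence, Tuple

def encode_levels(values: Sequence[str], scheme: str = "treatment"
                  ) -> Tuple[List[str], Dict[str, List[int]]]:
    """Map each level of a qualitative variable to a coded vector.

    k levels give k-1 columns. The last level in sorted order is the
    reference (an illustrative choice): all zeros under "treatment"
    coding, all -1 under "sigma" (sigma-restricted / effect) coding.
    """
    if scheme not in ("treatment", "sigma"):
        raise ValueError(f"unknown scheme: {scheme}")
    levels = sorted(set(values))
    k = len(levels)
    codes: Dict[str, List[int]] = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            # Non-reference level: 1 in its own column, 0 elsewhere.
            codes[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            # Reference level.
            codes[level] = [0] * (k - 1) if scheme == "treatment" else [-1] * (k - 1)
    return levels, codes

def decode_row(row: Sequence[int], levels: List[str],
               codes: Dict[str, List[int]]) -> str:
    """Reverse mapping: recover the level from its coded vector."""
    for level in levels:
        if codes[level] == list(row):
            return level
    raise ValueError(f"no level has code {list(row)}")

levels, codes = encode_levels(["low", "high", "mid", "low"], scheme="sigma")
print(codes["high"], codes["mid"])        # [1, 0] [-1, -1]
print(decode_row([0, 1], levels, codes))  # low
```

Keeping `levels` and `codes` together is what makes the reverse trip possible; any real implementation has to carry that table alongside the matrix.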
On Sat, Jun 07, 2008 at 09:49:27AM +0100, Ed wrote:
> I read this code last night, and the existing implementation is
> straightforward, but it doesn't handle some of the more complicated
> sigma-restricted encoding (this seems tough; it might be worth leaving
> as a later enhancement), which leads to:
> - nested designs
> - [partial] factorial designs
> - mixture surface models (I think that's what they're called: regression with
> I'm not sure what the ideal spec is for a routine that builds a design
> matrix. The existing code does everything you need at a basic
> level, provided you have all your independent variables, but it
> doesn't introduce terms to handle interactions. Perhaps something on
> top needs to take a model spec like A(B) C C*D or whatever and
> turn that into a set of independent variables for the design matrix
> routine to handle.
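One possible shape for that "something on top", sketched under an assumed grammar that is not PSPP's GLM syntax: whitespace separates terms, `*` joins the variables of an interaction, `X(Y)` means X nested within Y, and no spaces appear inside a term. `parse_model` and `Term` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Term:
    factors: Tuple[str, ...]          # variables crossed in this term
    nested_in: Tuple[str, ...] = ()   # variables this term is nested within

    def label(self) -> str:
        text = "*".join(self.factors)
        return f"{text}({'*'.join(self.nested_in)})" if self.nested_in else text

# NAME(*NAME)* optionally followed by (NAME(*NAME)*)
_NAMES = r"[A-Za-z_]\w*(?:\*[A-Za-z_]\w*)*"
_TERM = re.compile(rf"({_NAMES})(?:\(({_NAMES})\))?")

def parse_model(spec: str) -> List[Term]:
    """Split a model spec such as "A(B) C C*D" into Term objects."""
    terms = []
    for token in spec.split():
        m = _TERM.fullmatch(token)
        if m is None:
            raise ValueError(f"cannot parse term: {token!r}")
        factors = tuple(m.group(1).split("*"))
        nested = tuple(m.group(2).split("*")) if m.group(2) else ()
        terms.append(Term(factors, nested))
    return terms

for term in parse_model("A(B) C C*D"):
    print(term.label(), term.factors, term.nested_in)
# A(B) ('A',) ('B',)
# C ('C',) ()
# C*D ('C', 'D') ()
```

The output of this step is just a list of terms; deciding which columns each term produces is left to the design matrix routine.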
This is what makes the design matrix routine a "big" problem. I'm not
sure how big, but it does need to know which columns in the matrix
belong to which variables (that's already done), which columns
correspond to interactions, which to nested effects, and which to random
effects. Mapping interactions to columns might not be easy. Also, the
coefficient portion of the model struct will need a way to match
coefficients with columns (or maybe variables). The GLM procedure code
would have to call the design matrix code, hand it a model with any
conceivable combination of these kinds of effects, and get a design
matrix back, along with a way to match any variables (or combinations
thereof) with the corresponding columns in the design matrix. This was
the hardest part of writing the REGRESSION procedure, so I think
it will be the hardest part of writing a GLM procedure.
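A minimal sketch of that bookkeeping, limited to main effects and interactions of qualitative variables (nested and random effects are left out). `build_design` and `columns_for` are hypothetical names, and treatment coding with the last sorted level as reference is an assumption. The point is `column_map`, which records for every column the term and level combination it codes, so coefficients can later be matched back to variables or combinations of them.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Case = Dict[str, str]                          # variable name -> category value
TermSpec = Tuple[str, ...]                     # ("A",) main effect, ("A", "B") is A*B
ColumnInfo = Tuple[TermSpec, Tuple[str, ...]]  # (term, level combination)

def dummy_columns(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Treatment coding; the last sorted level is the reference (an assumption)."""
    levels = sorted(set(values))
    coded_levels = levels[:-1]
    cols = [[1 if v == lvl else 0 for v in values] for lvl in coded_levels]
    return coded_levels, cols

def build_design(cases: List[Case], terms: List[TermSpec]
                 ) -> Tuple[List[List[int]], List[ColumnInfo]]:
    """Return (matrix, column_map); column_map[j] describes column j."""
    n = len(cases)
    coded = {var: dummy_columns([case[var] for case in cases])
             for term in terms for var in term}
    columns: List[List[int]] = [[1] * n]   # intercept column
    column_map: List[ColumnInfo] = [((), ())]
    for term in terms:
        # One column per combination of coded levels (one level per factor);
        # an interaction column is the elementwise product of its factors' columns.
        choices = [range(len(coded[var][0])) for var in term]
        for idx in product(*choices):
            col = [1] * n
            for var, k in zip(term, idx):
                col = [a * b for a, b in zip(col, coded[var][1][k])]
            columns.append(col)
            column_map.append((term, tuple(coded[var][0][k] for var, k in zip(term, idx))))
    matrix = [list(row) for row in zip(*columns)]  # one row per case
    return matrix, column_map

def columns_for(term: TermSpec, column_map: List[ColumnInfo]) -> List[int]:
    """Indices of the design-matrix columns that belong to `term`."""
    return [j for j, (t, _) in enumerate(column_map) if t == term]

cases = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b1"},
         {"A": "a1", "B": "b2"}, {"A": "a2", "B": "b2"}]
X, cmap = build_design(cases, [("A",), ("B",), ("A", "B")])
print(columns_for(("A", "B"), cmap))  # [3]
```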
Once the design matrix is in place, estimation can proceed according
to one of the many algorithms out there in the literature. Even if we
picked the wrong one, it wouldn't be hard to change purely linear
algebraic code later. The problem is going to be getting the data to
the algorithm, and sorting through the results afterward.
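As an illustration that estimation really is isolated linear algebra, here is a sketch of ordinary least squares via the normal equations, followed by pairing each coefficient with its column label (the "sorting through the results" step). `least_squares` and the labels are hypothetical. The normal equations are used only for brevity; they are less numerically stable than a QR-based solver, which could replace this function without changing its callers.

```python
from typing import List, Sequence

def least_squares(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Solve (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.

    Assumes X has full column rank; raises ValueError otherwise.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            raise ValueError("design matrix is not of full column rank")
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Intercept, A=a1, B=b1 and their interaction, one row per case.
X = [[1, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
labels = ["(Intercept)", "A=a1", "B=b1", "A=a1*B=b1"]
coefs = least_squares(X, [10, 6, 7, 4])
print({name: round(c, 6) for name, c in zip(labels, coefs)})
# {'(Intercept)': 4.0, 'A=a1': 3.0, 'B=b1': 2.0, 'A=a1*B=b1': 1.0}
```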
> I haven't really thought this through yet, but I am hoping to work on it.
I'm not sure of the best way to do it, either. It might be worth taking a
look at similar code in R.