GLM, encodings and SSQs

pspp-dev

From:	John Darrington
Subject:	GLM, encodings and SSQs
Date:	Mon, 21 Nov 2011 20:08:11 +0000
User-agent:	Mutt/1.5.18 (2008-05-17)

The good news is that I found and fixed a bug which was causing Effects
Coding to produce garbage results.  The surprising news (surprising to me
anyway) is that, having fixed it, Effects Coding produces identical results
to Dummy Coding.
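
One possible explanation for the identical results, offered as an assumption
and not as a description of PSPP's code: a sum of squares obtained as the
difference in residual SS between two nested models that respect marginality
depends only on the column spaces of the design matrices, and those are the
same under both codings.  The sketch below checks this with NumPy on a small
unbalanced 2x3 data set.  The data and the helper names (dummy, effects, rss,
interaction, nested_ss) are invented for illustration.

```python
import numpy as np

# Hypothetical unbalanced 2x3 data, invented for illustration only.
a = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
b = np.array([0, 0, 0, 1, 1, 2, 2, 0, 1, 1, 2, 2])
y = np.array([4.2, 5.1, 4.8, 7.9, 8.4, 11.0, 10.1,
              6.5, 9.9, 10.8, 14.2, 13.1])

def dummy(levels, k):
    """Dummy coding: k-1 indicator columns, level 0 is the reference."""
    return np.column_stack([(levels == j).astype(float) for j in range(1, k)])

def effects(levels, k):
    """Effects coding: as dummy coding, but reference-level rows are -1."""
    cols = dummy(levels, k)
    cols[levels == 0] = -1.0
    return cols

def interaction(A, B):
    """All pairwise products of the columns of A and B."""
    return np.column_stack([A[:, i] * B[:, j]
                            for i in range(A.shape[1])
                            for j in range(B.shape[1])])

def rss(y, *blocks):
    """Residual SS of y regressed on an intercept plus the given blocks."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def nested_ss(code):
    """SS for a given b, b given a, and a:b given both main effects."""
    A, B = code(a, 2), code(b, 3)
    main = rss(y, A, B)
    return {
        "a": rss(y, B) - main,
        "b": rss(y, A) - main,
        "a:b": main - rss(y, A, B, interaction(A, B)),
    }

ss_dummy = nested_ss(dummy)
ss_effects = nested_ss(effects)
for term in ss_dummy:
    print(term, round(ss_dummy[term], 4), round(ss_effects[term], 4))
```

The two columns printed for each term should agree up to rounding error.
This only holds when the smaller model contains every term that is marginal
to the one being tested; dropping a main effect while keeping its interaction
is coding-dependent.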

The disappointing news is that we still don't get the same results as SPSS
for unbalanced designs.

However, I've been looking at various examples on the net, and perhaps I've
stumbled onto something:

1.  There's a worked example (for R) at
    https://netfiles.uiuc.edu/dgs/www/stat324/notes/041604.pdf
    It doesn't say which "Type" of SSQ it's using, but it does say that the
    results are dependent upon the order in which the effects are presented
    in the design matrix, which I understood to be true for Type I.

   The example results are for the SUPP variable as the first variable:

                Df Sum Sq Mean Sq F value    Pr(>F)
   supp          1 174.46  174.46 17.3664 0.0011049
   doselev       2 375.75  187.87 18.7012 0.0001495
   supp:doselev  2  17.70    8.85  0.8808 0.4377931
   Residuals    13 130.60   10.05

   and for the DOSELEV variable as the first variable:

                Df Sum Sq Mean Sq F value    Pr(>F)
   doselev       2 396.08  198.04 19.7131 0.0001158
   supp          1 154.13  154.13 15.3428 0.0017685
   doselev:supp  2  17.70    8.85  0.8808 0.4377931
   Residuals    13 130.60   10.05


   Note that the sums of squares for the two main effects differ between the
   two orderings.
   Now when I run the same data with PSPP, I get:

#Corrected Model#                 567,91| 5|     113,58| 11,31| ,00#
#Intercept      #                5956,05| 1|    5956,05|592,87| ,00#
#supp           #                 154,13| 1|     154,13| 15,34| ,00#
#doselev        #                 375,75| 2|     187,87| 18,70| ,00#
#supp * doselev #                  17,70| 2|       8,85|   ,88| ,44#
#Error          #                 130,60|13|      10,05|      |    #
#Total          #                6654,56|19|           |      |    #
#Corrected Total#                 698,51|18|           |      |    #


   Note that PSPP's DOSELEV ssq is identical to R's DOSELEV ssq in the first
   example above, and the SUPP ssq is identical to that in the second
   example.  The interaction is the same for both.
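
   The order dependence described in the R notes can be reproduced with a
   short NumPy sketch of sequential (Type I) sums of squares: terms are added
   one at a time, and each term is credited with the drop in residual SS at
   the moment it enters.  The data below are invented (they are not the
   supp/doselev data) and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical unbalanced 2x3 data, invented for illustration only.
a = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
b = np.array([0, 0, 0, 1, 1, 2, 2, 0, 1, 1, 2, 2])
y = np.array([4.2, 5.1, 4.8, 7.9, 8.4, 11.0, 10.1,
              6.5, 9.9, 10.8, 14.2, 13.1])

def dummy(levels, k):
    """Dummy coding: k-1 indicator columns, level 0 is the reference."""
    return np.column_stack([(levels == j).astype(float) for j in range(1, k)])

def rss(y, *blocks):
    """Residual SS of y regressed on an intercept plus the given blocks."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def sequential_ss(y, terms):
    """Type I: add terms in the given order; SS is the drop in RSS."""
    out, blocks = {}, []
    previous = rss(y)              # intercept-only model
    for name, block in terms:
        blocks.append(block)
        current = rss(y, *blocks)
        out[name] = previous - current
        previous = current
    return out

A, B = dummy(a, 2), dummy(b, 3)
AB = np.column_stack([A[:, 0] * B[:, j] for j in range(B.shape[1])])

a_first = sequential_ss(y, [("a", A), ("b", B), ("a:b", AB)])
b_first = sequential_ss(y, [("b", B), ("a", A), ("a:b", AB)])
print("a first:", {k: round(v, 3) for k, v in a_first.items()})
print("b first:", {k: round(v, 3) for k, v in b_first.items()})
```

   In both orderings the interaction enters last, so its SS is the same, and
   the three SS add up to the same corrected-model total.  Only the split
   between the two main effects changes.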


2.  Another example, this time for SAS, at
    http://www.sfu.ca/sasdoc/sashtml/stat/chap30/sect52.htm

   I copied the data given there, and ran it through PSPP and got:

#===============#=======================#==#============#==========#=======#
#     Source    #Type III Sum of Squares|df| Mean Square|     F    |  Sig. #
#===============#=======================#==#============#==========#=======#
#Corrected Model#            4259,338506|11|  387,212591|  3,505692|,001298#
#Intercept      #           20672,844828| 1|20672,844828|187,164963|,000000#
#drug           #            3063,432863| 3| 1021,144288|  9,245096|,000067#
#disease        #             418,833741| 2|  209,416870|  1,895990|,161720#
#drug * disease #             707,266259| 6|  117,877710|  1,067225|,395846#
#Error          #            5080,816667|46|  110,452536|          |       #
#Total          #           30013,000000|58|            |          |       #
#Corrected Total#            9340,155172|57|            |          |       #


   Now these numbers are exactly what the SAS example gives for the Type II
   sums of squares (although PSPP is labelling them as Type III).
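
   For reference, Type II as it is usually defined adjusts each main effect
   for the other main effects but not for the interaction.  That makes each
   Type II main-effect SS equal to the Type I value from the ordering in
   which that effect is entered after the other main effect, which is the
   pattern seen in example 1.  A minimal NumPy sketch on invented data, with
   hypothetical helper names:

```python
import numpy as np

# Hypothetical unbalanced 2x3 data, invented for illustration only.
a = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
b = np.array([0, 0, 0, 1, 1, 2, 2, 0, 1, 1, 2, 2])
y = np.array([4.2, 5.1, 4.8, 7.9, 8.4, 11.0, 10.1,
              6.5, 9.9, 10.8, 14.2, 13.1])

def dummy(levels, k):
    """Dummy coding: k-1 indicator columns, level 0 is the reference."""
    return np.column_stack([(levels == j).astype(float) for j in range(1, k)])

def rss(y, *blocks):
    """Residual SS of y regressed on an intercept plus the given blocks."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

A, B = dummy(a, 2), dummy(b, 3)
AB = np.column_stack([A[:, 0] * B[:, j] for j in range(B.shape[1])])

# Type II: each term adjusted for the terms that do not contain it.
type2 = {
    "a": rss(y, B) - rss(y, A, B),
    "b": rss(y, A) - rss(y, A, B),
    "a:b": rss(y, A, B) - rss(y, A, B, AB),
}

# Type I main effects when each one is entered second.
a_entered_second = rss(y, B) - rss(y, B, A)   # ordering b, a, a:b
b_entered_second = rss(y, A) - rss(y, A, B)   # ordering a, b, a:b
# Type I for a when it is entered first, for contrast.
a_entered_first = rss(y) - rss(y, A)

print({k: round(v, 3) for k, v in type2.items()})
print(round(a_entered_second, 3), round(b_entered_second, 3),
      round(a_entered_first, 3))
```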


3.  A concise but quite useful description of the various ssq "types" can be
    found at
    http://afni.nimh.nih.gov/sscc/gangc/SS.html
    It says this about Type III:

   "SS gives the sum of squares that would be obtained for each variable if
    it were entered last into the model. That is, the effect of each variable
    is evaluated after all other factors have been accounted for. Therefore
    the result for each term is equivalent to what is obtained with Type I
    analysis when the term enters the model as the last one in the ordering."

   This would seem to be consistent with our results in 1.
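
   One way to apply the quoted description literally is to enter each main
   effect after everything else, the interaction included.  The sketch below
   does that under effects (sum-to-zero) coding, which is an assumption here
   since the result of dropping a main effect while keeping its interaction
   depends on the coding, and compares it with the main effect adjusted only
   for the other main effect.  The data and helper names are invented, and
   nothing here establishes what SPSS or PSPP actually compute.

```python
import numpy as np

# Hypothetical unbalanced 2x3 data, invented for illustration only.
a = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
b = np.array([0, 0, 0, 1, 1, 2, 2, 0, 1, 1, 2, 2])
y = np.array([4.2, 5.1, 4.8, 7.9, 8.4, 11.0, 10.1,
              6.5, 9.9, 10.8, 14.2, 13.1])

def effects(levels, k):
    """Effects coding: indicators for levels 1..k-1, level-0 rows are -1."""
    cols = np.column_stack([(levels == j).astype(float) for j in range(1, k)])
    cols[levels == 0] = -1.0
    return cols

def rss(y, *blocks):
    """Residual SS of y regressed on an intercept plus the given blocks."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

A, B = effects(a, 2), effects(b, 3)
AB = np.column_stack([A[:, 0] * B[:, j] for j in range(B.shape[1])])
full = rss(y, A, B, AB)

# Each term entered last, with every other term already in the model.
entered_last = {
    "a": rss(y, B, AB) - full,
    "b": rss(y, A, AB) - full,
    "a:b": rss(y, A, B) - full,
}
# Main effects adjusted for the other main effect only.
main_only = {
    "a": rss(y, B) - rss(y, A, B),
    "b": rss(y, A) - rss(y, A, B),
}

print({k: round(v, 3) for k, v in entered_last.items()})
print({k: round(v, 3) for k, v in main_only.items()})
```

   On unbalanced data such as this, the two sets of main-effect SS differ
   while the interaction SS is the same under either approach.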

4.  However, none of the SPSS examples I have found which feature unbalanced
    designs actually correspond to what PSPP currently produces for Type III
    ssq.  The interactions are the same, but the main effects are quite
    different.

The foregoing leads me to infer that SPSS has the meaning of Type II and
Type III transposed, in comparison to the rest of the world.

This sounds somewhat incredible, but seems to be consistent with the
evidence so far.

I can only suggest that we try to implement Type II next, and see what
happens.

J'



-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.
