
From:  Renan Levine 
Subject:  Re: regression, and missing data 
Date:  Tue, 06 Mar 2012 00:06:44 0500 
User-agent:  Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 
Hi,

This appears to be a bug in the PSPP regression routine when the data contain a large number of missing values!

I recently noticed some small discrepancies in simple bivariate regression results between IBM SPSS, STATA and PSPP. Until Prof. Shackman's email, I hadn't realized that the discrepancies only occur when there are many missing values. I was just confused... Sadly, I also find problems when running linear regressions using PSPP on data with missing values. I wish I knew what was causing the problem.

So, using Dropbox, I wanted to make available some data which seem to illustrate the issue. Using psppire.exe 0.7.9gab8ce2 on Windows AND psppire 0.7.8 on Linux Mint LXDE, PSPP calculates descriptive statistics just like SPSS and STATA on the same dataset, but does not calculate identical b coefficients when running bivariate or multivariate regressions.

I created the following public opinion survey data files consisting of three variables from the 2004 Canadian Election Study, which I recoded, declaring certain values to be missing:

http://dl.dropbox.com/u/35198072/ces2004regtest.sav
has many observations with missing values.

http://dl.dropbox.com/u/35198072/ces2004regtest2.sav
has the same three variables, but I dropped all of the cases with missing values.

This is the syntax file used to run descriptive statistics and three regression analyses:
http://dl.dropbox.com/u/35198072/regressiontests.sps

PSPP generates these regression results and descriptive statistics with missing values:
http://dl.dropbox.com/u/35198072/regressiontestpspp1.html

PSPP generates these regression results and descriptive statistics using the data without any missing values:
http://dl.dropbox.com/u/35198072/regressiontestpspp2.html

Here is the STATA output on the same data (.log is a text file; email me if you have a problem opening it):
http://dl.dropbox.com/u/35198072/regressionteststata.log

The first three regressions should match the output in regressiontestpspp1.html. They are close, but not close enough... The bottom three regressions use the data with no missing values, and these DO match PSPP's output (in regressiontestpspp2.html).

I also ran the data on SPSS and found results consistent with STATA. There did not seem to be any problems with Pearson's Chi-Square or Kendall's Tau-B when running a crosstab on the data with the missing values.

I am sorry I don't know what has gone wrong, so I am making this data available in the hope that someone can figure out where the mistake is. I caution other users against relying on PSPP regression results when their data have missing values.

Yours,
Renan

On 04Mar12 11:37 PM, Gene Shackman wrote:
Renan Levine
Department of Political Science
University of Toronto Scarborough
address@hidden
http://individual.utoronto.ca/renan
(416) 2082651