|Subject:||Re: regression, and missing data|
|Date:||Tue, 06 Mar 2012 00:06:44 -0500|
This appears to be a bug in the PSPP regression routine when the data contain a large number of missing values!
I recently noticed some small discrepancies in simple bivariate regression results among IBM SPSS, STATA and PSPP. Until Prof. Shackman's email, I hadn't realized that the discrepancies occur only when there are many missing values. I was just confused...
Sadly, I also find problems when running linear regressions using PSPP on data with missing values. I wish I knew what was causing the problem.
So, using Dropbox, I wanted to make available some data that seem to illustrate the issue.
Using psppire.exe 0.7.9-gab8ce2 on Windows AND psppire 0.7.8 on Linux Mint LXDE, PSPP calculates descriptive statistics just like SPSS and STATA on the same dataset, but does not calculate identical b coefficients when running bivariate or multivariate regressions.
I created the following public opinion survey data files, each consisting of three variables from the 2004 Canadian Election Study. I recoded the variables and declared certain values to be missing:
http://dl.dropbox.com/u/35198072/ces2004-regtest.sav has many observations with missing values.
http://dl.dropbox.com/u/35198072/ces2004-regtest2.sav has the same three variables, but I dropped all of the cases with missing values.
This is the syntax file used to run descriptive statistics and three regression analyses.
PSPP generates these regression results and descriptive statistics with missing values:
PSPP generates these regression results and descriptive statistics using the data without any missing values:
Here is the STATA output on the same data (.log is a text file - email me if you have a problem opening it). The first three regressions should match the output in regression-test-pspp1.html
They are close, but not close enough... The bottom three regressions use the data with no missing values and these DO match PSPP's output (in regression-test-pspp2.html).
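The comparison above relies on the fact that, under listwise deletion (the default for regression in SPSS and STATA), declaring values missing and dropping those cases beforehand should give identical coefficients. A minimal sketch of that check in Python follows; the data are made up for illustration, not taken from the CES 2004 files, and `ols_bivariate` is a hypothetical helper, not PSPP's routine.

```python
# Illustrative only: synthetic (x, y) pairs, with None standing in for a
# declared-missing value. Nothing here reproduces PSPP's actual code.

def ols_bivariate(pairs):
    """Return (intercept, slope) of y regressed on x, using listwise deletion."""
    # Keep only cases where both variables are observed.
    complete = [(x, y) for x, y in pairs if x is not None and y is not None]
    n = len(complete)
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    sxx = sum((x - mean_x) ** 2 for x, _ in complete)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Analogue of the first file: cases with missing values are still present.
with_missing = [
    (1, 2.0), (2, None), (3, 6.5), (None, 4.0),
    (5, 9.0), (6, 12.5), (None, None), (8, 15.0),
]
# Analogue of the second file: incomplete cases dropped beforehand.
complete_only = [(x, y) for x, y in with_missing
                 if x is not None and y is not None]

a1, b1 = ols_bivariate(with_missing)
a2, b2 = ols_bivariate(complete_only)
print("with missing:  intercept=%.6f slope=%.6f" % (a1, b1))
print("complete only: intercept=%.6f slope=%.6f" % (a2, b2))
print("identical:", (a1, b1) == (a2, b2))
```

If a regression routine reports different b coefficients for the two versions of the same data, it is presumably not handling the missing cases this way.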
I also ran the data in SPSS and found results consistent with STATA. There did not seem to be any problems with Pearson's Chi-Square or Kendall's Tau-B when running a crosstab on the data with the missing values.
I am sorry I don't know what has gone wrong, so I am making available this data in hopes someone might figure out where there is a mistake. I caution other users running regression on PSPP.
On 04-Mar-12 11:37 PM, Gene Shackman wrote:
--
Renan Levine
Department of Political Science
University of Toronto - Scarborough
address@hidden
http://individual.utoronto.ca/renan
(416) 208-2651