[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Problems with Statistics package
From: |
Hannes |
Subject: |
Re: Problems with Statistics package |
Date: |
Wed, 30 May 2012 11:06:42 +0200 |
User-agent: |
Dynamic Internet Messaging Program (DIMP) H3 (1.1.4) |
Ok, I think I got it pinned downed and solved. I have attached a patch.
Heres the explenation:
total_mean = mean (y(:));
SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
is always positive [sum of squares between]
SST = sumsq (reshape (y, n, 1) - total_mean);
is always positive and >= SSB [sum of squares total]
SSW = SST - SSB;
=> should always be >=0
but it is an ill conditioned operation if SST==SSB resulting in
possibly negative error. Note that if the error is positive, it
doesn't matter because it will produce f == Inf and p = 0 as expected.
df_b = k - 1;
df_w = n - k;
v_b = SSB / df_b;
v_w = SSW / df_w;
this can become negative, if the SSW is negative or 0 if SSW is zero
or close to zero.
f = v_b / v_w;
this results in divide by zero if v_w is zero and results in *wrong
result without warning* if v_w is negative.
My patch changes this whole block to:
SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
SST = sumsq (reshape (y, n, 1) - total_mean);
SSW = SST - SSB;
if (SSW < 0) # machine error if SSB == SSW
SSW = 0;
endif
df_b = k - 1;
df_w = n - k;
v_b = SSB / df_b;
v_w = SSW / df_w;
if (v_w != 0)
f = v_b / v_w;
pval = 1 - fcdf (f, df_b, df_w);
elseif (v_b == 0)
f = NaN;
pval = 1;
0 / 0 is NaN and the hypothesis mus be rejected
else
f = Inf;
pval = 0;
x / 0 is Inf and hypothesis must be accepted (because there is no
variance inside the sample but there is outside
endif
HTH,
Hannes
patch_anova.m
Description: Text Data