-------- Original Message --------
Subject: Re: Problems with Statistics package
Date: Wed, 30 May 2012 11:06:42 +0200
From: Hannes <address@hidden>
To: address@hidden
Ok, I think I got it pinned downed and solved. I have attached a patch.
Heres the explenation:
total_mean = mean (y(:));
SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
is always positive [sum of squares between]
SST = sumsq (reshape (y, n, 1) - total_mean);
is always positive and >= SSB [sum of squares total]
SSW = SST - SSB;
=> should always be >=0
but it is an ill conditioned operation if SST==SSB resulting in
possibly negative error. Note that if the error is positive, it
doesn't matter because it will produce f == Inf and p = 0 as expected.
df_b = k - 1;
df_w = n - k;
v_b = SSB / df_b;
v_w = SSW / df_w;
this can become negative, if the SSW is negative or 0 if SSW is zero
or close to zero.
f = v_b / v_w;
this results in divide by zero if v_w is zero and results in *wrong
result without warning* if v_w is negative.
My patch changes this whole block to:
SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
SST = sumsq (reshape (y, n, 1) - total_mean);
SSW = SST - SSB;
if (SSW < 0) # machine error if SSB == SSW
SSW = 0;
endif
df_b = k - 1;
df_w = n - k;
v_b = SSB / df_b;
v_w = SSW / df_w;
if (v_w != 0)
f = v_b / v_w;
pval = 1 - fcdf (f, df_b, df_w);
elseif (v_b == 0)
f = NaN;
pval = 1;
0 / 0 is NaN and the hypothesis mus be rejected
else
f = Inf;
pval = 0;
x / 0 is Inf and hypothesis must be accepted (because there is no
variance inside the sample but there is outside
endif
HTH,
Hannes