octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [OctDev] Fwd: Re: Problems with Statistics package


From: Hannes
Subject: Re: [OctDev] Fwd: Re: Problems with Statistics package
Date: Tue, 12 Jun 2012 11:34:48 +0200
User-agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.4)

Concerning the matter below, I didn't yet get a response. However I realised that most of the tests fail or produce (divide by zero errors) if the variance of a sample is 0. Since errors imply a wrong behaviour by the user (and e.g. testing two samples with variance 0 in var_test is completely legal) I think this should be fixed.

I have attached patches for t_test_2 and var_test. Other tests can fixed accordingly.

Thanks for hacking gnu-octave.

Best,
Hannes



Quoting "Arno Onken" <address@hidden>:

On Wed, 30 May 2012 11:25:14, marco atzeri wrote:
Hi Hannes, octave-forge mailing list is the right place for bugs and
patches on the statistics package

anova.m is not part of the Octave Forge Statistics package. It is part
of Octave core. I am cc'ing this mail along with the patch to the Octave
maintainers list.

Arno

-------- Original Message --------
Subject: Re: Problems with Statistics package
Date: Wed, 30 May 2012 11:06:42 +0200
From: Hannes <address@hidden>
To: address@hidden

Ok, I think I got it pinned downed and solved. I have attached a patch.

Heres the explenation:

 total_mean = mean (y(:));
 SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
is always positive [sum of squares between]

 SST = sumsq (reshape (y, n, 1) - total_mean);
is always positive and >= SSB [sum of squares total]

 SSW = SST - SSB;
=> should always be >=0
but it is an ill conditioned operation if SST==SSB resulting in
possibly negative error. Note that if the error is positive, it
doesn't matter because it will produce f == Inf and p = 0 as expected.

 df_b = k - 1;
 df_w = n - k;
 v_b = SSB / df_b;
 v_w = SSW / df_w;
this can become negative, if the SSW is negative or 0 if SSW is zero
or close to zero.


 f = v_b / v_w;
this results in divide by zero if v_w is zero and results in *wrong
result without warning* if v_w is negative.


My patch changes this whole block to:

   SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
   SST = sumsq (reshape (y, n, 1) - total_mean);
   SSW = SST - SSB;
   if (SSW < 0) # machine error if SSB == SSW
     SSW = 0;
   endif
   df_b = k - 1;
   df_w = n - k;
   v_b = SSB / df_b;
   v_w = SSW / df_w;
   if (v_w != 0)
     f = v_b / v_w;
     pval = 1 - fcdf (f, df_b, df_w);
   elseif (v_b == 0)
     f = NaN;
     pval = 1;

0 / 0 is  NaN and the hypothesis mus be rejected

   else
     f = Inf;
     pval = 0;

x / 0 is Inf and hypothesis must be accepted (because there is no
variance inside the sample but there is outside

   endif

HTH,
Hannes



Attachment: patch-t_test_2.m
Description: Text Data

Attachment: patch-var_test.m
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]