help-octave

Re: Determining if samples are normal


From: Paul Kienzle
Subject: Re: Determining if samples are normal
Date: Mon, 26 Sep 2005 19:48:18 -0400

The Anderson-Darling test returns a small q if a sample is unlikely to have been drawn from the given distribution.

octave> anderson_darling_test (randn(100,1),'normal')
ans = 1
octave> anderson_darling_test (rand(100,1),'normal')
ans = 0.010000


So this is consistent with randn returning normally distributed numbers, whereas the numbers from rand are almost certainly not normal. Note that the test does not tell you that the numbers come from a normal process. They may, for example, be correlated in sequence, or they may be too regular.


Running the Anderson-Darling test 10000 times on samples of size 100, I get the following results:

octave:24>  c = anderson_darling_test(randn(100,10000),'normal');
octave:25>  tabulate(100*c,100*[unique(c),1]);
     bin     Fa       Fr%        Fc
       1     83      0.83%       83
     2.5    163      1.63%      246
       5    273      2.73%      519
      10    506      5.06%     1025
     100   8975     89.75%    10000

The Fc column is the cumulative frequency out of 10000 samples, so divide by 100 to get a percentage. About 0.8% of the samples return q <= 0.01, 2.5% return q <= 0.025, 5.2% return q <= 0.05, and 10.3% return q <= 0.1. Some samples drawn from a normal distribution will not look very normal, and no data-driven test is going to be able to identify them as normal.
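As a small worked check of that arithmetic, the following sketch rebuilds the Fc column from the Fa counts in the table and converts it to percentages. The variable names are only illustrative, and the counts are copied by hand from the table rather than recomputed.

```octave
Fa = [83 163 273 506 8975];      % absolute frequencies from the table above
q  = [0.01 0.025 0.05 0.1 1];    % bin labels divided by 100
Fc = cumsum (Fa);                % cumulative counts, as in the Fc column
pct = 100 * Fc / sum (Fa);       % percent of samples with q <= each level
printf ("q <= %5.3f: %6.2f%%\n", [q; pct]);
```

For a well-calibrated test the percentages for normal samples should sit close to the nominal levels, which is what the table shows.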

Trying with a uniform distribution, again of sample size 100:

octave>  c = anderson_darling_test(rand(100,10000),'normal');
octave>  tabulate(100*c,100*[unique(c),1]);
     bin     Fa       Fr%        Fc
       1   7907     79.07%     7907
     2.5   1097     10.97%     9004
       5    507      5.07%     9511
      10    321      3.21%     9832
     100    168      1.68%    10000

Most samples of size 100 from a uniform distribution (about 90%) will be rejected as not normal with 97.5% confidence by the Anderson-Darling test.

Triangular distributions look much too normal. With samples of size 100 we will not be able to distinguish them reliably with the Anderson-Darling test:

octave:26> c = anderson_darling_test(rand(100,10000)+rand(100,10000),'normal');
octave:27> tabulate(100*c,100*[unique(c),1]);
     bin     Fa       Fr%        Fc
       1    137      1.37%      137
     2.5    202      2.02%      339
       5    407      4.07%      746
      10    757      7.57%     1503
     100   8497     84.97%    10000
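To put the three experiments side by side, here is a rough sketch that takes the Fc columns from the tables in this message and prints the percentage of samples rejected at each level. The numbers are copied by hand from the tables and the variable names are only illustrative.

```octave
% Fc columns copied from the three tables (10000 samples of size 100 each)
levels     = [0.01 0.025 0.05 0.1];
normal     = [  83  246  519 1025];
uniform    = [7907 9004 9511 9832];
triangular = [ 137  339  746 1503];
rates = 100 * [normal; uniform; triangular] / 10000;   % percent rejected
printf ("%8s %8s %8s %10s\n", "level", "normal", "uniform", "triangular");
printf ("%8.3f %8.2f %8.2f %10.2f\n", [levels; rates]);
```

At the 0.1 level this gives 10.25% for the normal samples, 98.32% for the uniform samples, and 15.03% for the triangular samples, so at n=100 the triangular case is only slightly above the false-rejection rate.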

Using n=400, 60% of the triangular samples are rejected at the 0.1 level, but of course 10% of the normal samples are as well.

- Paul

On Sep 26, 2005, at 5:37 PM, Henry F. Mollet wrote:

Am I doing this correctly? anderson_darling_test on the normal cdf and on the
logistic cdf gives the same result (0.01), but the graphs show that they are different.
Henry

octave:21> x=linspace (-4,4,100);
octave:22> mu=0.0; sigma = 1.0;
octave:23> cnormalx=0.5+0.5.*erf((x-mu)./sigma./sqrt(2));
octave:24> plot (x,cnormalx,"x")
octave:25> anderson_darling_test(cnormalx,'normal')
ans = 0.010000
octave:26> a=0; lambda=2;
octave:27> clogisticx=1.0./(1.0+exp(-lambda.*(x-a)));
octave:28> hold on
octave:30> plot (x,clogisticx,"@33")
octave:31> anderson_darling_test(clogisticx,'normal')
ans = 0.010000



on 9/26/05 12:55 AM, Paul Kienzle at address@hidden wrote:

The Anderson-Darling test is claimed to be pretty good:

http://www.itl.nist.gov/div898/handbook/prc/section2/prc213.htm
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm

Here's the relevant section from the R manual:

http://www.maths.lth.se/help/R/.R/library/nortest/html/ad.test.html

The Anderson-Darling test is an EDF omnibus test for the composite
hypothesis of normality. The test statistic is

A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} [2i-1] [\ln(p_{(i)}) + \ln(1 -
p_{(n-i+1)})],

where p_{(i)} = \Phi([x_{(i)} - \overline{x}]/s). Here, \Phi is the
cumulative distribution function of the standard normal distribution,
and \overline{x} and s are mean and standard deviation of the data
values. The p-value is computed from the modified statistic

Z=A^2 (1.0 + 0.75/n +2.25/n^{2})

according to Table 4.9 in Stephens (1986).
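The two formulas above can be sketched directly in Octave. This is only an illustration of the quoted definition, not the octave-forge or R implementation: ad_normal_stat is a made-up helper name, the data vector is arbitrary, and Phi is computed from erfc so that no extra package is needed.

```octave
1;  % script-file marker so the function below can be defined inline

% Hypothetical helper: A^2 and modified statistic Z for the normal case.
function [A2, Z] = ad_normal_stat (x)
  x = sort (x(:));                          % order statistics x_(1) <= ... <= x_(n)
  n = numel (x);
  z = (x - mean (x)) / std (x);             % standardize with sample mean and s
  logp = log (0.5 * erfc (-z / sqrt (2)));  % ln Phi(z_(i)) = ln p_(i)
  logq = log (0.5 * erfc ( z / sqrt (2)));  % ln (1 - p_(i))
  i = (1:n)';
  % logq(n:-1:1) picks out ln(1 - p_(n-i+1)) for i = 1..n
  A2 = -n - sum ((2*i - 1) .* (logp + logq(n:-1:1))) / n;
  Z = A2 * (1 + 0.75/n + 2.25/n^2);         % modified statistic
endfunction

x = [2.1 3.4 1.9 5.6 4.4 3.8 2.7 4.9 3.1 3.6];  % arbitrary example data
[A2, Z] = ad_normal_stat (x)
```

Because the data are standardized with their own mean and standard deviation, shifting or rescaling x leaves A^2 unchanged, which is a quick sanity check on any implementation.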

Here are the critical values I found elsewhere on the net.

90% 0.631
95% 0.752
97.5% 0.873
99% 1.035

For example, if A^2 > 0.752 you can say that your data set is not
normally distributed with 95% confidence.
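A table lookup against these critical values might look like the sketch below. It assumes the statistic has already been computed, and ad_bucket is a hypothetical name. Returning the smallest level whose critical value is exceeded, or 1 when none is, is a guess that matches the kind of values reported in this thread (1, 0.1, 0.05, 0.025, 0.01); it is not taken from the octave-forge source.

```octave
1;  % script-file marker so the function below can be defined inline

% Hypothetical helper: map a statistic to the smallest tabulated level
% at which normality would be rejected, or 1 if it is never rejected.
function q = ad_bucket (stat)
  crit  = [0.631 0.752 0.873 1.035];   % critical values listed above
  level = [0.10  0.05  0.025 0.01];    % 1 - confidence for each value
  idx = find (stat > crit, 1, "last"); % largest critical value exceeded
  if isempty (idx)
    q = 1;
  else
    q = level(idx);
  endif
endfunction

q = ad_bucket (0.8)   % exceeds 0.752 but not 0.873
```

A coarse lookup like this can only report the tabulated levels, which is why the results cluster on a handful of values rather than giving a continuous p-value.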

This is implemented in octave-forge as 1-p =
anderson_darling_test(x,'normal').  That is, if anderson_darling_test
returns a value of 0.05 then you can say your data set is not normally
distributed with 95% confidence.

- Paul

On Sep 25, 2005, at 1:59 PM, Søren Hauberg wrote:

Hi,
Does anybody know how I can test whether or not some samples are
normally distributed? I tried graphical methods, such as looking at
histograms and qqplots, but I don't trust my own judgement enough to
use graphical methods.

/Søren




-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------






