Re: Determining if samples are normal
From: Paul Kienzle
Subject: Re: Determining if samples are normal
Date: Mon, 26 Sep 2005 19:48:18 -0400
Anderson-Darling returns small q if a sample is unlikely to have been
drawn from the given distribution.
octave> anderson_darling_test (randn(100,1),'normal')
ans = 1
octave> anderson_darling_test (rand(100,1),'normal')
ans = 0.010000
So this is consistent with randn returning normally distributed
numbers, while rand almost certainly does not. Note that passing the
test does not prove that the numbers come from a normal process. They
may, for example, be correlated in the sequence, or they may be too
regular.
Running the Anderson-Darling test on 10000 samples of size 100, I get
the following results:
octave:24> c = anderson_darling_test(randn(100,10000),'normal');
octave:25> tabulate(100*c,100*[unique(c),1]);
  bin     Fa      Fr%      Fc
    1     83    0.83%      83
  2.5    163    1.63%     246
    5    273    2.73%     519
   10    506    5.06%    1025
  100   8975   89.75%   10000
The Fc column is the cumulative count, so divide by 100 to get the
cumulative percentage. About 0.8% of the samples return q <= 0.01, 2.5%
return q <= 0.025, 5.2% return q <= 0.05, and 10.3% return q <= 0.1.
Some samples drawn from a normal distribution will not look very
normal, and no data-driven test is going to be able to identify them as
normal.
Trying with a uniform distribution, again of sample size 100:
octave> c = anderson_darling_test(rand(100,10000),'normal');
octave> tabulate(100*c,100*[unique(c),1]);
  bin     Fa      Fr%      Fc
    1   7907   79.07%    7907
  2.5   1097   10.97%    9004
    5    507    5.07%    9511
   10    321    3.21%    9832
  100    168    1.68%   10000
About 90% of the samples of size 100 from a uniform distribution are
rejected as not normal with 97.5% confidence by the Anderson-Darling
test.
Triangular distributions look much too normal. We will not be able to
distinguish them with the Anderson-Darling test:
octave:26> c = anderson_darling_test(rand(100,10000)+rand(100,10000),'normal');
octave:27> tabulate(100*c,100*[unique(c),1]);
  bin     Fa      Fr%      Fc
    1    137    1.37%     137
  2.5    202    2.02%     339
    5    407    4.07%     746
   10    757    7.57%    1503
  100   8497   84.97%   10000
Using n=400, 60% of the triangular samples are rejected at a .1 level,
but of course 10% of the normal samples are as well.
- Paul
On Sep 26, 2005, at 5:37 PM, Henry F. Mollet wrote:
Am I doing this correctly? anderson_darling_test of the normal cdf and
the logistic cdf gives the same result (0.01). The graphs show that
they are different.
Henry
octave:21> x=linspace (-4,4,100);
octave:22> mu=0.0; sigma = 1.0;
octave:23> cnormalx=0.5+0.5.*erf((x-mu)./sigma./sqrt(2));
octave:24> plot (x,cnormalx,"x")
octave:25> anderson_darling_test(cnormalx,'normal')
ans = 0.010000
octave:26> a=0; lambda=2;
octave:27> clogisticx=1.0./(1.0+exp(-lambda.*(x-a)));
octave:28> hold on
octave:30> plot (x,clogisticx,"@33")
octave:31> anderson_darling_test(clogisticx,'normal')
ans = 0.010000
on 9/26/05 12:55 AM, Paul Kienzle at address@hidden
wrote:
The Anderson-Darling test is claimed to be pretty good:
http://www.itl.nist.gov/div898/handbook/prc/section2/prc213.htm
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm
Here's the relevant section from the R manual:
http://www.maths.lth.se/help/R/.R/library/nortest/html/ad.test.html
The Anderson-Darling test is an EDF omnibus test for the composite
hypothesis of normality. The test statistic is
A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} [2i-1] [\ln(p_{(i)}) + \ln(1 - p_{(n-i+1)})],
where p_{(i)} = \Phi([x_{(i)} - \overline{x}]/s). Here, \Phi is the
cumulative distribution function of the standard normal distribution,
and \overline{x} and s are mean and standard deviation of the data
values. The p-value is computed from the modified statistic
Z = A^2 (1.0 + 0.75/n + 2.25/n^{2})
according to Table 4.9 in Stephens (1986).
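
As a rough illustration of the statistic quoted above, here is a minimal
Octave sketch that evaluates A^2 and the modified statistic Z directly
from the formula, using only core functions. The function name
ad_normal_stat and the two deterministic test vectors are assumptions
made for this example; this is not the octave-forge or R source.

```octave
1;  % mark this file as a script so the function can be defined inline

% Hypothetical helper: Anderson-Darling statistic for normality,
% with the mean and standard deviation estimated from the data.
function [A2, Z] = ad_normal_stat (x)
  x = sort (x(:));                       % order statistics x_(1) <= ... <= x_(n)
  n = numel (x);
  z = (x - mean (x)) / std (x);          % standardize with sample mean and s
  p = 0.5 * (1 + erf (z / sqrt (2)));    % p_(i) = Phi(z_(i))
  i = (1:n)';
  % p(n:-1:1) supplies p_(n-i+1) for i = 1..n
  A2 = -n - sum ((2*i - 1) .* (log (p) + log (1 - p(n:-1:1)))) / n;
  Z = A2 * (1.0 + 0.75/n + 2.25/n^2);    % modified statistic
end

% Deterministic inputs: ideal normal quantiles versus an evenly spaced grid.
u = ((1:100)' - 0.5) / 100;
x_normal  = sqrt (2) * erfinv (2*u - 1); % Phi^{-1}(u) via erfinv
x_uniform = linspace (0, 1, 100)';
[A2_normal,  Z_normal]  = ad_normal_stat (x_normal);
[A2_uniform, Z_uniform] = ad_normal_stat (x_uniform);
printf ("normal quantiles: A^2 = %g, Z = %g\n", A2_normal, Z_normal);
printf ("uniform grid:     A^2 = %g, Z = %g\n", A2_uniform, Z_uniform);
```

The normal quantiles should give a statistic close to zero, and the
evenly spaced grid a noticeably larger one.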
Here are the critical values I found elsewhere on the net.
90% 0.631
95% 0.752
97.5% 0.873
99% 1.035
For example, if A^2 > 0.752 you can say that your data set is not
normally distributed with 95% confidence.
This is implemented in octave-forge as 1-p =
anderson_darling_test(x,'normal'). That is, if anderson_darling_test
returns a value of 0.05 then you can say your data set is not normally
distributed with 95% confidence.
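
The mapping from statistic to returned value can be sketched as a simple
table lookup against the critical values listed above. This is only an
assumption consistent with the outputs seen in this thread (1, 0.1,
0.05, 0.025, 0.01); the name ad_q_level is hypothetical, and the actual
octave-forge code may differ in detail.

```octave
1;  % mark this file as a script so the function can be defined inline

% Hypothetical helper: map a statistic value to the smallest tabulated
% level at which normality is rejected, or 1 if it is never rejected.
function q = ad_q_level (stat)
  crit   = [0.631 0.752 0.873 1.035];    % critical values quoted above
  levels = [0.10  0.05  0.025 0.01];     % 1 - confidence (90%, 95%, 97.5%, 99%)
  idx = find (stat > crit, 1, 'last');   % largest critical value exceeded
  if isempty (idx)
    q = 1;                               % below every critical value
  else
    q = levels(idx);
  end
end

q_low  = ad_q_level (0.5);   % below all critical values
q_mid  = ad_q_level (0.8);   % between the 95% and 97.5% values
q_high = ad_q_level (2.0);   % above the 99% value
```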
- Paul
On Sep 25, 2005, at 1:59 PM, Søren Hauberg wrote:
Hi,
Does anybody know how I can test whether or not some samples are
normally distributed? I tried graphical methods, such as looking at
histograms and qqplots, but I don't trust my own judgement enough to
use graphical methods.
/Søren
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.
Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------