Re: Determining if samples are normal

help-octave

From:	Paul Kienzle
Subject:	Re: Determining if samples are normal
Date:	Mon, 26 Sep 2005 19:48:18 -0400

Anderson-Darling returns a small q if a sample is unlikely to have been drawn from the given distribution.


octave> anderson_darling_test (randn(100,1),'normal')
ans = 1
octave> anderson_darling_test (rand(100,1),'normal')
ans = 0.010000

So this is consistent with randn returning normally distributed numbers, while rand almost certainly does not. Note that the test does not tell you that the numbers come from a normal process. They may, for example, be correlated in the sequence, or they may be too regular.

Checking the Anderson-Darling test 10000 times for a sample size of 100, I get the following results:


octave:24>  c = anderson_darling_test(randn(100,10000),'normal');
octave:25>  tabulate(100*c,100*[unique(c),1]);
     bin     Fa       Fr%        Fc
       1     83      0.83%       83
     2.5    163      1.63%      246
       5    273      2.73%      519
      10    506      5.06%     1025
     100   8975     89.75%    10000

The Fc column is the cumulative frequency, so divide by 100 to get percent. About 0.8% of the examples return q <= 0.01, 2.5% return q <= 0.025, 5.2% return q <= 0.05, and 10.3% return q <= 0.1. Some samples drawn from a normal distribution will not look very normal, and no data-driven test is going to be able to identify them as such.
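As a quick check of that arithmetic, the cumulative percentages can be recomputed from the Fa column of the table above. This is only a sketch; the variable names are mine, and the counts are copied from the table rather than regenerated.

```octave
Fa     = [83 163 273 506 8975];     % absolute frequencies from the table above
levels = [0.01 0.025 0.05 0.10 1];  % the bin column divided by 100
Fc     = cumsum (Fa);               % cumulative counts: 83 246 519 1025 10000
pct    = 100 * Fc / sum (Fa);       % cumulative percent of the 10000 samples
printf ("q <= %5.3f : %6.2f%%\n", [levels; pct]);
```

This gives 0.83%, 2.46%, 5.19% and 10.25% for the four levels, which is where the rounded figures in the paragraph come from.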


Trying with a uniform distribution, again of sample size 100:

octave>  c = anderson_darling_test(rand(100,10000),'normal');
octave>  tabulate(100*c,100*[unique(c),1]);
     bin     Fa       Fr%        Fc
       1   7907     79.07%     7907
     2.5   1097     10.97%     9004
       5    507      5.07%     9511
      10    321      3.21%     9832
     100    168      1.68%    10000

Most samples of size 100 from a uniform distribution (about 90% in this run) will be rejected as not normal with 97.5% confidence by the Anderson-Darling test.

Triangular distributions look much too normal. We will not be able to distinguish them with the Anderson-Darling test:

octave:26> c = anderson_darling_test(rand(100,10000)+rand(100,10000),'normal');
octave:27> tabulate(100*c,100*[unique(c),1]);
     bin     Fa       Fr%        Fc
       1    137      1.37%      137
     2.5    202      2.02%      339
       5    407      4.07%      746
      10    757      7.57%     1503
     100   8497     84.97%    10000

Using n=400, 60% of the triangular samples are rejected at a 0.1 level, but of course 10% of the normal samples are as well.
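The effect of sample size can be explored with a small simulation. The sketch below does not call anderson_darling_test; it uses a hypothetical helper, ad_normal_z, which I wrote from the A^2 and Z formulas quoted further down in this thread, and it assumes that rejecting at the 0.1 level means Z exceeds the 90% critical value 0.631 listed there. It uses fewer repetitions than the 10000 above, and the percentages will vary from run to run.

```octave
1;  % mark this file as a script so the helper can be defined inline

% Hypothetical helper (not an octave-forge function): modified
% Anderson-Darling statistic Z for every column of x, normal case
% with mean and standard deviation estimated from the data.
function Z = ad_normal_z (x)
  n  = rows (x);
  x  = sort (x, 1);                                   % order statistics per column
  p  = 0.5 + 0.5 * erf ((x - mean (x, 1)) ./ std (x, 0, 1) / sqrt (2));
  w  = 2*(1:n)' - 1;                                  % weights 2i-1
  A2 = -n - sum (w .* (log (p) + log (1 - flipud (p))), 1) / n;
  Z  = A2 * (1.0 + 0.75/n + 2.25/n^2);
end

reps  = 2000;                 % assumption: fewer repetitions to keep it quick
sizes = [100 400];
rate  = zeros (size (sizes));
for k = 1:numel (sizes)
  n   = sizes(k);
  tri = rand (n, reps) + rand (n, reps);              % triangular on [0,2]
  rate(k) = mean (ad_normal_z (tri) > 0.631);         % fraction rejected at 0.1
  printf ("n = %3d: %4.1f%% of triangular samples rejected at the 0.1 level\n",
          n, 100 * rate(k));
end
```

Replacing tri with randn (n, reps) should give a rate near 10% at both sample sizes, which is the baseline to compare against.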


- Paul

On Sep 26, 2005, at 5:37 PM, Henry F. Mollet wrote:

Am I doing this correctly? anderson_darling_test of the normal cdf and the
logistic cdf gives the same result (0.01). The graphs show that they are different.

Henry

octave:21> x=linspace (-4,4,100);
octave:22> mu=0.0; sigma = 1.0;
octave:23> cnormalx=0.5+0.5.*erf((x-mu)./sigma./sqrt(2));
octave:24> plot (x,cnormalx,"x")
octave:25> anderson_darling_test(cnormalx,'normal')
ans = 0.010000
octave:26> a=0; lambda=2;
octave:27> clogisticx=1.0./(1.0+exp(-lambda.*(x-a)));
octave:28> hold on
octave:30> plot (x,clogisticx,"@33")
octave:31> anderson_darling_test(clogisticx,'normal')
ans = 0.010000

on 9/26/05 12:55 AM, Paul Kienzle at address@hidden wrote:

The Anderson-Darling test is claimed to be pretty good:

http://www.itl.nist.gov/div898/handbook/prc/section2/prc213.htm
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm

Here's the relevant section from the R manual:

http://www.maths.lth.se/help/R/.R/library/nortest/html/ad.test.html

The Anderson-Darling test is an EDF omnibus test for the composite
hypothesis of normality. The test statistic is

A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} [2i-1] [\ln(p_{(i)}) + \ln(1 - p_{(n-i+1)})],

where p_{(i)} = \Phi([x_{(i)} - \overline{x}]/s). Here, \Phi is the
cumulative distribution function of the standard normal distribution,
and \overline{x} and s are the mean and standard deviation of the data
values. The p-value is computed from the modified statistic

Z=A^2 (1.0 + 0.75/n +2.25/n^{2})

according to Table 4.9 in Stephens (1986).


Here are the critical values I found elsewhere on the net.

90% 0.631
95% 0.752
97.5% 0.873
99% 1.035

For example, if A^2 > 0.752 you can say that your data set is not
normally distributed with 95% confidence.

This is implemented in octave-forge as 1-p =
anderson_darling_test(x,'normal').  That is, if anderson_darling_test
returns a value of 0.05 then you can say your data set is not normally
distributed with 95% confidence.
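For anyone who wants to see the pieces together, here is a minimal sketch that follows the A^2 and Z formulas and the critical values quoted above. ad_normal is a hypothetical name, not the octave-forge function, and two things are assumptions on my part: that the critical values are compared against the modified statistic Z, and that the returned level is 1 when none of them is exceeded.

```octave
1;  % mark this file as a script so the helper can be defined inline

% Hypothetical helper, written from the formulas quoted above.
function [q, Z, A2] = ad_normal (x)
  x  = sort (x(:));                     % order statistics x_(1) <= ... <= x_(n)
  n  = numel (x);
  p  = 0.5 + 0.5 * erf ((x - mean (x)) / std (x) / sqrt (2));  % Phi((x_(i)-xbar)/s)
  i  = (1:n)';
  A2 = -n - sum ((2*i - 1) .* (log (p) + log (1 - flipud (p)))) / n;
  Z  = A2 * (1.0 + 0.75/n + 2.25/n^2);  % modified statistic
  crit  = [0.631 0.752 0.873 1.035];    % 90%, 95%, 97.5%, 99% values quoted above
  level = [0.10  0.05  0.025 0.01];
  k = sum (Z > crit);                   % number of critical values exceeded
  if (k == 0)
    q = 1;                              % assumption: 1 means "not rejected at 0.1"
  else
    q = level(k);
  end
end

u = ((1:100)' - 0.5) / 100;             % evenly spaced points in (0,1)
[q_norm, Z_norm] = ad_normal (sqrt (2) * erfinv (2*u - 1))   % exact normal quantiles
[q_unif, Z_unif] = ad_normal (u)                             % evenly spaced uniform values
```

The two calls use deterministic inputs so the result does not depend on the random number generator: the normal quantiles should give a Z well below 0.631, and the evenly spaced uniform values should give a Z above it.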

- Paul

On Sep 25, 2005, at 1:59 PM, Søren Hauberg wrote:

Hi,
Does anybody know how I can test whether or not some samples are
normally distributed? I tried graphical methods, such as looking at
histograms and qqplots, but I don't trust my own judgement enough to
use graphical methods.

/Søren





-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------





