Re: statistical function example
From: Matti Picus
Subject: Re: statistical function example
Date: Thu, 25 Aug 2005 21:12:59 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)
Dean Allen Provins <provinsd <at> telusplanet.net> writes:
> > On Tue, 23 Aug 2005, Dean Allen Provins wrote:
> >
> > >I have been trying to make some sense out of the "kolmogorov_smirnov_test"
> > >function result. Given a sample of 8 data points, for which Swan and
> > >Sandilands, "Introduction to Geological Data Analysis", give a clear
> > >answer, I cannot get an answer from the KS test that has any meaning
> > >for me.
> > >
> > >S&S obtain the maximum deviation (about 0.22) and compare that value to
> > >that which would be exceeded with probability 0.05 (their table indicates
> > >about 0.46). The second return value from the Octave KS test is much
> > >larger:
> > >
> > > p = 0.053223
> > > k = 1.3466
> > >
> > >I presume the "p" value is the probability of rejecting H0, but what is
> > >"k"? No such value appears in the one-sided test tables that I located
> > >on the 'net.
> > >
> > >The input data X and the cumulative frequency used (i/(n+1)) is:
> > > X CF
> > > 0.07000 0.11111
> > > 0.12000 0.22222
> > > -0.06000 0.33333
> > > -0.04000 0.44444
> > > -0.05000 0.55556
> > > 0.08000 0.66667
> > > 0.04000 0.77778
> > > 0.00000 0.88889
> > >
> > >Would any readers with some insight care to enlighten me?
> > >
> > >
> > >Thanks,
> > >
> > >Dean
Background, just so we are talking about the same thing...
The test works like this: given two sampled "cumulative frequencies" F1 and F2
(btw, they are more commonly referred to as "cumulative distribution functions"),
calculate a value k based on the number of samples in each of F1 and F2 and the
maximum distance between them. Maximum distance is defined as follows: plot the
two distributions using the sampled values on the x axis and their associated
probabilities on the y axis. The maximum distance is the length of the longest
vertical line joining the two plots. Then use the value k to look up a
probability p for H0 (the hypothesis that both come from the same distribution).
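As a rough sketch of the steps above, the maximum distance between two sampled
CDFs can be computed with core Octave functions only. The two samples are
made-up values for illustration, and the scaling used for k is the usual
two-sample convention, which is an assumption here and not taken from any
particular implementation.

```octave
% Two small made-up samples (illustrative values only).
x1 = [0.1, 0.4, 0.7, 0.9];
x2 = [0.2, 0.3, 0.5, 0.6, 0.8];

n1 = numel (x1);
n2 = numel (x2);

% Evaluate both empirical CDFs at every observed value.
pts = sort ([x1, x2]);
F1 = arrayfun (@(t) sum (x1 <= t) / n1, pts);
F2 = arrayfun (@(t) sum (x2 <= t) / n2, pts);

% Maximum vertical distance between the two step functions.
D = max (abs (F1 - F2));

% ASSUMPTION: the usual two-sample scaling by the sample sizes.
k = sqrt (n1 * n2 / (n1 + n2)) * D;

printf ("D = %.4f, k = %.4f\n", D, k);
```

For these two samples the largest gap is 0.3, at the value 0.6.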
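A small p is evidence against H0: p is the probability of seeing a maximum
distance at least this large if H0 were true, so H0 is usually rejected when p
falls below the chosen significance level (0.05 is the common choice). A value
of about 0.05, as in your result, sits right at that conventional threshold and
so suggests that the two distributions are different. There are different
methods for calculating p from k, and some authors are a little careless about
k values that result in a clear rejection of the null hypothesis, since those
cases are not interesting to most of us.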
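One common way to get p from k is the limiting Kolmogorov distribution,
p = 2 * sum over j >= 1 of (-1)^(j-1) * exp(-2 j^2 k^2). The sketch below
assumes this asymptotic series is the method in use; it is only one of the
possible methods and may differ from what a given implementation does for
small samples.

```octave
k = 1.3466;    % value quoted in the original message
j = 1:100;     % the series converges very quickly; 100 terms is generous

% Asymptotic (large-sample) tail probability of the Kolmogorov distribution.
p = 2 * sum ((-1) .^ (j - 1) .* exp (-2 * j .^ 2 * k ^ 2));

printf ("p = %.4f\n", p);
```

With k = 1.3466 this gives a p of roughly 0.0532, which is consistent with the
p = 0.053223 reported in the original message (k there is rounded to five
digits).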
The call to the Octave implementation of the test assumes that you have
x - a set of raw observations,
e.g. [0, 0.4, -0.1, 0.7, 0.3, 0.4, -0.9]
dist - a text string naming the distribution; the function <dist>_cdf (e.g.
uniform_cdf), called through feval at the value y, will yield the CDF of the
chosen distribution at y
So a call to the function like
[p,k]=kolmogorov_smirnov_test(x, "uniform", 0, 1)
would give the p-value for the hypothesis that the sample x is drawn from a
uniform distribution over 0 to 1.
The value k would be an intermediate value calculated from the length of x and
the maximum difference between the sampled CDF of x and the uniform CDF,
used to look up p.
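To make the intermediate value concrete, here is a minimal sketch that computes
the maximum difference by hand for the sample above, using only core functions.
The scaling k = sqrt(n) * D is my assumption about how the length of x enters;
please check it against the source of kolmogorov_smirnov_test before relying on
it.

```octave
x = [0, 0.4, -0.1, 0.7, 0.3, 0.4, -0.9];   % sample from the message above
n = numel (x);
xs = sort (x);

% CDF of the uniform distribution on [0, 1], written out by hand
% (clamping to [0, 1]) so that no extra package is needed.
F = min (max (xs, 0), 1);

% The empirical CDF jumps at each sorted point: compare F with the
% step height just before ((idx-1)/n) and just after (idx/n) the jump.
idx = 1:n;
D = max ([abs(F - (idx - 1) / n), abs(F - idx / n)]);

% ASSUMPTION: k is the maximum difference scaled by sqrt(n).
k = sqrt (n) * D;

printf ("D = %.4f, k = %.4f\n", D, k);
```

For this sample the maximum difference is about 0.457. Under the same
assumption, the k = 1.3466 reported for 8 points would correspond to a maximum
difference of about 1.3466 / sqrt(8), i.e. roughly 0.476, which is the quantity
to compare with a tabulated critical value.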
The strength of the test is that the value of k directly determines the
probability, with no assumptions about the shape of either distribution.
Did this help?
Matti
-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.
Octave's home on the web: http://www.octave.org
How to fund new projects: http://www.octave.org/funding.html
Subscription information: http://www.octave.org/archive.html
-------------------------------------------------------------