help-octave

Re: statistical function example

From: Matti Picus
Subject: Re: statistical function example
Date: Thu, 25 Aug 2005 21:12:59 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Dean Allen Provins <provinsd <at> telusplanet.net> writes:

> > On Tue, 23 Aug 2005, Dean Allen Provins wrote:
> >
> > >I have been trying to make some sense out of the "kolmogorov_smirnov_test"
> > >function result.  Given a sample of 8 data points, for which Swan and
> > >Sandilands, "Introduction to Geological Data Analysis", give a clear
> > >answer, I cannot get an answer from the KS test that has any meaning
> > >for me.
> > >
> > >S&S obtain the maximum deviation  (about 0.22) and compare that value to
> > >that which would be exceeded with probability 0.05 (their table indicates
> > >about 0.46).  The second return value from the Octave KS test is much
> > >larger:
> > >
> > >         p = 0.053223
> > >         k = 1.3466
> > >
> > >I presume the "p" value is the probability of rejecting H0, but what is
> > >"k"?  No such value appears in the one-sided test tables that I located
> > >on the 'net.
> > >
> > >The input data X and the cumulative frequency used (i/(n+1)) are:
> > >     X             CF
> > >  0.07000   0.11111
> > >  0.12000   0.22222
> > > -0.06000   0.33333
> > > -0.04000   0.44444
> > > -0.05000   0.55556
> > >  0.08000   0.66667
> > >  0.04000   0.77778
> > >  0.00000   0.88889
> > >
> > >Would any readers with some insight care to enlighten me?
> > >
> > >
> > >Thanks,
> > >
> > >Dean

Background: just so we are talking about the same thing...
The test works like this: given two "cumulative frequencies" F1 and F2
(btw they are more commonly referred to as "cumulative distribution functions",
or CDFs; in the one-sample case one of them is the theoretical CDF you are
testing against), calculate a value k based on the number of samples and the
maximum distance between F1 and F2. Maximum distance is defined as follows: plot
the two distributions using the sampled values on the x axis and their
associated probabilities on the y axis. The maximum distance is the length of
the longest vertical line joining the two plots. Then use the value k to look up
a probability p under H0.

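p is the probability, assuming H0 is true, of seeing a maximum distance at least
as large as the one observed. A small p is therefore evidence against H0: you
reject H0 at significance level alpha when p < alpha, and otherwise you fail to
reject it. A value of about 0.05 sits right at the conventional 5% threshold, so
it is a borderline result rather than a clear one. There are different methods
for calculating p from k, and some authors are a little careless with k values
that result in a clear rejection of the null hypothesis, since those cases are
not interesting to most of us.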
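
One common way to get p from k is the large-sample Kolmogorov series. The sketch
below assumes that is the method in use here (I have not checked it against the
function's source) and plugs in the k reported in the question. The number of
series terms (100) is an arbitrary choice of mine; the series converges after a
handful of terms.

```octave
% Sketch: asymptotic two-sided p-value from the scaled statistic k, using
%   p = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * k^2)
% Assumption: this large-sample formula is what turns k into p.
k = 1.3466;                 % value reported in the question
j = 1:100;                  % number of terms is arbitrary; later terms underflow to 0
p = 2 * sum ((-1) .^ (j - 1) .* exp (-2 * j .^ 2 * k ^ 2));
printf ("p = %g\n", p);
```

This comes out at about 0.053, which agrees with the p reported in the question
to within the rounding of k. For a sample as small as 8 points the large-sample
formula is only an approximation, so small-sample tables can disagree slightly.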

The call to the Octave implementation of the test assumes that you have
x - a set of raw observations,
e.g. [0, 0.4, -0.1, 0.7, 0.3, 0.4, -0.9]
dist - a text string naming the distribution, such that evaluating
feval([dist "_cdf"], y) will yield the CDF of the chosen distribution at the
value y

so a call to the function like
[p,k]=kolmogorov_smirnov_test(x, "uniform", 0, 1)
would give the p-value p for the hypothesis that the sample x is drawn from a
uniform distribution over 0 to 1.
The value k is an intermediate value calculated from the length of x and
the maximum difference between the sampled CDF of x and the uniform CDF;
it is used to look up p.
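
To make the intermediate value concrete, here is a minimal hand-rolled sketch of
the maximum difference D for the example x above against a uniform distribution
on 0 to 1. The scaling k = sqrt(n) * D is my assumption about how the function
forms k, not something confirmed from its source, and the uniform CDF is written
out by hand so the sketch does not depend on any distribution function being
installed.

```octave
% Sketch: maximum vertical distance between the empirical CDF of a sample
% and a reference CDF (uniform on [0, 1], written out by hand).
x = [0, 0.4, -0.1, 0.7, 0.3, 0.4, -0.9];   % example sample from the text
n = length (x);
s = sort (x);

% Uniform(0, 1) CDF at the sorted points: 0 below 0, 1 above 1, identity between.
F = min (max (s, 0), 1);

% The empirical CDF jumps at each sorted point, so compare the reference CDF
% with the step height just before the jump, (i-1)/n, and just after it, i/n.
d_before = abs (F - (0:n-1) / n);
d_after  = abs (F - (1:n) / n);
D = max ([d_before, d_after]);

% Assumed scaling of the statistic returned as k.
k = sqrt (n) * D;
printf ("D = %g, k = %g\n", D, k);
```

Under the same assumption, dividing the k from the question by sqrt(8) gives a
maximum deviation of about 0.476, which is on the same scale as the values
tabulated in textbooks such as S&S.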

The strength of the test is that the value of k determines the probability
directly, with no assumptions about the shape of either distribution.

Did this help?
Matti

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------
