pspp-dev

## Re: [patch #5583] NPAR TESTS

From: Jason Stover

Subject: Re: [patch #5583] NPAR TESTS

Date: Sat, 9 Dec 2006 15:51:17 -0500

```
On Sat, Dec 09, 2006 at 11:14:40AM +0900, John Darrington wrote:
> 5.1 NPAR TESTS.  Binomial Test
> +-+------#--------+--+--------------+----------+---------------------+
> | |      #Category| N|Observed Prop.|Test Prop.|Exact Sig. (1-tailed)|
> +-+------#--------+--+--------------+----------+---------------------+
> |x|Group1#    1.00|11|          .550|      .600|                 .404|
> | |Group2#    2.00| 9|          .450|          |                     |
> | |Total #        |20|          1.00|          |                     |
> +-+------#--------+--+--------------+----------+---------------------+
>
>
> And the cumulative binomial distribution for p = 0.6, n = 20, x = 11
> is indeed 0.404.
>
> But the formula given in Algorithms says (as I understand it) to use
> the binomial cdf for p = 0.4, n = 20, x = 9. That answer is 0.755.

Sorry for not addressing this one sooner.

First let me define a couple of terms.

Suppose X is binomial with 20 trials and success probability
0.6. Then Y = 20 - X is also binomial, but with success
probability 1 - 0.6 = 0.4.

I don't have the algorithm document in front of me, but I can almost
guarantee that, whatever it actually says, the author meant that in
this case, rather than computing Pr (X =< 11), we should compute
Pr (Y >= 9). The two probabilities are equal (about 0.404). The value
0.755 above is Pr (Y =< 9) = Pr (X >= 11), which is
certainly not the p-value if we are testing

Ho: p >= 0.6
H1: p < 0.6

which is equivalent to testing

Ho: 1-p =< 0.4
H1: 1-p > 0.4

(the p-value for this test is Pr (X =< 11) = 0.404). I haven't looked
at the algorithm document for a week, but I remember finding it
unclear, so the author may not have described exactly what the
software does. In this case I would ignore the algorithm document
wherever its instructions contradict the behavior of the software.

> Also, there is what I think is a separate issue: The book says the
> answer is in fact not the binomial cdf, but (2 * cdf - B(k;n,p))/2.
> This seems to be a "correction for continuity", which Siegel &
> Castellan (Chapter 4) say is necessary for the asymptotic
> approximation, but they don't mention it for the exact case.

I wouldn't worry about an asymptotic approximation until computing
the exact p-value starts to cause overflows, accumulates too much
roundoff error, or takes too long. That won't happen until the
number of trials becomes very large.

-Jason

```
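For reference, the symmetry used in the reply follows from a change of index. With X ~ Binomial(n, p) and Y = n - X ~ Binomial(n, 1 - p), substitute j = n - i in the lower-tail sum and use C(n, i) = C(n, n - i):

$$
\Pr(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^{i} (1-p)^{n-i}
= \sum_{j=n-k}^{n} \binom{n}{j} (1-p)^{j} p^{\,n-j}
= \Pr(Y \ge n-k).
$$

With n = 20, k = 11, p = 0.6 this gives Pr(X ≤ 11) = Pr(Y ≥ 9) ≈ 0.404, while the complementary event Y ≤ 9 is the same as X ≥ 11, so Pr(Y ≤ 9) = Pr(X ≥ 11) ≈ 0.755.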
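The numbers discussed in the thread can be checked by summing binomial probabilities directly. The following is a minimal Python sketch (Python 3.8+ for `math.comb`), not PSPP's implementation. The helper names `binom_pmf`, `binom_cdf` and `binom_sf` are hypothetical, and `p_quoted` simply evaluates the expression John quotes from the book, (2 * cdf - B(k;n,p))/2, which equals cdf - B(k;n,p)/2.

```python
from math import comb

def binom_pmf(k, n, p):
    """B(k; n, p): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Pr(X =< k) for X ~ Binomial(n, p), by direct summation."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def binom_sf(k, n, p):
    """Pr(X >= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# Values from the Binomial Test table quoted in the message.
n, x, p = 20, 11, 0.6
y = n - x  # Y = n - X is Binomial(n, 1 - p)

p_x_le = binom_cdf(x, n, p)      # Pr(X =< 11): one-tailed p-value for H1: p < 0.6
p_y_ge = binom_sf(y, n, 1 - p)   # Pr(Y >= 9): the same probability expressed via Y
p_y_le = binom_cdf(y, n, 1 - p)  # Pr(Y =< 9) = Pr(X >= 11): the 0.755 figure
# Expression quoted from the book: (2 * cdf - B(k;n,p)) / 2
p_quoted = (2 * p_x_le - binom_pmf(x, n, p)) / 2

print(f"Pr(X =< {x}) = {p_x_le:.3f}")
print(f"Pr(Y >= {y}) = {p_y_ge:.3f}")
print(f"Pr(Y =< {y}) = {p_y_le:.3f}")
print(f"(2*cdf - B)/2 = {p_quoted:.3f}")
```

Direct summation is fine at n = 20. With this naive version, `comb(n, k) * p**k` raises `OverflowError` once the integer coefficient exceeds the double-precision range (roughly a thousand trials, for terms near the middle of the distribution), which is the kind of overflow the reply has in mind for very large samples.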