[Bug-gnubg] Re: Rollout jsd, statsig etc. [LONG]

From: Timothy Y. Chow
Subject: [Bug-gnubg] Re: Rollout jsd, statsig etc. [LONG]
Date: Mon, 16 Nov 2009 13:48:45 -0500 (EST)

Massimiliano Maini <address@hidden> wrote:
> Why don't we show the % instead of the JSD ? It's much more reasonable.

The trouble with this is that the percentages don't mean what you think 
they mean.

In the bgonline thread, some people got the misimpression that the
points I was making were philosophical ones, and that I was arguing
as a Bayesian.  Before I go any further, let me state clearly at the
outset that the points I am about to make are *strictly from the point
of view of classical hypothesis testing*.  I am *not* going to argue
here that a Bayesian approach is better.  Instead, I am just going to
clear up some common misconceptions about what confidence intervals

> Notice that the percentage shown aside the top play is the 
> "confidence" we have in it being better than the 2nd best play.

This is not correct.  It is an extremely common misconception.  The 
percentage is the probability that we would see the results that we in 
fact see (or even more skewed results), *under the assumption that the 
plays are equal*.  This is *not* the same as the *confidence we have 
that the first play is better than the second play*.

I will state this again because it is so counterintuitive.  We would like 
to think that "5%" is the probability of some event occurring in the real 
world.  But *it's not*.  5% is the probability that, in the strange and 
implausible world where *the two plays are equal*, something as skewed as 
what we see (or something even more skewed) would occur.  It is tempting, 
*but wrong*, to twist this statement around into something like, "There is 
a 5% probability that the lower-ranked play is better."  THIS IS WRONG.
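To make the definition concrete, here is a minimal sketch of how a j.s.d. 
figure maps to such a percentage.  It assumes that the difference between 
the two rollout results is approximately normal and that a one-sided tail 
is wanted; the function name is hypothetical and is not taken from the GNU 
Backgammon source.

```python
import math


def one_sided_tail_probability(jsd):
    """Probability of a difference at least `jsd` joint standard
    deviations in favour of the top play, computed *under the
    assumption that the two plays are equal* (normal approximation).
    """
    # Upper tail of the standard normal distribution: 1 - Phi(jsd).
    return 0.5 * math.erfc(jsd / math.sqrt(2.0))


if __name__ == "__main__":
    for jsd in (1.0, 1.645, 2.0, 3.0):
        print(f"{jsd:5.3f} j.s.d. -> {100 * one_sided_tail_probability(jsd):.2f}%")
```

Nothing in this calculation refers to how likely it is that the plays 
really are equal, or that the lower-ranked play is better.  The only input 
is the j.s.d., and the equality of the plays is assumed throughout.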

Given that it's wrong to say this in the case of just two plays, it 
follows that describing the multivariate tail probability as "the 
probability that the third-ranked play is the best" (in the case of more 
than two plays) *is also wrong*, for the same reason.

I strongly believe that GNU Backgammon should not say things that are just 
plain wrong, and should not perpetuate common statistical misconceptions.

Now, I happen to believe that percentages are more intuitive than j.s.d. 
numbers, and I am in favor of reporting things as percentages rather than 
as j.s.d. numbers.  However, the percentages should *not* be incorrectly 
described as "probabilities that this play is the best."

*If* one insists on having GNU Backgammon issue claims of the form, "the 
probability that this play is the best is X%," *then* one should adopt a 
Bayesian standpoint.  But I promised to speak strictly from the point of 
view of classical hypothesis testing, so I will just say that statements 
of the form "the probability that this play is the best is X%" are simply 
*impossible* from this viewpoint.  The multivariate tail probability, for 
example, tells you only the probability that some strange event will occur 
*under the assumption that the equities are equal to the estimated 
equities*.  This is *not* the same as *the probability that the true 
equities are different from their estimated values*.
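One way to see that the tail probability alone cannot supply such a 
statement is a small simulation.  This is only a sketch under assumptions 
that are not part of the argument above: true equity differences are drawn 
from a normal distribution with a chosen spread, rollout noise has a 
standard error of 1 (so observed differences are in j.s.d. units), and we 
look only at positions whose observed difference is about 1.65 j.s.d., 
that is, a one-sided tail probability of about 5%.  All names are 
hypothetical.

```python
import random


def fraction_lower_ranked_is_better(prior_spread, n_positions=200_000, seed=1):
    """Among simulated positions whose observed difference is about
    1.65 j.s.d., return the fraction in which the lower-ranked play
    is in truth the better one.

    ASSUMPTIONS (illustrative only): true differences ~ N(0, prior_spread),
    rollout noise ~ N(0, 1).
    """
    rng = random.Random(seed)
    selected = 0  # positions whose observed difference is near 1.65 j.s.d.
    reversed_rank = 0  # of those, positions where the true difference is negative
    for _ in range(n_positions):
        true_diff = rng.gauss(0.0, prior_spread)
        observed = true_diff + rng.gauss(0.0, 1.0)
        if 1.6 <= observed <= 1.7:
            selected += 1
            if true_diff < 0.0:
                reversed_rank += 1
    return reversed_rank / selected


if __name__ == "__main__":
    for spread in (0.2, 1.0, 5.0):
        frac = fraction_lower_ranked_is_better(spread)
        print(f"spread of true differences {spread}: {100 * frac:.1f}%")
```

Every selected position has the same tail probability of about 5%, yet the 
fraction in which the lower-ranked play is really better changes with the 
assumed spread of true differences: it is well above 5% when the plays are 
typically close, and approaches 5% only when the spread is large.  The 
tail probability by itself does not determine that fraction.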

If you don't believe that what I am saying here is as clear-cut as I am 
claiming it is, then check with a statistician.  And when I say 
"statistician," I don't just mean a scientist who uses statistics on a 
regular basis.  I recently learned of a study where 70 academic 
psychologists were quizzed on what confidence intervals meant, and only
3 out of the 70 got it right.  (Oakes, Statistical inference: A 
commentary for the social and behavioral sciences, Wiley, 1986.)

