## Re: [Bug-gnubg] Confidence intervals from rollouts

From: Douglas Zare
Subject: Re: [Bug-gnubg] Confidence intervals from rollouts
Date: Tue, 10 Sep 2002 03:10:53 -0400
User-agent: Internet Messaging Program (IMP) 3.1

```Quoting David Montgomery <address@hidden>:

> Douglas Zare wrote
> > There was an interesting question on the Gammonline bulletin board about
> > the standard deviation in cubeless rollouts and the standard deviations of
> > live cube results. I've included an excerpt below, in which I attempt to
> > estimate a confidence interval for the difference between doubling and not
> > doubling. I'm not sure what the best way to do this is, but I suggest that
> > an attempt would be worth implementing in gnu.
>
> Isn't the right way to do this a paired t-test?
> This is what I thought, anyway, after talking with
> Jeremy Bagai about it.  Each game forms a pair with
> double and no-double result.  Since the two are so
> highly correlated, you should get a much tighter
> confidence interval than the joint standard deviation.

True (unless you use a nonlinear function to convert cubeless to cubeful
numbers), although one problem is that the differences are not very close to a
normal distribution, so you either have to wait for the Central Limit Theorem
to kick in, or just hope for small tails, or use another test. I would be
inclined to set a minimum number of trials and then take the middle choice, but
that's because I don't know the alternative tests.

> The same thing could be applied to checker plays, but
> it's not as clean, especially for 3+ plays.  Then you
> are probably supposed to do some multi-way analysis of
> variance, of which I am unfortunately ignorant.

Yeah, the Central Limit Theorem still works, but some coincidences that make it
easy to do things in one dimension fail. I think most of the complexity goes
away if you just allow yourself to use perhaps twice as much data as a truly
efficient test would take. Since we are spending future processor cycles rather
than analyzing last year's medical trials, I'm not so concerned about it.

Douglas Zare

```
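
As a rough illustration of the paired approach discussed in the message above, here is a minimal Python sketch. It is not gnubg code: the function names, the `min_trials` threshold, the `z=1.96` normal approximation, and the synthetic data are all assumptions made for the example. It treats each trial as a (double, no-double) pair, builds an interval from the per-trial differences, and compares it with the interval obtained by treating the two samples as independent.

```python
import math
import random
import statistics


def paired_interval(double_results, no_double_results, z=1.96, min_trials=30):
    """Approximate interval for mean(double - no_double) from paired trials.

    Relies on the Central Limit Theorem, so it insists on a minimum number
    of trials (min_trials is an arbitrary, hypothetical threshold).
    """
    if len(double_results) != len(no_double_results):
        raise ValueError("each trial needs both a double and a no-double result")
    n = len(double_results)
    if n < min_trials:
        raise ValueError("too few trials for the normal approximation")
    # Work with per-trial differences so the shared luck cancels out.
    diffs = [d - nd for d, nd in zip(double_results, no_double_results)]
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff - z * std_err, mean_diff + z * std_err


def unpaired_interval(double_results, no_double_results, z=1.96):
    """Same estimate, but ignoring the pairing (variances simply add)."""
    mean_diff = statistics.fmean(double_results) - statistics.fmean(no_double_results)
    std_err = math.sqrt(
        statistics.variance(double_results) / len(double_results)
        + statistics.variance(no_double_results) / len(no_double_results)
    )
    return mean_diff - z * std_err, mean_diff + z * std_err


# Synthetic, made-up data (not rollout output): both results in a trial share
# most of their noise, which mimics the high correlation described above.
rng = random.Random(0)
no_double_results = [rng.gauss(0.30, 1.0) for _ in range(1000)]
double_results = [x + 0.05 + rng.gauss(0.0, 0.1) for x in no_double_results]

print("paired:  ", paired_interval(double_results, no_double_results))
print("unpaired:", unpaired_interval(double_results, no_double_results))
```

Both intervals are centred on the same estimated difference. Because the two results in each pair are highly correlated, the paired interval comes out much narrower. The sketch does nothing about heavy tails in the differences beyond requiring a minimum number of trials.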