Re: [Bug-gnubg] Confidence intervals from rollouts


From: Douglas Zare
Subject: Re: [Bug-gnubg] Confidence intervals from rollouts
Date: Tue, 10 Sep 2002 03:10:53 -0400
User-agent: Internet Messaging Program (IMP) 3.1

Quoting David Montgomery <address@hidden>:

> Douglas Zare wrote
> > There was an interesting question on the Gammonline bulletin board about
> > the standard deviation in cubeless rollouts and the standard deviations
> > of live cube results. I've included an excerpt below, in which I attempt
> > to estimate a confidence interval for the difference between doubling and
> > not doubling. I'm not sure what the best way to do this is, but I suggest
> > that an attempt would be worth implementing in gnu.
> 
> Isn't the right way to do this a paired t-test?
> This is what I thought, anyway, after talking with
> Jeremy Bagai about it.  Each game forms a pair with
> double and no-double result.  Since the two are so
> highly correlated, you should get a much tighter
> confidence interval than the joint standard deviation.

True (unless you use a nonlinear function to convert cubeless to cubeful 
numbers). One problem is that the differences are not very close to a normal 
distribution, so you either have to wait for the Central Limit Theorem to kick 
in, or just hope for small tails, or use another test. I would be inclined to 
set a minimum number of trials and then take the middle choice (hope for small 
tails), but that's because I don't know the alternative tests.
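
A minimal sketch of the paired comparison discussed above, assuming each 
trial yields one double result and one no-double result from the same dice. 
The function names, the 1.96 multiplier, and the minimum of 30 trials are 
illustrative assumptions, not anything gnubg provides, and the interval leans 
on the normal approximation rather than an exact t quantile.

```python
import math
import statistics


def paired_diff_ci(double_results, no_double_results, z=1.96, min_trials=30):
    """Return (mean difference, half-width) for double minus no-double.

    Hypothetical helper: z=1.96 gives a roughly 95% interval under the
    normal approximation; min_trials is an arbitrary floor, not a
    recommendation.
    """
    if len(double_results) != len(no_double_results):
        raise ValueError("results must be paired trial by trial")
    n = len(double_results)
    if n < min_trials:
        raise ValueError("too few trials for the normal approximation")
    # Work on the per-trial differences, so shared luck cancels out.
    diffs = [d - nd for d, nd in zip(double_results, no_double_results)]
    mean = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean, z * std_err


def unpaired_half_width(a, b, z=1.96):
    """Half-width if the two samples were (wrongly) treated as independent."""
    n = len(a)
    return z * math.sqrt((statistics.variance(a) + statistics.variance(b)) / n)
```

When the two results in each pair are highly correlated, the half-width from 
paired_diff_ci should come out much smaller than the one from 
unpaired_half_width on the same data.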

> The same thing could be applied to checker plays, but
> it's not as clean, especially for 3+ plays.  Then you
> are probably supposed to do some multi-way analysis of
> variance, of which I am unfortunately ignorant.

Yeah, the Central Limit Theorem still works, but some coincidences that make it 
easy to do things in one dimension fail. I think most of the complexity goes 
away if you just allow yourself to use perhaps twice as much data as a truly 
efficient test would take. Since we are spending future processor cycles rather 
than analyzing last year's medical trials, I'm not so concerned about it.
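
One simple way to trade extra data for less machinery, sketched under my own 
assumptions rather than as the efficient multi-way test: compare every 
candidate play against the apparent best with paired differences, and widen 
each interval with a Bonferroni correction. The name compare_to_best and the 
dict layout are hypothetical, every play is assumed to be rolled out with the 
same dice per trial, and picking the best play from the same data adds a 
selection effect that this sketch ignores.

```python
import math
import statistics


def compare_to_best(results, alpha=0.05):
    """results maps play name -> list of per-trial equities (same dice per index).

    Returns the play with the highest mean and, for every other play, an
    approximate interval for (best - other). Bonferroni is a deliberately
    conservative assumption here, so it costs extra trials.
    """
    names = list(results)
    if len(names) < 2:
        raise ValueError("need at least two plays to compare")
    if len({len(results[name]) for name in names}) != 1:
        raise ValueError("every play needs the same number of trials")
    best = max(names, key=lambda name: statistics.fmean(results[name]))
    others = [name for name in names if name != best]
    # Split the two-sided error budget alpha across all comparisons.
    z = statistics.NormalDist().inv_cdf(1 - alpha / (2 * len(others)))
    intervals = {}
    for name in others:
        diffs = [b - o for b, o in zip(results[best], results[name])]
        mean = statistics.fmean(diffs)
        half = z * statistics.stdev(diffs) / math.sqrt(len(diffs))
        intervals[name] = (mean - half, mean + half)
    return best, intervals
```

A play whose interval lies entirely above zero can be dropped with some 
confidence; the others need more trials.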

Douglas Zare





