
## Re: [Bug-gnubg] Confidence intervals from rollouts

From: Douglas Zare
Subject: Re: [Bug-gnubg] Confidence intervals from rollouts
Date: Thu, 5 Sep 2002 13:36:57 -0400
User-agent: Internet Messaging Program (IMP) 3.1

Quoting David Montgomery <address@hidden>:

> Nis Jorgensen wrote:
> > David Montgomery wrote:
> > > Consider a single position, for which we have 3 rollout
> > > samples A, B, and C.  The idea of rotating the first ply
> > > or two is that the variance of the *difference* between
> > > two plays should be reduced, since one aspect of the
> > > luck has been eliminated.
> >
> > The idea of rotating the first ply or two should be to reduce the
> > difference between the "true" value of a position and the results of
> > rollouts.
>
> I agree with you.  I misspoke.

It would be good to have a rollout whose point is to determine how much better
doubling is than redoubling, rather than to determine the absolute equities of
each. That's what one does by hand if one plays the position as a prop with two
cubes, though of course the checker play can depend on the cube position. One
doesn't necessarily get an estimate of the absolute equity. For example, one
can call it a wash when the smaller cube is redoubled and accepted. It also
makes sense to have rollouts whose purpose is to determine the difference
between plays, without determining the equity of either one.

> > The variance (or rather standard error) is just a _measure_ of how much we
> > trust the result, and reducing the value is not a goal in itself.

Ok, I think it would be useful to clarify the definitions we are encountering
here. I see at least three variances. First, there is the actual variance of
the estimated equity produced by a rollout scheme. Second, we can estimate this
by the sample variance of the trial results within a single rollout, with or
without the n/(n-1) adjustment. Third, we can talk about the average value of
that second quantity over the whole rollout scheme, as opposed to its actual
output in one particular rollout. (It's possible that another measure of the
second quantity makes more sense, such as the average of the square roots.)

I'm going to call these the real variance, the observed variance, and the ideal
sample variance. Please feel free to replace these with better names; I
research probability (among other things) but I'm not very familiar with
statistics, and certainly not with statistical conventions.
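The distinction can be sketched in a toy simulation. This is a minimal stand-in for a rollout (one die roll per "game"), not gnubg's rollout code; all names here are illustrative:

```python
import random
import statistics

# Toy model: each "game" result is one die roll, and a rollout of n games
# estimates the mean result.

def rollout(n, rng):
    trials = [rng.randint(1, 6) for _ in range(n)]
    estimate = sum(trials) / n
    # observed variance of the estimate, with the n/(n-1) adjustment
    observed_var = statistics.variance(trials) / n
    return estimate, observed_var

rng = random.Random(0)
results = [rollout(100, rng) for _ in range(2000)]

# real variance: the actual spread of the estimator over many rollouts
real_var = statistics.pvariance([est for est, _ in results])

# ideal sample variance: the observed variance averaged over the scheme
ideal_sample_var = statistics.mean(ov for _, ov in results)
```

For plain independent sampling, as here, the real variance and the ideal sample variance agree (both near Var(die)/n); variance-reduction schemes are exactly the schemes that can drive them apart.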

> > This is very important to stress, especially in cases like this, where we
> > should expect the standard error to go _up_ even though the actual
> > trustworthiness of the rollout should improve.
>
> I believe I completely missed this point until now.  Thanks.
>
> Hmmm.... but perhaps I am still missing it.  Because now I
> can't see how the standard error will go up; at least not
> the "standard error" that I am thinking about.

By this change of the rollout scheme, the real variance should go down, but the
ideal sample variance will increase. I think one can view those as separate
effects, which is why there might be disagreement about whether the variance
increases or decreases. This restates, in a different way, much of the material
I snipped.

> But I would say that the true standard error is
> actually lower.  Maybe there is something else I
> should call this.  What I mean is that, if we do our
> 1296 game rollout many times, we can actually gather
> statistics directly on the spread of the rollouts.
> And if our technique is variance reducing, the spread
> will be less.  A quick thought experiment to demonstrate
> this is consider 1296 game rollouts with and without
> rotating the first two ply, truncated after two rolls.
> The true standard error of this rollout is 0 when you
> rotate the rolls.  You get the exact answer every time.

Equivalently, one could imagine that the variance reduction works perfectly
after the first two rolls. Anyway, this is how I've been trying to test the
effectiveness of stratification for my next GammonVillage column, which I plan
to be an introduction to the normal distribution.
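The truncated-rollout thought experiment can be checked directly in a toy simulation. The payoff function below is an arbitrary invented stand-in for "equity after two rolls", not a real position:

```python
import itertools
import random

# Hypothetical payoff after two rolls: any fixed deterministic function of
# the two dice rolls works for the argument; this one is arbitrary.
def payoff(r1, r2):
    return (r1[0] + r1[1]) * 0.1 - (r2[0] * r2[1]) * 0.05

ROLLS = [(a, b) for a in range(1, 7) for b in range(1, 7)]  # 36 ordered rolls

def rotated_rollout():
    # 36 * 36 = 1296 games: every (first roll, second roll) pair exactly once
    games = itertools.product(ROLLS, ROLLS)
    return sum(payoff(r1, r2) for r1, r2 in games) / 1296

def random_rollout(rng, n=1296):
    return sum(payoff(rng.choice(ROLLS), rng.choice(ROLLS)) for _ in range(n)) / n

rng = random.Random(1)
rotated = [rotated_rollout() for _ in range(5)]   # exact answer every time
sampled = [random_rollout(rng) for _ in range(5)] # spread from rollout to rollout
```

Rotating both plies of a 1296-game rollout enumerates every dice sequence once, so the real variance is exactly zero, while plain sampling of the same 1296 games still fluctuates.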

Let's suppose that the luck estimates are perfect after two rolls, but perhaps
imperfect in the first two rolls. Then one level of stratification will have
the effect of making the luck estimates perfect on the first roll, and the
errors will just be from the second roll.

If the estimates on the first roll were already good, but the luck estimates on
the second roll were terrible, then we have not gained much. If inaccurate luck
estimates on the first roll were the problem, then we would gain a lot.
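A small simulation of one level of stratification illustrates where the gain comes from. The weight `w` is a made-up knob controlling how much of the luck sits in the first roll versus the second:

```python
import random
import statistics

# One level of stratification: enumerate the 36 first rolls, sample the rest.

ROLLS = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def payoff(w, r1, r2):
    # w close to 1: luck mostly in the first roll; close to 0: in the second
    return w * (r1[0] + r1[1]) + (1 - w) * (r2[0] + r2[1])

def plain_rollout(w, n, rng):
    return statistics.mean(
        payoff(w, rng.choice(ROLLS), rng.choice(ROLLS)) for _ in range(n))

def stratified_rollout(w, reps, rng):
    # every first roll appears exactly `reps` times; 36 * reps games in all
    return statistics.mean(
        payoff(w, r1, rng.choice(ROLLS)) for r1 in ROLLS for _ in range(reps))

rng = random.Random(0)
variances = {}
for w in (0.9, 0.1):
    plain = [plain_rollout(w, 360, rng) for _ in range(300)]
    strat = [stratified_rollout(w, 10, rng) for _ in range(300)]
    variances[w] = (statistics.pvariance(plain), statistics.pvariance(strat))
```

When the first roll carries most of the luck (w = 0.9), stratifying it removes almost all of the real variance; when the second roll does (w = 0.1), stratifying the first roll gains almost nothing.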

> > Also, I don't understand why you bring "different plays" into this. This is
> > of course relevant for the "duplicate dice" evaluation, but not for the
> > rotation (for which I would reserve the word "stratification").

There is a similar issue in that the real variance of the difference between
plays A and B may decrease. In this case the ideal sample variance also
decreases, in most positions. However, the extent of the decrease is unclear.
If you are deciding whether to leave a lethal shot 7 or 8 away, or 4 vs. 5
away, then giving the same dice may increase both variances. Making the dice
correlated will not change either variance for the estimates of the equities of
the individual plays, of course.
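The duplicate-dice point can be sketched with two hypothetical plays whose results depend on the same roll (the payoff functions are invented for illustration):

```python
import random
import statistics

ROLLS = list(range(1, 7))  # one die, for simplicity

# The plays are correlated because a good roll for A tends to be good for B.
def play_a(r):
    return 0.3 * r

def play_b(r):
    return 0.3 * r + (0.05 if r >= 5 else -0.02)

def diff_same_dice(n, rng):
    # duplicate dice: both plays see the identical roll in each trial
    return statistics.mean(
        play_a(r) - play_b(r) for r in (rng.choice(ROLLS) for _ in range(n)))

def diff_indep_dice(n, rng):
    # each play rolled out with its own independent dice
    return (statistics.mean(play_a(rng.choice(ROLLS)) for _ in range(n))
            - statistics.mean(play_b(rng.choice(ROLLS)) for _ in range(n)))

rng = random.Random(2)
v_same = statistics.pvariance([diff_same_dice(100, rng) for _ in range(500)])
v_indep = statistics.pvariance([diff_indep_dice(100, rng) for _ in range(500)])
# v_same is far smaller: the shared luck cancels in the difference, while the
# variance of each play's own equity estimate is unchanged by the correlation
```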

Douglas Zare
