## Re: [Bug-gnubg] Confidence intervals from rollouts

From: David Montgomery
Subject: Re: [Bug-gnubg] Confidence intervals from rollouts
Date: Thu, 5 Sep 2002 08:42:46 -0700

```
Nis Jorgensen wrote:
> David Montgomery wrote:
> > Consider a single position, for which we have 3 rollout
> > samples A, B, and C.  The idea of rotating the first ply
> > or two is that the variance of the *difference* between
> > two plays should be reduced, since one aspect of the
> > luck has been eliminated.
>
> The idea of rotating the first ply or two should be to reduce the
> difference between the "true" value of a position and the results of
> rollouts.

I agree with you.  I misspoke.

> The variance (or rather standard error) is just a _measure_ of how much we
> trust the result, and reducing the value is not a goal in itself.
>
> This is very important to stress, especially in cases like this, where we
> should expect the standard error to go _up_ even though the actual
> trustworthiness of the rollout should improve.

I believe I completely missed this point until now.  Thanks.

Hmmm.... but perhaps I am still missing it.  Because now I
can't see how the standard error will go up; at least not
the "standard error" that I am thinking about.

Please (anyone) correct my wording in what follows.  I'm
not so good with the statistical lingo.

To keep things simple, let's say we are only interested in
money cubeless equity -- one number.  We are rolling a
position out with some fixed evaluator and parameter set.
If we were to do an infinite number of trials, the average
rollout result would converge to the true value we are seeking.
(I say this by way of definition -- not necessarily
the true value in a backgammon theoretical sense, just
the expected value of the rollout with this particular
evaluator and parameter set.)

We do a rollout of some length -- let's say 1296 games.
This rollout produces an estimate of the true value, and
is distributed about the true value according to some
distribution.  Given that we are averaging a bunch of
other random values, the distribution is approximately
normal.  We don't know the true mean and variance of
this distribution, but we can estimate them based on
our sample.  The mean should be approximately the mean
of the rollout values, and the variance should be
approximately the variance of a single game, divided
by 1296 (or 1295? it doesn't matter a lot here).

I think the square root of this variance is called
the standard error, but I used to just call it the
standard deviation, which it is, of the whole rollout
result considered as a random variable.

Now, if a technique (like rotating the opening roll)
doesn't reduce this standard error, how can it be
that it is increasing our confidence in the rollout?
If the rollout is more tightly bound to the true value
because of our technique, then the true standard error
must decrease.

Perhaps you just mean that our estimate of the standard
error based on the single game variance and the number
of games will be higher.  I can see that this is true.
And this is what Doug was alluding to in the message
that started this thread -- the standard error estimates
will be biased high.

But I would say that the true standard error is
actually lower.  Maybe there is something else I
should call this.  What I mean is that, if we do our
1296 game rollout many times, we can actually gather
statistics directly on the spread of the rollouts.
And if our technique is variance reducing, the spread
will be less.  A quick thought experiment to demonstrate
this is to consider 1296 game rollouts with and without
rotating the first two ply, truncated after two rolls.
The true standard error of this rollout is 0 when you
rotate the rolls.  You get the exact answer every time.

I certainly believe that you know all this.  I'm
going into this much detail so that you can correct
me.

> Also, I don't understand why you bring "different plays" into this. This is
> of course relevant for the "duplicate dice" evaluation, but not for the
> rotation (for which I would reserve the word "stratification").

I misspoke.  I should have said "different rollouts".
That is, if we do two 1296 game rollouts of the
*same* position with the same parameters, we end up
with two values, A and B.  If our technique is
variance-reducing, then A and B should each be
closer to the true value than they would otherwise
be, on average.  This means that they will be closer
to each other, on average, than two rollouts which
did not use the technique.  This is what I attempted
to detect.

> > So, for example, we would
> > expect that the standard error of abs(A - B) would be
> > less than sqrt(2)*[true standard error of rollout of
> > that size of that position].
>
> I am not sure what you mean by the "standard error of abs(A-B)". I assume
> you just mean Abs(A-B)?

A and B are two rollouts of the same position.  A-B
is just another random quantity.  Since A and B are
drawn independently from the same distribution, their
standard deviations are the same, and their variances
add.  The standard deviation of A-B is then just
sqrt(2)*[standard deviation of rollout].

Let me plug in some real numbers, which I suspect
will make what I mean clear, even if my words have
failed.  Consider a rollout of an opening position,
with single game standard deviation of 1.35.  The
standard deviation of 1296 game rollouts is about
.0375.  The standard deviation of the difference
between two independent 1296 game rollouts is about
.053.

If we apply a variance reducing technique, like
rotating the first two rolls, then the standard
deviation of a rollout will go down, and the
standard deviation of a difference in rollouts
will also go down.  This is what I attempted to
detect, and did not find in my JF sample.

> I think I will do some simple coin experiments, and
> see if this brings me any insights.

I look forward to a report.

I apologize for confusing the issue with my comments
on "different plays" and "duplicate dice".  I hadn't
thought about all this for a long time, and simply
because of how my program is coded, I associated
opening roll rotation with duplicate dice.

David

```
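
The repeated-rollout experiment described in the message can be sketched with a toy model. This is only a rough illustration under stated assumptions, not gnubg or JF code: each "game" result is a hypothetical first-roll effect (one of 36 equally likely values) plus independent Gaussian noise, and "rotating" means cycling through the 36 first rolls evenly instead of drawing them at random. The names `make_roll_effects`, `rollout`, and `compare`, and the parameters `spread` and `noise_sd`, are invented for this sketch. It measures the spread of many 1296-game rollout means directly (the "true standard error" in the message's sense) and compares it with the naive estimate computed from the single-game standard deviation.

```python
import math
import random
import statistics

N_GAMES = 1296   # rollout length used in the message
N_FIRST = 36     # toy stand-in: 36 equally likely first rolls


def make_roll_effects(rng, spread=1.0):
    """Hypothetical equity shift caused by each of the 36 first rolls."""
    effects = [rng.gauss(0.0, spread) for _ in range(N_FIRST)]
    mean = sum(effects) / N_FIRST
    return [e - mean for e in effects]  # centre so the true value is 0


def rollout(rng, effects, noise_sd, rotate):
    """Run one rollout; return (mean result, naive standard error)."""
    results = []
    for i in range(N_GAMES):
        # Rotation visits every first roll equally often; otherwise it is random.
        first = i % N_FIRST if rotate else rng.randrange(N_FIRST)
        results.append(effects[first] + rng.gauss(0.0, noise_sd))
    mean = statistics.fmean(results)
    # Naive estimate: single-game standard deviation / sqrt(number of games).
    naive_se = statistics.stdev(results) / math.sqrt(N_GAMES)
    return mean, naive_se


def compare(trials=200, noise_sd=0.9, seed=1):
    """Repeat the rollout many times, with and without rotation."""
    rng = random.Random(seed)
    effects = make_roll_effects(rng)
    summary = {}
    for rotate in (False, True):
        runs = [rollout(rng, effects, noise_sd, rotate) for _ in range(trials)]
        means = [m for m, _ in runs]
        naive = [se for _, se in runs]
        summary[rotate] = {
            "true_se": statistics.stdev(means),    # observed spread of rollouts
            "naive_se": statistics.fmean(naive),   # average reported estimate
        }
    return summary


if __name__ == "__main__":
    for rotate, stats in compare().items():
        label = "rotated" if rotate else "random "
        print(label, stats)
```

In this toy model the rotated rollouts cluster more tightly around the true value, while the naive estimate stays near the unrotated figure, so it overstates the spread of the rotated rollouts. For reference, the numbers quoted in the message follow from 1.35 / sqrt(1296) = 1.35 / 36 = 0.0375, and sqrt(2) * 0.0375 ≈ 0.053 for the difference of two independent rollouts.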