Re: [Bug-gnubg] A new variance reduction technique
From:
Massimiliano Maini
Subject:
Re: [Bug-gnubg] A new variance reduction technique
Date:
Mon, 30 Mar 2009 16:14:08 +0200
<This reply too was also posted to
bgonline.org>
Hi Bob,
just to be sure I do understand exactly
what you are proposing: your trick tries to
give comparable luck to the two plays'
games. But if Dahl's Var.Red. is perfect (i.e.
it factors out luck perfectly), this
would be unnecessary, right ?
A few comments (assuming my understanding
is right):
1. Instead of going for the same rank,
you could (at negliectible cost) select
the roll for play B that gives the closest
luck to the one obtained for play A.
Should be more efficent than just taking
the roll with the same rank.
2. I'm not sure gnubg uses 2ply for
luck evaluation. If it uses 0 ply (as I remember,
but I could be wrong, mabe it is 0ply
just for match analysis luck evals), introducing
your trick would cost something in terms
of speed (actually, quite a lot I would say,
knowing that each additional ply cost
more or less a factor 21). If somebody on gnubg
mailing list can confirm that ....
MaX.
address@hidden
wrote on 29/03/2009 21:50:15:
> > This was also posted to the forums at
bgonline.org, > > Bob Koca > > --- > > I have a method that I think will make backgammon
rollouts more
> efficient in that the std. errors will be smaller for the same
> number of games rolled out. I don’t have the skills though
to
> actually program it. Let me first give the basic idea of the theory
> with an example. To read more about this the key words would
be 2
> sample t-tests and matched pairs. Nearly all college level
> introductory stats textbooks would have a chapter on this. > > Suppose you want to measure the difference
Mh – Fh where Mh is
> the average height of married males and Fh is the average height
of
> married females. One way to do this is to take a random sample
of
> married males and find the sample average and deviation and also
> find a random sample of females and find the sample average and
> deviation. If the sample sizes are large enough and/or the
> population itself is close to normal then the sample averages each
> have an approximately normal distribution (with the true population
> average as its mean and deviation determined by its sample
> deviation). The difference is also then normal. This means our best
> guess for the true population difference is just the difference in
> the sample means. The two samples were taken independently and
the
> variance of the sum or difference of two random variables is the sum
> of the two individual variances. If the s dev are equal this gives
a
> s dev of the difference equal to SQRT(2) times the STD dev of a
> single population. This is essentially how rollouts are now
done to
> compare play A to play B. > > A second way to estimate the difference
in heights it is to use
> matched pairs. Instead of choosing males randomly and also females
> randomly choose married couples randomly. For each couple find the
> difference (male height – female height). This gives us a single
> list of numbers. Its average will be our best guess for the true
> population difference and the deviation of this list can be used to
> find the std error for our guess. The std. error will be lower
for
> this case as it takes advantage of the positive correlation between
> husband and wife heights. As a near extreme case imagine that
a
> husband is nearly always very close to being 3 inches taller than
> his wife. Use of the matched pairs technique shows this fact
> quickly. The greater the positive correlation the greater benefit
is
> obtained. I believe that the matched pairs technique can be used in
> rollouts and will give a smaller standard error for the same number
> of trials. > > Suppose one wants to do a rollout
to compare play A with play B.
> What would be the meaning of having a positive correlation here? It
> would mean that if game n of say 1296 trials has a certain amount
of
> luck for play A it would have about that same amount of luck for
> game n of the rollout of play B as well. Using the same dice
stream
> is a better than nothing but somewhat feeble way to do this as the
> games could quickly diverge. > > A better way is to ensure
that when play A gets its best roll
> that play B also does and vice versa. The evaluation feature can
> make this possible. Choose a roll for one of the play’s rollouts
at
> random. But then instead of using that same roll or another randomly
> chosen roll for the other play’s rollout make it so that the ranks
> are the same. So suppose that the rollout of play A just got its 5th
> best (out of 36) roll. Determine the 5th best roll for the rollout
> of play B and use that roll. If there is a tie break it randomly.
At
> the end of each trial look at result of A – result of B. This
gives
> a single list of numbers. The average value gives your estimate for
> equity of A – equity of B and the s.dev of the list gives the std.
> error. Conventional variance reduction techniques such as Dahl’s
> algorithm, and using each each die roll once could still be used.
> > Pseudocode (or perhaps I
should call it pseudo-pseudo code) on
> how to do this is at the end. I am not a programmer so I hope my
> thoughts are interpretable. > > How to test the value of this once it is
has been coded up? Well
> one could just look at the std error of the estimate of the
> difference between play A and B. But I think something like the
> following would be more convincing: Take several instances of
play
> vs play decisions. RO (conventionally) with a huge number of trials
> so that the true difference in the win chances are known very
> accurately. Now using the same bot RO with just 36 trials the play
A
> vs play B choice. Look at the discrepancy between the RO result and
> the true difference we have already found. Do this say 100 times.
> The std. deviation of those errors gives a measure of how accurate
> the 36 game ROs are. Now repeat using the same bot but with
the
> above technique. Compare the std deviation of the error from
these
> rollouts against those obtained from the conventional rollouts. > > PSEUDO CODE to estimate the difference
in win% between play A and play B. > DEFNS: > A Strand is a rollout starting with a certain
play, here either play
> A or play B. > A Trial is a certain game being rolled
out. For example a rollout
> might have 1296
trials for each of two strands. > > PROCEDURE: ForEachRoll: > > 1) Generate a random number
(COMMENT could be done in
> conjunction with the rotation idea for rollouts that are powers of
36) > If the current trial is unresolved for both strands:
Then > 2) Using 2 ply analysis find
the rank out of the 36 possible
> dice rolls for that roll for the rollout in which play A was made
> initially. (COMMENT: Note that this is not computationally
> expensive since is needed anyways for Dahl V.R.) > 3) Using two ply analysis
find the roll which has the same rank
> for the strand in which play B was made initially. If
there is a
> tie then break it randomly. > 4) Using 2 ply analysis find
and make the best plays for each. > 5) Use Dahl VR and adjust
the running count for each strand. > > If only one strand is unresolved then, > 2) Choose a die roll randomly. > 3) Play the move for that
strand using 2-ply > 4) Use Dahl VR and adjust
the running count for each strand. > > PROCEDURE ForEachTrial > > Run ForEachRoll > If Both strands are not resolved Then Run ForEachRoll > If both strands are resolved then record the
following in 1XN arrays
> where N = number of trials to run. > > 1) A = game result for strand
A + Dahl VR adjustment > 2) B = game result for strand
B + Dahl VR adjustment > 3) Diff = A - B > > > > PROGRAM: Do full RO > Inputs: N = number of trials. > > Set Counter = 1 > Until Counter = N+1 do:
> Run Procedure ForEachTrial > Increase N by 1 > Use the array A to give estimate of win% and
std error for how
> often play A wins. > Use the array B to give estimate of win% and
std. error for how
> often play B wins. > Use the array Diff to give estimate of difference
in win% and its std error