|Subject:||[Bug-gnubg] A new variance reduction technique|
|Date:||Sun, 29 Mar 2009 15:50:15 -0400|
This was also posted to the forums at bgonline.org,
I have a method that I think will make backgammon rollouts more efficient in that the std. errors will be smaller for the same number of games rolled out. I don’t have the skills though to actually program it. Let me first give the basic idea of the theory with an example. To read more about this the key words would be 2 sample t-tests and matched pairs. Nearly all college level introductory stats textbooks would have a chapter on this.
Suppose you want to measure the difference Mh – Fh where Mh is the average height of married males and Fh is the average height of married females. One way to do this is to take a random sample of married males and find the sample average and deviation and also find a random sample of females and find the sample average and deviation. If the sample sizes are large enough and/or the population itself is close to normal then the sample averages each have an approximately normal distribution (with the true population average as its mean and deviation determined by its sample deviation). The difference is also then normal. This means our best guess for the true population difference is just the difference in the sample means. The two samples were taken independently and the variance of the sum or difference of two random variables is the sum of the two individual variances. If the s dev are equal this gives a s dev of the difference equal to SQRT(2) times the STD dev of a single population. This is essentially how rollouts are now done to compare play A to play B.
A second way to estimate the difference in heights it is to use matched pairs. Instead of choosing males randomly and also females randomly choose married couples randomly. For each couple find the difference (male height – female height). This gives us a single list of numbers. Its average will be our best guess for the true population difference and the deviation of this list can be used to find the std error for our guess. The std. error will be lower for this case as it takes advantage of the positive correlation between husband and wife heights. As a near extreme case imagine that a husband is nearly always very close to being 3 inches taller than his wife. Use of the matched pairs technique shows this fact quickly. The greater the positive correlation the greater benefit is obtained. I believe that the matched pairs technique can be used in rollouts and will give a smaller standard error for the same number of trials.
Suppose one wants to do a rollout to compare play A with play B. What would be the meaning of having a positive correlation here? It would mean that if game n of say 1296 trials has a certain amount of luck for play A it would have about that same amount of luck for game n of the rollout of play B as well. Using the same dice stream is a better than nothing but somewhat feeble way to do this as the games could quickly diverge.
A better way is to ensure that when play A gets its best roll that play B also does and vice versa. The evaluation feature can make this possible. Choose a roll for one of the play’s rollouts at random. But then instead of using that same roll or another randomly chosen roll for the other play’s rollout make it so that the ranks are the same. So suppose that the rollout of play A just got its 5th best (out of 36) roll. Determine the 5th best roll for the rollout of play B and use that roll. If there is a tie break it randomly. At the end of each trial look at result of A – result of B. This gives a single list of numbers. The average value gives your estimate for equity of A – equity of B and the s.dev of the list gives the std. error. Conventional variance reduction techniques such as Dahl’s algorithm, and using each each die roll once could still be used.
Pseudocode (or perhaps I should call it pseudo-pseudo code) on how to do this is at the end. I am not a programmer so I hope my thoughts are interpretable.
How to test the value of this once it is has been coded up? Well one could just look at the std error of the estimate of the difference between play A and B. But I think something like the following would be more convincing: Take several instances of play vs play decisions. RO (conventionally) with a huge number of trials so that the true difference in the win chances are known very accurately. Now using the same bot RO with just 36 trials the play A vs play B choice. Look at the discrepancy between the RO result and the true difference we have already found. Do this say 100 times. The std. deviation of those errors gives a measure of how accurate the 36 game ROs are. Now repeat using the same bot but with the above technique. Compare the std deviation of the error from these rollouts against those obtained from the conventional rollouts.
PSEUDO CODE to estimate the difference in win% between play A and play B.
A Trial is a certain game being rolled out. For example a rollout might have 1296 trials for each of two strands.
1) Generate a random number (COMMENT could be done in conjunction with the rotation idea for rollouts that are powers of 36)
If the current trial is unresolved for both strands: Then
2) Using 2 ply analysis find the rank out of the 36 possible dice rolls for that roll for the rollout in which play A was made initially. (COMMENT: Note that this is not computationally expensive since is needed anyways for Dahl V.R.)
3) Using two ply analysis find the roll which has the same rank for the strand in which play B was made initially. If there is a tie then break it randomly.
4) Using 2 ply analysis find and make the best plays for each.
5) Use Dahl VR and adjust the running count for each strand.
If only one strand is unresolved then,
2) Choose a die roll randomly.
3) Play the move for that strand using 2-ply
4) Use Dahl VR and adjust the running count for each strand.
If Both strands are not resolved Then Run ForEachRoll
If both strands are resolved then record the following in 1XN arrays where N = number of trials to run.
1) A = game result for strand A + Dahl VR adjustment
2) B = game result for strand B + Dahl VR adjustment
3) Diff = A - B
PROGRAM: Do full RO
Inputs: N = number of trials.
Set Counter = 1
Until Counter = N+1 do:
Run Procedure ForEachTrial
Increase N by 1
Use the array A to give estimate of win% and std error for how often play A wins.
Use the array B to give estimate of win% and std. error for how often play B wins.
Use the array Diff to give estimate of difference in win% and its std error
Express your personality in color! Preview and select themes for Hotmail®. See how.
|[Prev in Thread]||Current Thread||[Next in Thread]|