|Subject:||Re: [Bug-gnubg] GNU Backgammon overview/background|
|Date:||Tue, 11 Nov 2003 15:21:53 +1300|
|User-agent:||Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007|
Øystein Johansen wrote:
The race net is not trained with TD at all! (Or maybe it was back in the last century, but it played horribly.) Just think about it. When there's no contact, the opponent's next roll and move are quite independent of the player's move. The weights will therefore not converge to any values at all. The same thing applies to the evaluation of race positions. 1-ply evaluation of a race is just a waste of time, since the opponent's roll and move won't affect your best move. (...well, maybe to some small degree...) The race net is therefore trained against the OSR evaluator. OSR is the One Side Rollout algorithm. It simply rolls out the position for one side using heuristic move rules. For each game it rolls out, it keeps track of how many rolls it took to bear off. That gives a roll distribution for the position. The same is done for the other side, and the winning probability is calculated in the same way as for the one-sided bearoff database. This OSR algorithm is used to train the race network.
Almost all of the above is not true. The race net is trained in exactly the same way as the other nets. The OSR method can serve as a stepping stone only. I was surprised (and slightly dismayed) when it turned out that OSR does a not-so-good job at checker play (based on my rollout benchmark). The reason seems to be that the OSR heuristic misses many correct plays. While each error is small, the cumulative effect can be big enough to make it choose the weaker of two close plays, which in race situations is the case most of the time.
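For readers unfamiliar with the OSR calculation mentioned in the quoted message, here is a minimal sketch of how two roll distributions can be combined into a winning probability, in the manner of a one-sided bearoff database. It is not taken from the gnubg source; the function names `roll_distribution` and `win_probability` are hypothetical, and the heuristic move rules used during the rollout itself are left out because the thread does not specify them.

```python
from collections import Counter


def roll_distribution(roll_counts):
    """Turn sampled roll counts (one per rolled-out game) into a distribution.

    Returns a list d where d[n] is the fraction of games that took exactly
    n rolls to bear off. (Hypothetical helper, not part of gnubg.)
    """
    counts = Counter(roll_counts)
    total = len(roll_counts)
    return [counts.get(n, 0) / total for n in range(max(roll_counts) + 1)]


def win_probability(on_roll_dist, opponent_dist):
    """Probability that the side on roll bears off first.

    The side on roll wins when it needs no more rolls than the opponent,
    because it moves first: P = sum_n P(me = n) * P(opponent >= n).
    """
    total = 0.0
    for n, p in enumerate(on_roll_dist):
        # Slicing past the end yields an empty list, so the tail is 0 there.
        opponent_needs_at_least_n = sum(opponent_dist[n:])
        total += p * opponent_needs_at_least_n
    return total


# Both sides need 1 or 2 rolls with equal chance; the side on roll is the
# favourite because ties in roll count go to whoever rolls first.
me = roll_distribution([1, 2])
opponent = roll_distribution([1, 2])
print(win_probability(me, opponent))
```

Note that this only estimates the outcome of the race from each side's roll distribution in isolation; as described above, the quality of the result depends on how well the one-sided heuristic actually plays.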