Re: [Bug-gnubg] GNU Backgammon overview/background
Thu, 13 Nov 2003 03:50:50 +0100
Thomas Hauk wrote:
About the training:
Temporal difference learning was only used in the first stages of the
program. With this kind of training the program reached an intermediate playing strength. Later the nets were trained with supervised
training with positions from a position database. The desired outputs of
the training positions in the database are based on rollouts. The latest
contact and crashed nets are trained by a slightly different technique.
It's like the net is trained to play like the 2-ply evaluation.
So... let me get this straight.
TD learning was employed to train a NN, which then produced a position
database along with what the NN thought to be the best move for that
position. Then a second NN was trained with only this resulting position
DB? In other words, the second NN is trained with expertly-labelled data,
where the expert is an older NN trained only with self-play?
(I must be terribly bad at explaining things, since it doesn't seem like you got this correctly. I'm really sorry... Here's my new attempt.)
TD learning was the first thing. I'm not sure what version number this net had, but let's call this contact net 0.0.
Contact net 0.0 wasn't the best player in the world, and things didn't get any better from the fact that it played together with a bad race net. Let's call the first race net 0.0. Together I estimate they would have had a ~1600 FIBS rating.
One of the problems was that the nets didn't match each other either. The program refused to break contact when it was up in the race, and sometimes it also refused to hold contact when it was down in the race. The race net also refused to bear in chequers in some games, because the race net gave a better GWC than the bearoff database. These games were almost funny to watch.
Joseph started his experiments. One of the first things he tried was to split the contact neural net into a contact net and a PBG(?) net for backgames and priming games. This split didn't work very well. Joseph had a hard time training these nets (still using TD?), and they were terribly out of sync with each other and caused the same problem as described in the paragraph above, except that the PBG net and contact net were more out of sync than the race and contact net, so it caused more trouble than it gained strength. (Correct me if I'm wrong, Joseph.)
Joseph also found that some neural net inputs didn't contribute to the
learning, so he removed some inputs, and adjusted some others.
So, Joseph rejected the PBG net and started supervised training instead. The theory was simple enough: if a position has a cubeless equity of X ppg at 0-ply, and the same position has a cubeless equity of Y ppg at 1-ply, then if abs(X-Y) > a threshold value, there is reason to believe that the 0.0 contact net does not understand this position (and position type) well, and it needs more training. (The threshold value was set to 0.1, I think.) Net 0.0 was set to play against itself, and for each position it reached, it calculated the 0-ply equity and 1-ply equity (both cubeless of course; cube algorithms weren't even implemented yet). If the criterion abs(X-Y) > threshold applied, the position was stored in a database. Just the position, nothing else. After some number of games, the database had lots of positions where the net needed more training, and the self-play was stopped. Joseph may know how many positions there are in this position database.
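The selection criterion described above can be sketched as follows. The two evaluators here are random stand-ins for the real 0-ply and 1-ply net evaluations, and all names are illustrative, not gnubg's actual code:

```python
import random

THRESHOLD = 0.1  # the equity-difference trigger mentioned above

def equity_0ply(pos):
    # Stand-in for the net's static (0-ply) cubeless equity.
    return random.Random(pos).uniform(-1.0, 1.0)

def equity_1ply(pos):
    # Stand-in for the 1-ply lookahead cubeless equity.
    return random.Random(pos + 10_000).uniform(-1.0, 1.0)

def collect_positions(positions):
    """Store positions where 0-ply and 1-ply disagree by more than
    the threshold -- positions the net presumably misunderstands."""
    db = []
    for pos in positions:
        x = equity_0ply(pos)
        y = equity_1ply(pos)
        if abs(x - y) > THRESHOLD:
            db.append(pos)  # just the position, nothing else
    return db

db = collect_positions(range(1000))
```

Only the flagged positions then go on to be rolled out, which keeps the expensive rollouts focused on the positions the net gets wrong.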
Now, for each position in the database, a cubeless rollout was performed to estimate the 'real' ('real' = at least better) probabilities and equity of the position. The rollouts were done with the same 0.0 net, and the rollout results were added to the database. In this way Joseph developed a database of positions with a rollout result for each position.
So, this database was used for training the next version of the neural net: normal backpropagation for each position in the database, with the rollout result as the desired output. Running through each position in the position database and updating the weights with backpropagation based on the rollout results is called an epoch. Several epochs are run over the database, starting with high alpha values (alpha is the learning rate, a factor of how much the weights should be adjusted) and decreasing as the weight values converge. The order of the positions was also randomized for each epoch. The resulting neural net was called 0.10, IIRC.
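As a toy sketch of this epoch training: a single linear unit stands in for the real backpropagation network, and the hyperparameter values are illustrative, not the ones Joseph used:

```python
import random

def train_epochs(database, n_inputs, epochs=50, alpha0=0.5, decay=0.95):
    """Epoch training as described: shuffle the database each epoch,
    nudge the weights toward the rollout target, and decay alpha
    (the learning rate) as the weights converge."""
    w = [0.0] * n_inputs
    alpha = alpha0
    for _ in range(epochs):
        random.shuffle(database)            # randomized order per epoch
        for inputs, target in database:     # target = rollout result
            out = sum(wi * xi for wi, xi in zip(w, inputs))
            err = target - out
            for i, xi in enumerate(inputs):
                w[i] += alpha * err * xi    # gradient step
        alpha *= decay                      # decrease the learning rate
    return w

# Toy database whose 'rollout' target is exactly 2*x0 - x1.
random.seed(42)
db = [((x0, x1), 2.0 * x0 - x1)
      for x0 in (0.0, 0.5, 1.0) for x1 in (0.0, 0.5, 1.0)]
w = train_epochs(db, n_inputs=2)
```

The shuffling matters: running the positions in a fixed order every epoch lets the last-seen positions dominate the weights.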
This net improved the strength of the program a lot, but the race net still sucked. To work around this, Joseph made an OSR-based race evaluator, written in C++ (you can still see the "race-c++" branch in the CVS; this branch was never merged into the main branch, but it was added to the fibs2html tool), and used this with mgnutest. Joseph also developed a small neural net for pruning moves before deeper evaluations. This net had only 5 hidden nodes, and was therefore very fast. The fast 5-hidden-node pruning net combined with 2-ply search and the OSR race evaluator reached a FIBS rating of about 1850-1900.
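The pruning idea can be sketched like this: a tiny, fast evaluator ranks all candidate moves, and only the top few get the expensive deeper evaluation. All names here are illustrative, not gnubg's actual API:

```python
import random

def prune_then_search(moves, fast_eval, deep_eval, keep=5):
    """Score every move with the cheap evaluator, keep the best few,
    then pick the winner among them with the slow deep evaluation."""
    survivors = sorted(moves, key=fast_eval, reverse=True)[:keep]
    return max(survivors, key=deep_eval)

# Toy demo: the fast evaluator is a noisy version of the deep one.
random.seed(1)
moves = list(range(20))
deep = {m: m / 20.0 for m in moves}                         # 'true' equity
fast = {m: deep[m] + random.gauss(0.0, 0.05) for m in moves}
best = prune_then_search(moves, fast.get, deep.get, keep=5)
```

The speedup comes from only running the deep evaluation on `keep` moves instead of all of them; the risk is that the fast net occasionally prunes away the truly best move.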
Still, something had to be done about the race net. I remember we discussed how to solve this problem, and several schemes were discussed. I thought about a neural net with the chequer positions for one player as input and the distribution of the number of rolls to get off as output, and someone else had other ideas, and while we discussed this problem on the list, Joseph suddenly released a new race network! I remember I was partly shocked! Wow! In this new race net he used 14 input nodes for borne-off checkers, one input for each checker off. I remember I asked why, and he replied something like: "I don't know, but it works!". This network that he had bred was trained on OSR evaluations, and it was released with the contact net 0.10; together they were called 0.10. (I'm taking this from memory now, so the numbering may be wrong.)
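The mail says 14 inputs, one per borne-off checker, but not how they are encoded. One plausible reading is a cumulative ("thermometer") encoding, sketched here as an assumption, not a description of the real net:

```python
def borne_off_inputs(n_off):
    """14 inputs for borne-off checkers, one per checker, as the mail
    describes. This sketch assumes a 'thermometer' encoding: input i
    fires if at least i+1 checkers are off. (Whether the real net used
    this or a one-hot encoding isn't stated in the mail.) Only 14
    inputs are needed: with all 15 checkers off, the game is over."""
    return [1.0 if n_off > i else 0.0 for i in range(14)]
```

Compared to feeding the raw count into a single input, an encoding like this lets the net learn a separate, non-linear weight for each stage of the bearoff.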
The first GamesGrid players, GGraccoon and GGbeaver, were based on these two networks, the contact 0.10 net and the race 0.10 net. (David added the reduced search algorithm which was used by GGbeaver, and this change was implemented in the gnubg project. Actually there was a reduced search algorithm before David's as well, but David's algorithm was better and faster.)
In February 2001, Joseph made some new changes to the evaluation algorithm and a new set of weights was released. He called this set of weights 0.15 in fibs2html, and the same weights are called 0.11 in gnubg. I'm not sure of the differences between this net and the 0.10 net; Joseph knows the details. (A better cache system was added to the evaluations, and a neural net speed-up trick was also implemented.)
In late 2001 (November/December) Joseph split the contact network into a crashed net and the ordinary contact net. I'm not sure how he trained these nets, but I assume he used supervised training from the position database. This net was merged into the GNU Backgammon project in January 2002. This was the really great update! It was amazing to watch mgnutest play on FIBS with this net. It really played a master game. It showed that it was able to find moves with long-term plans, and I would already at that stage say that it would outplay JellyFish in most positions and play equal to Snowie 3.2 or better in most positions. (It would of course outplay Snowie in bear-in situations, but we all know that Snowie sucked at bearing in.) The new neural net trio was numbered 0.12. I'm not sure about the training method, but I assume it was based on the database of positions and rollouts. (Maybe the rollouts were rerun or something... Joseph knows.)
The strange thing with the 0.12 nets was that even though there was now a class of its own for crashed positions, specially trained on crashed positions, it really played crashed positions badly! The gain was really in the contact net, not the crashed net. It's possible that the contact net got better because it no longer had to learn about crashed positions: some of the brain capacity was freed to learn contact positions better. The out-of-sync problem between the crashed and contact nets wasn't a big problem either, since a game very seldom goes from contact to crashed and then back to contact again.
Joseph ran some analysis of the occurrence of each position class from mgnutest's FIBS play, and he reported that 79% of all evaluations were contact evaluations, 8% were crashed evaluations, 7% were race and 6% were bearoff.
Joseph then spent some months improving the crashed net. I guess he used the same technique as usual, described on his page: a database of positions and rollout results, trained over several epochs. So we got nets called 0.12a and 0.12b, and the crashed net slowly improved.
Now a new problem arose: it's not a problem to breed new neural nets, but how can you say that one net is better than another? Two good nets need millions of games to give a significant answer to which is best. For this purpose Joseph started the gnubg training project. Volunteers over the internet, especially Ian Shaw, had big files of positions collected from online games, and rolled them out on a big scale. (We had our own tool for rolling the positions out, called sagnubg.) The rollout results were used as reference positions, or a benchmark, and the net that played best according to all the rollout results in the reference positions was considered the best neural net. I guess we rolled out about half a million positions (?).
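The benchmark comparison can be sketched as follows: for each reference position we know the rollout equity of every candidate move, and a net is scored by how much equity its chosen moves give up relative to the rollout-best moves. This is only a sketch; the real sagnubg tooling surely differed in detail:

```python
def benchmark_loss(net_eval, reference):
    """Average equity given up by the net's move choices, measured
    against the rollout results. Lower is better; a net that always
    picks the rollout-best move scores 0."""
    total = 0.0
    for candidates in reference:
        # candidates: list of (position_key, rollout_equity) per legal move
        best_rollout = max(eq for _, eq in candidates)
        picked = max(candidates, key=lambda c: net_eval(c[0]))
        total += best_rollout - picked[1]
    return total / len(reference)

# Toy reference database with two benchmark positions.
rollout = {1: 0.2, 2: 0.5, 3: -0.1, 4: 0.3}
ref = [[(1, 0.2), (2, 0.5)], [(3, -0.1), (4, 0.3)]]
perfect = benchmark_loss(lambda k: rollout[k], ref)    # agrees with rollouts
inverted = benchmark_loss(lambda k: -rollout[k], ref)  # always picks worst
```

The point of such a benchmark is that it compares nets deterministically against fixed rollout data, instead of needing millions of noisy head-to-head games.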
Now Joseph started to use a slightly different technique for training. He wanted to let the neural net learn "concepts". He let the computer play games against itself or on FIBS, and analysed moves instead of positions. If a 2-ply evaluation chooses another move than a 0-ply evaluation, there is a "concept" in the position that the net doesn't understand. So if 0-ply and 2-ply disagreed on a move, both resulting positions were added to the training database. Then these positions in the database were rolled out, and a new series of supervised epoch training was performed. If the new resulting net scored better against the reference database, the net was considered better. In this way the static evaluation was trained to play like the 2-ply evaluation. Note that the positions in the reference database were never used for training, just for reference.
The training method in the above paragraph gave some new versions of the 0.12 nets; all the nets were gradually improved, the race net, the crashed net and the contact net, and these were named 0.12c, 0.12d and so on. In December 2002 or January 2003 the 0.13 nets were released. They were really world class! Really amazing! We started to roll out contact positions for reference-position benchmarking as well, and just as the last benchmark rollout was completed, Joseph released the 0.14 nets. These nets are probably the best neural nets for backgammon playing in the universe right now. (I can only think of mloner and zbot as real competitors, since I strongly believe that the 0.14 net has an edge over Snowie 4.)
(It's late, I have to finish this mail....)
If this mail sounds like a tribute and hail to Joseph and his amazing work on the neural net, I must say it actually _is_ a tribute. Without his intelligence, his insight, his computer wizardry, his push and go-ahead spirit, the GNU Backgammon project would still be a poor player with a FIBS rating of about 1600. The backgammon community all over the world owes him a lot. THANK YOU, PEPE!
Thomas, I hope this gives you some of the story and some info on the training techniques used in GNU Backgammon.
Strongest in the World?
I actually feel more confident with the analysis done by Michael Depreli at GammOnLine. Search the net for his results. He's probably reading this mail, so I guess he can mail you his methods and final results.
I hope he does. I don't subscribe to GammonLine so someone will have to
help me out here ;-)
Here is some data:
(Sorry for the typos, it's late here in Norway.)