Re: [Bug-gnubg] GNU Backgammon overview/background


From:	Øystein Johansen
Subject:	Re: [Bug-gnubg] GNU Backgammon overview/background
Date:	Thu, 13 Nov 2003 03:50:50 +0100
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2.1) Gecko/20021130

Thomas Hauk wrote:

> > About the training:
> > Temporal difference learning was only used in the first stages of the program. With this kind of training the program reached an intermediate playing strength. Later the nets were trained with supervised training on positions from a position database. The desired outputs of the training positions in the database are based on rollouts. The latest contact and crashed nets are trained by a slightly different technique. It's like the net is trained to play like the 2-ply evaluation.
> So... let me get this straight.
> TD learning was employed to train a NN, which then produced a position database along with what the NN thought to be the best move for that position. Then a second NN was trained with only this resulting position DB? In other words, the second NN is trained with expertly-labelled data, where the expert is an older NN trained only with self-play?


Eehhh, no.....

(I must be terribly bad at explaining things, since it doesn't seem like you got this right. I'm really sorry... Here's my new attempt.)


Long story.......

TD learning was the first thing. I'm not sure what version number this net had, but let's call it contact net 0.0.

Contact net 0.0 wasn't the best player in the world, and things weren't made any better by the fact that it played together with a bad race net. Let's call the first race net 0.0. Together I estimate they would have had a ~1600 FIBS rating.

One of the problems was that the nets didn't match each other either. The program refused to break contact when it was up in the race, and sometimes it also refused to hold contact when it was down in the race. The race net also refused to bear in chequers in the race in some games; this was because the race net gave a better GWC (game winning chance) than the bearoff database. Nearly funny to watch these games.

Joseph started his experiments. One of the first things he tried was to split the contact neural net into a contact net and a PBG(?) net for backgames and priming games. This split didn't work very well. Joseph had a hard time training these nets (still using TD?), and they were terribly out of sync with each other and caused the same problem as described in the paragraph above, except that the PBG net and contact net were more out of sync than the race and contact nets, so the split caused more trouble than it gained in strength. (Correct me if I'm wrong, Joseph.)

Joseph also found that some neural net inputs didn't contribute to the learning, so he removed some inputs and adjusted some others.

So, Joseph rejected the PBG net and started supervised training instead. The theory was simple enough: if a position has a cubeless equity of X ppg at 0-ply, and the same position has a cubeless equity of Y ppg at 1-ply, then if abs(X-Y) > a threshold value, there is reason to believe that the 0.0 contact net does not understand this position well, and this position (and position type) needs more training. (The threshold value was set to 0.1, I think.) Net 0.0 was set to play against itself, and for each position it reached, it calculated the 0-ply equity and 1-ply equity (both cubeless of course; cube algorithms weren't even implemented yet). If the criterion abs(X-Y) > threshold applied, then the position was stored in a database. Just the position, nothing else. After some number of games, the database had lots of positions where the net needed more training, and the self-play was stopped. Joseph may know how many positions there are in this position database.
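
The selection rule above can be sketched in a few lines of Python. This is only an illustration, not gnubg code: the function name, the two evaluator callables and the toy lookup tables are all hypothetical, and the 0.1 default is the threshold recalled (with some doubt) in the text.

```python
from typing import Callable, Iterable, List, TypeVar

Position = TypeVar("Position")


def select_training_positions(
    positions: Iterable[Position],
    equity_0ply: Callable[[Position], float],
    equity_1ply: Callable[[Position], float],
    threshold: float = 0.1,
) -> List[Position]:
    """Keep the positions whose 0-ply and 1-ply cubeless equities disagree."""
    selected = []
    for pos in positions:
        x = equity_0ply(pos)  # X: cubeless equity at 0-ply
        y = equity_1ply(pos)  # Y: cubeless equity at 1-ply
        if abs(x - y) > threshold:
            selected.append(pos)  # just the position, nothing else
    return selected


# Toy usage: positions are labels and the evaluators are lookup tables
# (hypothetical stand-ins for the net's 0-ply and 1-ply evaluations).
toy_0ply = {"a": 0.20, "b": -0.35, "c": 0.50}
toy_1ply = {"a": 0.25, "b": -0.10, "c": 0.35}
picked = select_training_positions(toy_0ply, toy_0ply.get, toy_1ply.get)
print(picked)
```

In the toy data only "a" survives the filter untouched (its two equities differ by 0.05), so "b" and "c" are the ones stored for further training.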

Now, for each position in the database, a cubeless rollout was performed to estimate the 'real' ('real' = at least better) probabilities and equity of the position. The rollouts were done with the same 0.0 net. The rollout results were added to the database of positions. In this way Joseph developed a database of positions and rollout results for each position.
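
To show what a rollout estimates, here is a minimal Monte Carlo sketch in Python. It is a deliberately simplified stand-in: instead of real backgammon, where the net would choose every move, it plays out a toy pip-count race with no decisions at all. All names and the toy game are assumptions made for illustration.

```python
import random


def roll_pips(rng):
    """Pips moved by one roll of two dice; doubles are played four times."""
    d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
    return 4 * d1 if d1 == d2 else d1 + d2


def play_out(pips_on_roll, pips_opponent, rng):
    """Play the toy race to the end; return 1.0 if the side on roll wins."""
    mine, theirs = pips_on_roll, pips_opponent
    while True:
        mine -= roll_pips(rng)
        if mine <= 0:
            return 1.0
        theirs -= roll_pips(rng)
        if theirs <= 0:
            return 0.0


def rollout(pips_on_roll, pips_opponent, trials=2000, seed=0):
    """Estimate the winning chance of the side on roll by averaging many games."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = sum(play_out(pips_on_roll, pips_opponent, rng) for _ in range(trials))
    return wins / trials


print(rollout(60, 60))  # even race: being on roll should be worth something
print(rollout(2, 100))  # the side on roll gets off with any roll
```

The averaged result plays the same role as the rollout results stored in the database: a better estimate than a single static evaluation, at the cost of many simulated games.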

So, this database was used for training the next version of the neural net. Normal backpropagation for each position in the database, with the rollout result as the desired result. Running once through every position in the position database and updating the weights with backpropagation based on the rollout results is called an epoch. Several epochs were run over the database, starting with high alpha values (alpha is the learning rate, a factor for how much the weights should be adjusted) and decreasing them as the weight values converged. The order of the positions was also randomized for each epoch. The resulting neural net was called 0.10, IIRC.
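
The epoch scheme described above (shuffle, one backpropagation step per position, then lower alpha) might look roughly like this in Python. It is a sketch under assumptions: the tiny one-hidden-layer net, the geometric alpha decay and the synthetic targets are all invented for the example; in the real setup the targets would be the rollout results.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """One hidden layer of sigmoid units and one sigmoid output."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)))
        return hidden, out

    def backprop(self, x, target, alpha):
        """One gradient step on the squared error for a single position."""
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        for j, h in enumerate(hidden):
            delta_h = delta_out * self.w2[j] * h * (1.0 - h)  # uses the old w2[j]
            self.w2[j] -= alpha * delta_out * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= alpha * delta_h * xi


def mean_squared_error(net, database):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in database) / len(database)


def train(net, database, epochs, alpha_start, alpha_decay, rng):
    alpha = alpha_start
    order = list(database)
    for _ in range(epochs):
        rng.shuffle(order)  # new random order of positions for each epoch
        for x, target in order:
            net.backprop(x, target, alpha)
        alpha *= alpha_decay  # assumed schedule: shrink alpha after every epoch


rng = random.Random(1)
database = []
for _ in range(50):
    x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    # The third input is a constant bias; the target is a synthetic stand-in.
    database.append(([x1, x2, 1.0], sigmoid(2 * x1 - x2)))

net = TinyNet(n_in=3, n_hidden=4, rng=rng)
before = mean_squared_error(net, database)
train(net, database, epochs=30, alpha_start=0.5, alpha_decay=0.9, rng=rng)
after = mean_squared_error(net, database)
print(before, after)
```

The message only says that alpha starts high and is decreased as the weights converge, so the fixed decay factor here is one simple choice among many.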

This net improved the strength of the program a lot, but the race net still sucked. To avoid this, Joseph made an OSR-based race evaluator, written in C++ (you can still see the "race-c++" branch in the CVS; this branch was never merged into the main branch, but it was added to the fibs2html tool), and used it with mgnutest. Joseph also developed a small neural net for pruning moves for deeper evaluations. This net had only 5 hidden nodes, and was therefore very fast. The fast 5-hidden-node net for pruning, combined with 2-ply search and the OSR race evaluator, reached a FIBS rating of about 1850-1900.
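
The idea of pruning with a small fast net can be sketched as a two-stage selection. This Python fragment is an assumption about the general shape of the technique, not the actual gnubg code: the function name, the `keep` parameter and the toy evaluators are hypothetical, and higher values are taken to be better for the side that is moving.

```python
def best_move_with_pruning(candidates, fast_eval, slow_eval, keep=3):
    """Rank all candidates with the cheap evaluator, then spend the
    expensive evaluation only on the `keep` best of them."""
    shortlist = sorted(candidates, key=fast_eval, reverse=True)[:keep]
    return max(shortlist, key=slow_eval)


# Toy usage: lookup tables stand in for the small pruning net (fast)
# and the deeper evaluation (slow).
fast = {"a": 0.10, "b": 0.40, "c": 0.30, "d": -0.20}
slow = {"a": 0.90, "b": 0.20, "c": 0.35, "d": 0.00}
moves = list(fast)

print(best_move_with_pruning(moves, fast.get, slow.get, keep=2))
print(best_move_with_pruning(moves, fast.get, slow.get, keep=4))
```

With `keep=2` the move "a" never reaches the slow evaluator, even though the slow evaluator would have preferred it. That is the trade-off of pruning: speed is bought with the risk of discarding the best move early.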

Still, something had to be done about the race net. I remember we discussed how to solve this problem, and several schemes came up. I thought about a neural net with the chequer positions for one player as the input and the distribution of the number of rolls to get off as the output, and someone else had other ideas, and while we discussed this problem on the list, Joseph suddenly released a new race network! I remember I was partly shocked! Wow! With this new race net he used 14 input nodes for borne-off checkers. One input for each checker borne off. I remember I asked why, and he replied something like: "I don't know, but it works!". This network that he had bred was trained on OSR evaluations, and it was released with the contact net 0.10, and together they were called 0.10. (I'm taking this from memory now, so the numbering may be wrong.)
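
One way to picture the 14 borne-off inputs is the small Python sketch below. The message only says there is one input per checker borne off; the one-hot reading used here (input k is set when exactly k+1 checkers are off) is an assumption, and the function name is made up.

```python
def borne_off_inputs(checkers_off, n_inputs=14):
    """Encode the number of borne-off checkers as 14 inputs.

    Assumed encoding: input k is 1.0 when exactly k + 1 checkers are off,
    so zero checkers off gives all zeros. With all 15 off the game is over,
    which is why 14 inputs are enough.
    """
    if not 0 <= checkers_off <= n_inputs:
        raise ValueError("checkers_off must be between 0 and 14")
    return [1.0 if checkers_off == k + 1 else 0.0 for k in range(n_inputs)]


print(borne_off_inputs(0))
print(borne_off_inputs(3))
```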

The first GamesGrid players, GGraccoon and GGbeaver, were based on these two networks, the contact 0.10 net and the race 0.10 net. (David added the reduced search algorithm which was used by GGbeaver, and this change was implemented in the gnubg project. Actually there was a reduced search algorithm before David's as well, but David's algorithm was better and faster.)

In February 2001, Joseph made some new changes to the evaluation algorithm and a new set of weights was released. He called this set of weights 0.15 in fibs2html, and the same weights are called 0.11 in gnubg. I'm not sure of the differences between this net and the 0.10 net. Joseph knows the details. (A better cache system was added to the evaluations, and a neural net speed-up trick was also implemented.)

In late 2001 (November/December) Joseph split the contact network into a crashed net and the ordinary net. I'm not sure how he trained these nets, but I assume he used supervised training from the position database. This net was merged into the GNU Backgammon project in January 2002. This was the really great update! It was amazing to watch mgnutest play on FIBS with this net. It really played a master game. It showed that it was able to find moves with long-term plans, and I would already at that stage say that it would outplay JellyFish in most positions and play equal to Snowie 3.2 or better in most positions. (It would of course outplay Snowie in bear-in situations, but we all know that Snowie sucked at bearing in.) The new neural net trio was numbered 0.12. I'm not sure about the training method, but I assume it was based on the database of positions and rollouts. (Maybe the rollouts were rerun or something... Joseph knows.)

The strange thing with the 0.12 nets was that even though there was a class for crashed positions on its own, specially trained on crashed positions, it really played crashed positions badly. The gain was really in the contact net and not the crashed net! It's possible that the contact net got better since it now didn't have to learn about crashed positions. Some of the brain capacity was freed to learn the contact positions better. The out-of-sync problem between the crashed and contact nets wasn't a big problem either, since a game very seldom goes from contact to crashed and then back to contact again.

Joseph ran some analysis of the occurrence of each position class from mgnutest's FIBS play, and he reported that 79% of all evaluations were contact evaluations, 8% were crashed evaluations, 7% were race and 6% were bearoff evaluations.

Joseph then spent some months improving the crashed net. I guess he used the same technique as usual, described on his page: a database of positions and rollout results, trained over several epochs. We had nets called 0.12a and 0.12b, and the crashed net slowly improved.

Now a new problem arose: it's not a problem to breed new neural nets, but how can you say that one net is better than another? Two good nets need millions of games to give a significant answer as to which is best. For this purpose Joseph started the gnubg training project. Volunteers over the internet, especially Ian Shaw, had big files of positions collected from online games, and rolled them out on a big scale. (We had our own tool for rolling the positions out, called sagnubg.) The rollout results were used as reference positions or a benchmark, and the net that played best according to all the rollout results in the reference positions was considered the best neural net. I guess we rolled out about half a million positions (?).
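
A benchmark of this kind could be scored roughly as in the Python sketch below. The scoring rule is an assumption, not something stated in the message: for each reference decision the net picks the candidate it likes best, and it is charged the rollout equity it gives up compared with the best rolled-out candidate. The data layout and all names are hypothetical.

```python
def benchmark_error(reference, evaluate):
    """Average rollout equity lost by the net's choices (lower is better).

    reference: list of decisions, each a dict mapping a candidate resulting
               position to its rollout equity for the player who moved.
    evaluate:  the net's static evaluation of a position, same point of view.
    """
    total = 0.0
    for rolled_out in reference:
        chosen = max(rolled_out, key=evaluate)  # what the net would play
        total += max(rolled_out.values()) - rolled_out[chosen]
    return total / len(reference)


# Toy usage with two reference decisions and two candidate nets.
reference = [
    {"p1": 0.30, "p2": 0.10},
    {"p3": -0.20, "p4": -0.05},
]
net_a = {"p1": 0.5, "p2": 0.1, "p3": 0.0, "p4": 0.2}
net_b = {"p1": 0.1, "p2": 0.4, "p3": 0.3, "p4": 0.0}

print(benchmark_error(reference, net_a.get))
print(benchmark_error(reference, net_b.get))
```

Here net_a agrees with the rollouts on both decisions, while net_b picks the worse candidate twice, so net_a would be considered the better net. Comparing nets this way needs no games between them, which is the point made above about significance.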

Now Joseph started to use a slightly different technique for training. He wanted to let the neural net learn "concepts". He let the computer play games against itself or on FIBS. He analysed moves instead of positions. If a 2-ply evaluation chooses another move than a 0-ply evaluation, there is a "concept" in the position that the net doesn't understand. If 0-ply and 2-ply disagreed on a move, both resulting positions were added to the training database. Then these positions in the database were rolled out, and a new series of supervised epoch training was performed. If the new resulting net scored better against the reference database, the net was considered better. In this way the static evaluations were trained to play like the 2-ply evaluations. Note that the positions in the reference database were never used for training, just for reference.
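
The move-based collection step can be sketched like this in Python. It only illustrates the rule stated above (keep both resulting positions when 0-ply and 2-ply pick different moves); the function name, the data layout and the toy evaluators are assumptions.

```python
def collect_disagreements(decisions, eval_0ply, eval_2ply):
    """Return positions to roll out and train on.

    decisions: iterable of lists, each holding the candidate resulting
               positions for one move decision.
    """
    training_positions = []
    for candidates in decisions:
        pick_0 = max(candidates, key=eval_0ply)  # move chosen at 0-ply
        pick_2 = max(candidates, key=eval_2ply)  # move chosen at 2-ply
        if pick_0 != pick_2:
            # The net misses some "concept" here: keep both positions.
            training_positions.extend([pick_0, pick_2])
    return training_positions


# Toy usage: lookup tables stand in for the 0-ply and 2-ply evaluations.
decisions = [["a", "b"], ["c", "d"]]
e0 = {"a": 0.30, "b": 0.10, "c": 0.20, "d": 0.40}
e2 = {"a": 0.20, "b": 0.25, "c": 0.10, "d": 0.50}

to_roll_out = collect_disagreements(decisions, e0.get, e2.get)
print(to_roll_out)
```

Only the first decision produces training material, because the two search depths agree on the second one.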

The training method in the above paragraph gave us some new versions of the 0.12 nets; all the nets were gradually improved: the race net, the crashed net and the contact net. These were named 0.12c, 0.12d and so on. In December 2002 or January 2003 the 0.13 nets were released. They were real world class! Really amazing! We started to roll out contact positions for reference position benchmarking as well, and just as the last benchmark rollout was completed, Joseph released the 0.14 nets. These nets are probably the best neural nets for backgammon playing in the universe right now. (I can only think of mloner and zbot as real competitors, since I strongly believe that the 0.14 net has an edge over Snowie 4.)


(It's late, I have to finish this mail....)

If this mail sounds like a tribute and hail to Joseph and his amazing work on the neural nets, I must say it actually _is_ a tribute. Without his intelligence, his insight, his computer wizardry, his push and go-ahead spirit, GNU Backgammon would still have been a poor player with a FIBS rating of about 1600. The backgammon community all over the world owes him a lot. THANK YOU, PEPE!

Thomas, I hope this gives you some of the story and some info on the training techniques used in GNU Backgammon.

Strongest in the World?
> > I actually feel more confident with the analysis done by Michael Depreli at GammOnLine. Search the net for his results. He's probably reading this mail, so I guess he can mail you his methods and final results.
> I hope he does. I don't subscribe to GammOnLine so someone will have to help me out here ;-)


Here is some data:
http://groups.google.com/groups?q=g:thl3357767960d&dq=&hl=en&lr=&ie=UTF-8&oe=utf-8&selm=f9846eb9.0306231441.3f641af4%40posting.google.com

-Øystein
(Sorry about the typos, it's late here in Norway)
