bug-gnubg

Re: The status of gnubg?


From: Joseph Heled
Subject: Re: The status of gnubg?
Date: Tue, 20 Oct 2020 06:52:08 +1300



On Tue, 20 Oct 2020 at 06:36, Øystein Schønning-Johansen <oysteijo@gmail.com> wrote:
Hi,

A method that has been tried out goes something like this:

Step 1: Collect positions:
  • Let the computer play self play, in many games.
  • While playing, at each move, check whether the move selected at 2-ply is the same as the move selected at 0-ply.
  • If move_0ply != move_2ply, store both resulting positions (from the 0-ply move and from the 2-ply move) in some datastore (typically just a file).
  • Continue self-play until you think you have collected enough positions. (What the criterion should be... boredom, maybe?)
Step 2: Rollout.
  • All positions collected in Step 1 are then rolled out so that the best possible evaluation is found.
Step 3: Supervised training.
  • All positions from the rollout data above are then used for supervised training.
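The three steps can be sketched as a small pipeline. This is a minimal sketch, not the existing tooling: the hooks `choose(position, dice, ply)` and `rollout(position)` are hypothetical stand-ins for calls into an engine such as gnubg, the self-play game loop is abstracted into a plain list of (position, dice) decisions, and the toy functions at the bottom only exist so the sketch runs.

```python
from typing import Callable, Hashable, Iterable, List, Sequence, Tuple

Position = Hashable          # e.g. a gnubg Position ID string
Dice = Tuple[int, int]

# Hypothetical engine hooks (NOT a real gnubg API):
#   choose(position, dice, ply) -> position after the move chosen at that ply
#   rollout(position)           -> rollout outputs used as training targets
ChooseMove = Callable[[Position, Dice, int], Position]
Rollout = Callable[[Position], Sequence[float]]


def collect_mismatches(decisions: Iterable[Tuple[Position, Dice]],
                       choose: ChooseMove) -> List[Position]:
    """Step 1: when 0-ply and 2-ply disagree, keep both resulting positions."""
    store: List[Position] = []
    for position, dice in decisions:
        after_0ply = choose(position, dice, 0)
        after_2ply = choose(position, dice, 2)
        if after_0ply != after_2ply:
            store.extend([after_0ply, after_2ply])
    return store


def build_training_set(positions: Iterable[Position],
                       rollout: Rollout) -> List[Tuple[Position, Tuple[float, ...]]]:
    """Steps 2-3: roll out each distinct position once and pair it with its targets."""
    seen = set()
    dataset = []
    for position in positions:
        if position in seen:
            continue  # identical positions only need one rollout
        seen.add(position)
        dataset.append((position, tuple(rollout(position))))
    return dataset


# Toy stand-ins so the sketch runs: only position "b" triggers a mismatch.
def toy_choose(position, dice, ply):
    return f"{position}-ply{ply}" if position == "b" else f"{position}-same"


def toy_rollout(position):
    return [0.5, 0.1, 0.0, 0.1, 0.0]


stored = collect_mismatches([("a", (3, 1)), ("b", (5, 2)), ("b", (5, 2))], toy_choose)
print(stored)
print(len(build_training_set(stored, toy_rollout)))
```

Deduplicating before the rollout step matters because rollouts are one of the expensive steps; the resulting (position, targets) pairs are what a Step 3 trainer would consume.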

The newly trained neural network is hopefully better than the one you had before you started this process. (However, you MUST verify that in some way, and it is best to have a verification method ready before you even start the training. If you don't, you can check whether the network has improved by having the new and the old networks play against each other.)
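For the new-versus-old match, one simple check is the new net's average points per game together with its standard error. A minimal sketch, assuming a plain list of per-game results seen from the new net's side and a normal approximation with z = 1.96; it ignores variance-reduction techniques and says nothing about how many games are enough. The results list is made up.

```python
import math
import statistics
from typing import Sequence, Tuple


def match_summary(points: Sequence[float]) -> Tuple[float, float]:
    """Mean points per game for the new net, and the standard error of that mean."""
    if len(points) < 2:
        raise ValueError("need at least two games")
    mean = statistics.fmean(points)
    std_err = statistics.stdev(points) / math.sqrt(len(points))
    return mean, std_err


def new_net_is_better(points: Sequence[float], z: float = 1.96) -> bool:
    """True only if mean - z * standard error is still above zero."""
    mean, std_err = match_summary(points)
    return mean - z * std_err > 0.0


# Hypothetical results from the new net's side (+1 win, +2 gammon, -1 loss, ...).
results = [1, 2, -1, 1, 1, -1, 1, 1, -2, 1] * 20
print(match_summary(results))
print(new_net_is_better(results))
```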

And if you still think your neural network can be further improved, just start doing this again from Step 1.

OK. Some discussion:
The time-consuming steps here are actually Steps 1 and 2. Step 3, supervised training, is pretty fast with modern methods and hardware. Packages like Keras and PyTorch (or Chainer, Caffe, CNTK, TensorFlow, whatever) that can use GPUs and TPUs can train neural networks in minutes instead of weeks. I already have tools to convert Keras and PyTorch neural nets to GNU Backgammon neural nets (and the other way), so that is good news. More good news: the first two steps are highly distributable. Say we make a simple toolchain and start up 10-20 computers (or maybe Ian has a lot of spare computers ;-). I guess modern self-play can find 2-3 0-ply/2-ply mismatches per second (I'm just guessing) to collect positions as described in Step 1. We (or anyone volunteering) can start our own collection processes on whatever equipment we have. If the same volunteers then roll out the positions with another tool in the same toolchain (Step 2), I think we can get something going.
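To put the guessed numbers in perspective: 10-20 machines at 2-3 mismatches per second each gives roughly 1.7 to 5.2 million collected positions per day, every one of which still has to be rolled out in Step 2. The sketch below does that arithmetic and merges the volunteers' batches before rollout. It assumes a hypothetical exchange format of one gnubg Position ID per line; the rates are the guesses from the text, not measurements.

```python
from typing import Iterable, List

SECONDS_PER_DAY = 24 * 60 * 60


def estimated_positions_per_day(machines: int, mismatches_per_second: float) -> float:
    """Back-of-the-envelope Step 1 throughput from the guessed per-machine rate."""
    return machines * mismatches_per_second * SECONDS_PER_DAY


def merge_batches(batches: Iterable[Iterable[str]]) -> List[str]:
    """Merge volunteers' batches (one position ID per line), dropping blanks and duplicates."""
    seen = set()
    merged: List[str] = []
    for batch in batches:
        for line in batch:
            pid = line.strip()
            if pid and pid not in seen:
                seen.add(pid)
                merged.append(pid)  # keep first-seen order
    return merged


print(estimated_positions_per_day(10, 2), estimated_positions_per_day(20, 3))
print(merge_batches([["h+sPAQD3rQEAAA\n", "960BAMCw+0MAAA\n"],
                     ["960BAMCw+0MAAA\n", "\n"]]))
```

Merging and deduplicating centrally means no position is rolled out twice just because two volunteers happened to collect it.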

So, please join me in this discussion: Can we organize such a collective effort? I can share some tools. Joseph, do you have any input? How many positions do you think we need? Will anyone join?

I can comment on that: my experience from 20 years ago was that at some stage adding positions started to hurt the net's performance. It is always a balancing act between getting the common, regular positions right and getting the edge cases right. I think that whatever you do, you might want to start fresh and see how my "method" (as you outlined above) can be improved.

-Joseph

 

Thanks,
-Øystein

On Mon, Oct 19, 2020 at 4:01 PM Aaron Tikuisis <Aaron.Tikuisis@uottawa.ca> wrote:
I see, that's very interesting. I'll make sure not to use ctrl-g for skewed situations like this!
So the real problem is that it thinks that gammon chances are near 0 for a position like this, when in fact it is 25%:
 GNU Backgammon  Position ID: h+sPAQD3rQEAAA
                 Match ID   : EAEAAAAAAAAE
 +12-11-10--9--8--7-------6--5--4--3--2--1-+     O: gnubg
 |                  |   |    O  O  O  O  O | O   0 points
 |                  |   |    O     O  O  O | O   On roll
 |                  |   |             O  O |    
 |                  |   |             O    |    
 |                  |   |             O    |    
^|                  |BAR|                  |    
 |                7 |   |                  |    
 |                X |   |                  |    
 |                X |   |    X           X |    
 |                X |   |    X           X |    
 |    X           X |   | X  X           X |     0 points
 +13-14-15-16-17-18------19-20-21-22-23-24-+     X: aaron (Cube: 1)


I'm not an expert but I'd think the NN should be able to learn this better - why not just try to train it more?

Is gnubg currently able to keep a database of its own 0-ply blunders? (For example, every time it does an evaluation, compare the higher-ply result with the 0-ply result, and if the 0-ply result errs by more than some threshold, add the position to the database.) If not, do you think it would be worth implementing this?
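The idea can be sketched with Python's built-in sqlite3 module. This is a hypothetical schema and helper, not an existing gnubg feature. It takes "errs by more than some threshold" to mean the absolute gap between the 0-ply equity and the deeper equity (another reading would be the equity cost of the move chosen at 0-ply). The 0.05 default threshold and the equities in the usage lines are made-up numbers.

```python
import sqlite3


def open_blunder_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical table of 0-ply misevaluations."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blunders ("
        " position_id TEXT PRIMARY KEY,"
        " equity_0ply REAL NOT NULL,"
        " equity_deep REAL NOT NULL)"
    )
    return conn


def record_if_blunder(conn: sqlite3.Connection, position_id: str,
                      equity_0ply: float, equity_deep: float,
                      threshold: float = 0.05) -> bool:
    """Store the position if 0-ply and deeper equities differ by more than threshold.

    Returns True when the position is flagged; a position already in the
    table is silently kept as-is (INSERT OR IGNORE).
    """
    if abs(equity_deep - equity_0ply) <= threshold:
        return False
    conn.execute(
        "INSERT OR IGNORE INTO blunders VALUES (?, ?, ?)",
        (position_id, equity_0ply, equity_deep),
    )
    conn.commit()
    return True


db = open_blunder_db()
print(record_if_blunder(db, "h+sPAQD3rQEAAA", -1.00, -1.25))  # hypothetical equities
print(record_if_blunder(db, "960BAMCw+0MAAA", -1.00, -1.02))
print(db.execute("SELECT position_id FROM blunders").fetchall())
```

Positions gathered this way could feed the same rollout and training steps as the self-play collection.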

Best regards, Aaron

From: Øystein Schønning-Johansen <oysteijo@gmail.com>
Sent: October 19, 2020 9:26 AM
To: Aaron Tikuisis <Aaron.Tikuisis@uottawa.ca>
Cc: Joseph Heled <jheled@gmail.com>; Philippe Michel <philippe.michel7@free.fr>; bug-gnubg@gnu.org <bug-gnubg@gnu.org>
Subject: Re: The status of gnubg?
 
On Mon, Oct 19, 2020 at 3:10 PM Aaron Tikuisis <Aaron.Tikuisis@uottawa.ca> wrote:
That is interesting, I did not realize that gnubg misplays race positions much. What are some examples?

 Here is a position I posted a few weeks ago. 

GNU Backgammon  Position ID: 960BAMCw+0MAAA
                 Match ID   : cAkAAAAAAAAA
 +13-14-15-16-17-18------19-20-21-22-23-24-+     O: gnubg
 |                  |   |    O  O  O  O  O | O   0 points
 |                  |   |    O     O  O  O | O  
 |                  |   |             O  O |    
 |                  |   |             O    |    
 |                  |   |             O    |    
v|                  |BAR|                  |     (Cube: 1)
 |                7 |   |                  |    
 |                X |   |                  |    
 |                X |   | X                |    
 |                X |   | X  X           X |     On roll
 |    X           X |   | X  X           X |     0 points
 +12-11-10--9--8--7-------6--5--4--3--2--1-+     X: oystein  

Money game and X to play. Try several rolls at 0-ply, like 52, 31, 53... What's the best move? 52: 6/1 6/4?
Of course, the evaluator reports a 0.0 win probability, but since the neural network evaluates the gammon chances incorrectly, it makes ridiculous moves.
It looks like this is a common pattern in positions which are "skewed".
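Writing out cubeless money equity shows why misjudged gammons hurt so much in a position like this. The sketch assumes the five-output convention used by gnubg's evaluator (win, win gammon, win backgammon, lose gammon, lose backgammon, where the gammon probabilities include backgammons). The probabilities are illustrative and use the roughly 25% gammon rate mentioned in the thread, not actual evaluations.

```python
from typing import NamedTuple


class Outputs(NamedTuple):
    """Cumulative outputs for the side to move (gammons include backgammons)."""
    win: float
    win_gammon: float
    win_backgammon: float
    lose_gammon: float
    lose_backgammon: float


def cubeless_money_equity(o: Outputs) -> float:
    """Points per game: +/-1 for the game, +/-1 more per gammon, +/-1 more per backgammon."""
    return (2.0 * o.win - 1.0
            + o.win_gammon + o.win_backgammon
            - o.lose_gammon - o.lose_backgammon)


# Illustrative numbers only: X has no winning chances in the position above.
believed = Outputs(0.0, 0.0, 0.0, 0.0, 0.0)  # net thinks no gammons are lost
actual = Outputs(0.0, 0.0, 0.0, 0.25, 0.0)   # roughly 25% gammons lost

print(cubeless_money_equity(believed))  # -1.0
print(cubeless_money_equity(actual))    # -1.25
```

If the net believes every candidate move loses almost no gammons, every candidate evaluates to about -1.0, so the gammon-saving differences that actually separate the moves are invisible to the 0-ply choice.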

-Øystein

