
Re: [Bug-gnubg] Questions about gnubg-nn tools


From: Joseph Heled
Subject: Re: [Bug-gnubg] Questions about gnubg-nn tools
Date: Thu, 5 Jan 2012 14:20:55 +1300

Seems like you got a very slightly better race net, but I would be surprised if it makes a difference in real life.

It would be much more interesting to:
  - get a better contact or crashed net
  - expand the roll-out database for all categories (should be easy with the current availability of CPU cycles)
  - improve cube decisions (this is a hard one)
  - improve back game evaluation and play (a very hard one)

-Joseph 

On 5 January 2012 12:40, Philippe Michel <address@hidden> wrote:
I have just tried to use the gnubg-nn tools to train nets, and I have some questions for Joseph or others who may have experience with them.

I started from the existing weights file (nngnubg.weights, 0.17-c5-123) and with the race net:

% ./train.py -v $DATA/training_data/race-train-data race
reading training data file

cycle 0 : checking
eqerr 0.00857, Max 0.11473 (0:01 210360)
creating race.save.86-115
cycle 1: training (20.000) 251862 positions in 0:02

and about a day later I was at 15000 cycles and stopped there:

cycle 15000 : checking
eqerr 0.00791, Max 0.11625 (0:01 207806)
cycle 15001: training (5.649) 251862 positions in 0:02

cycle 15001 : checking
eqerr 0.00785, Max 0.11597 (0:01 207807)
^Ccycle 15002: training (5.649)
Traceback (most recent call last):
 File "./train.py", line 210, in <module>
   trainer.train(alpha, order)
KeyboardInterrupt

First, how much is 15000 cycles? There are counters in the weights file, but at these values (15000 cycles * 251862 positions) 32-bit counters are close to wrapping around, so I'm not sure whether they are meaningful.
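The arithmetic does put such a counter close to its 32-bit limit, assuming it is incremented once per trained position:

# positions seen so far vs. the unsigned 32-bit ceiling
positions_seen = 15000 * 251862
print(positions_seen)            # 3777930000
print(2 ** 32)                   # 4294967296
print(positions_seen / 2 ** 32)  # ~0.88, i.e. close to wrapping around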

For some time, multiple intermediate weights files were saved, each apparently best for a different combination of average and maximum error. But when I stopped, there was only one left. Is this especially favourable (a "dominant" net best in all respects), or not particularly significant, since what's best will be determined by the benchmark data, not the training data?
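To illustrate what I mean by "dominant", here is a minimal sketch (not the tools' actual bookkeeping) of a dominance check over the two error measures, using the (eqerr, Max) pairs from cycles 15001 and 15000 above:

# A net dominates another if it is no worse on both error measures
# and strictly better on at least one of them.
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

net_15001 = (0.00785, 0.11597)  # (average error, maximum error)
net_15000 = (0.00791, 0.11625)
print(dominates(net_15001, net_15000))  # True, so a single best file survives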

Is it good practice to let the training run for a long time, or should it be stopped relatively often and restarted from the most promising intermediate weights?


The benchmarks for the original weights and the result of the training were:

% ./perr.py -W $DATA/nets/nngnubg.weights $DATA/benchmarks/race.bm
14388 Non interesting, 96620 considered for moves.
0p errors 16847 of 96620 avg 0.000588801601517
n-out ( 488 ) 0.51%
7478 errors of 119067
cube errors interesting 5688 of 105573
 me 3.72024388105e-05 eq 3.79352135977e-06
cube errors non interesting 1790 of 13494
 me 0.000114145065754 eq 0.0

% ./perr.py -W ../train/race.save.73-100  $DATA/benchmarks/race.bm
14388 Non interesting, 96620 considered for moves.
0p errors 16483 of 96620 avg 0.000581237898433
n-out ( 633 ) 0.66%
7500 errors of 119067
cube errors interesting 5710 of 105573
 me 3.58447342447e-05 eq 3.68966502016e-06
cube errors non interesting 1790 of 13494
 me 8.47618947172e-05 eq 0.0


How should one interpret this? It looks like the new weights made 364 fewer checker play errors and 22 more cube errors, out of roughly 100000 decisions in each case, but the cost of these errors was a few percent lower in both cases. Is this right? And are these differences possibly significant, or just noise?
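Working the differences out from the perr.py output above (checker play counts and averages, plus the "interesting" cube figures):

old = {"checker_errors": 16847, "cube_errors": 7478,
       "checker_avg": 0.000588801601517, "cube_me": 3.72024388105e-05}
new = {"checker_errors": 16483, "cube_errors": 7500,
       "checker_avg": 0.000581237898433, "cube_me": 3.58447342447e-05}

print(old["checker_errors"] - new["checker_errors"])  # 364 fewer checker play errors
print(new["cube_errors"] - old["cube_errors"])        # 22 more cube errors
print(1 - new["checker_avg"] / old["checker_avg"])    # ~1.3% lower average checker error
print(1 - new["cube_me"] / old["cube_me"])            # ~3.6% lower cube error cost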

What can be called a worthwhile improvement? Joseph's page about the racing net mentions an error rate halved between an old and a new net, but says that was an especially successful step. What were the improvements between generations of the contact net, for instance?


