Astonishingly, it's been about three years since version 0.14 of gnubg was
released. It has proved to be superior to JellyFish and at least the equal of
Snowie 4. Since then, BgBlitz has arrived as a serious opponent, and rumours of
Z-bot's approach persist. If it ever arrives, I'm sure it will be a strong
player.
I think we've rested on our laurels long enough, and it's about time we started
trying to improve the playing strength of our favourite bot.
I can think of several ways in which we might seek to make improvements:
A) Speed up the evaluation function so gnubg can search faster, and maybe
deeper.
B) Improve the evaluation function by changing the neural net inputs or hidden
nodes.
C) Retrain the existing net using a new set of training positions.
D) Retrain the existing net using newer rollouts of the current set of training
positions.
I'm keen to discuss A, B and C, but this post is going to focus on the last
method. If this broadens into a far-reaching discussion, I think it will help
to keep the themes separate.
Even if A or B prove to offer the biggest benefits, improving the training
database will be advantageous, so the work won't go to waste.
CURRENT TRAINING DATABASE
I will summarise the current state of play, as far as I understand it. Please
correct me if I'm wrong.
We have a large set of positions rolled out 1296 times at 0-ply. The positions
were rolled out using the 0.13 weights. This position database was then used by
Joseph Heled to train the neural network, leading to the version 0.14 weights
that we currently use.
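The figure of 1296 trials is 36 x 36, the number of ordered combinations of the first two rolls. One plausible reason for choosing it is that the rollouts could give every combination of the first two rolls to exactly one trial, a common variance-reduction trick. The post does not say so, so treat that as an assumption. The sketch below is hypothetical and not sagnubg's code. It enumerates those combinations and computes the mean equity and a naive standard error from the per-trial results.

```python
import itertools
import math
import statistics

# The 36 ordered outcomes of a pair of dice (3-1 and 1-3 counted separately).
DICE = [(a, b) for a in range(1, 7) for b in range(1, 7)]


def stratified_first_two_rolls():
    """Return one (first roll, second roll) pair per trial, so that each of the
    36 x 36 = 1296 combinations is used exactly once (hypothetical scheme)."""
    return list(itertools.product(DICE, repeat=2))


def rollout_summary(trial_equities):
    """Mean equity and its standard error over the trials.

    Treats the trials as independent. With stratified dice the true error
    is typically smaller than this estimate."""
    mean = statistics.fmean(trial_equities)
    se = statistics.stdev(trial_equities) / math.sqrt(len(trial_equities))
    return mean, se
```

So `len(stratified_first_two_rolls())` is 1296. One trial per combination gives exactly the rollout length quoted above.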
The positions were chosen from the following sources:
Games recorded on FIBS
Positions generated by gnubg playing against itself
Positions were included in the database if the 0-ply evaluation disagreed with
the 2-ply evaluation, indicating that gnubg does not understand the position
well.
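The selection rule can be sketched as a filter. This is a minimal sketch that rests on two assumptions: "disagreement" is measured as an equity difference rather than, say, a different move choice, and the threshold is a hypothetical number, not the one actually used. The evaluators are passed in as plain functions.

```python
from typing import Callable, Iterable, List, TypeVar

P = TypeVar("P")  # whatever type represents a position


def select_for_training(
    positions: Iterable[P],
    eval_0ply: Callable[[P], float],
    eval_2ply: Callable[[P], float],
    threshold: float = 0.05,  # hypothetical cut-off, not the real one
) -> List[P]:
    """Keep the positions whose 0-ply and 2-ply equities differ by more than
    `threshold`, i.e. those the shallow evaluation appears to misjudge."""
    return [p for p in positions if abs(eval_0ply(p) - eval_2ply(p)) > threshold]
```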
The position database is divided into three categories (race, crashed and
contact, with the Grand-Pos and Doubles files holding further crashed
positions), and subdivided into numbered files so that the work can be shared:
Race 0000 - 0046: Contact has been broken; both players are simply trying to
race around the board and bear off as fast as possible.
Crashed 0000 - 0085: Contact positions where one side has crashed, with several
men on the first 2 or 3 points.
Grand-Pos 0000 - 0150: More crashed positions.
Doubles: The doubles database includes crashed positions which have a forced
move or no move (so there cannot be a discrepancy between plies).
Contact 0000 - 0108: The general state of play where there is still contact but
the position is not crashed.
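To make the categories concrete, here is a rough classifier. Everything in it is an assumption for illustration. The board is a hypothetical pair of 25-element lists, one per side, each counted from that side's own point of view (index 0 is the 1-point, 24 is the bar). The "crashed" test is a simplified rule with a made-up threshold and may well differ from gnubg's own.

```python
from typing import Sequence

Board = Sequence[Sequence[int]]  # board[side][i]: checkers on point i+1 (24 = bar)

CRASH_THRESHOLD = 6  # hypothetical; gnubg's real rule may differ


def back_checker(side: Sequence[int]) -> int:
    """Index of the side's rearmost checker, or -1 if all are borne off."""
    for i in range(24, -1, -1):
        if side[i] > 0:
            return i
    return -1


def is_crashed(side: Sequence[int]) -> bool:
    """Hypothetical rule: few checkers left in play off the 1- and 2-points."""
    return sum(side[2:25]) <= CRASH_THRESHOLD


def classify(board: Board) -> str:
    b0, b1 = back_checker(board[0]), back_checker(board[1])
    if b0 < 0 or b1 < 0:
        return "over"
    # Point i for side 0 is point 23 - i for side 1, so the two rearmost
    # checkers have passed each other (no contact) when b0 + b1 < 23.
    if b0 + b1 < 23:
        return "race"
    if is_crashed(board[0]) or is_crashed(board[1]):
        return "crashed"
    return "contact"
```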
More information can be found on Joseph Heled's pages,
http://pages.quicksilver.net.nz/pepe/ngb/index-top.html.
RETRAINING THE EXISTING NET
We used gnubg 0.13 to generate the current database, giving us the training
data to produce version 0.14. I propose to update this database by re-rolling
it using version 0.14. This will give us data to enable us to produce version
0.15.
Since gnubg 0.14 is already very strong, I would expect only a small
improvement, at best, but I think it's an obvious place to start.
I need some HELP here.
1) Firstly, I need the 0.14 weights translated into a format that the rollout programme
"sagnubg" can understand. sagnubg expects a text file of floating point numbers, which is
not the same format as the gnubg.wd file. I have sagnubg030101, which I assume is the
latest version.
2) I don't have the whole training database. I've still got the files I
rolled out, but a large amount is missing. Hopefully Joseph can send me
the lot, but just in case, if you were part of the rollout team, please send
me any data you have.
3) I don't know how to train the NN once the rollout is done. Joseph used his
own program external to gnubg. I've no idea how much work is involved at this
stage. Perhaps Joseph is willing to have another go, or teach me what to do.
4) Anyone who wants to help by rolling out positions is more than welcome.
Summer's here and people are going on holiday, leaving lots of PCs looking for
something to do. If you have a PC or two that will be idle for a while, why not
set it to work? If you do have more than one networked PC, I have some DOS
batch files that (crudely) co-ordinate the work among several PCs.
5) What order should these be attacked in? I propose to start with the Contact
positions. The Race net is already very strong, and I think Joseph struggled to
improve the Crashed net performance.
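Here is one way several PCs could share the numbered files without a central server. Each PC claims the next unclaimed file by atomically creating a lock file in a shared directory. This is a minimal sketch and is not the DOS batch files mentioned in 4). The file naming scheme is hypothetical, and so is the order of Crashed and Race after Contact. It also assumes that exclusive file creation is atomic on the share, which is worth checking for SMB or NFS.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Contact first, as proposed in 5); the order of the other two is arbitrary.
PRIORITY = ["contact", "crashed", "race"]
LAST_INDEX = {"contact": 108, "crashed": 85, "race": 46}  # from the file ranges above


def work_list(last_index: Dict[str, int]) -> List[str]:
    """Numbered file names in priority order, e.g. 'contact-0000' (naming hypothetical)."""
    return [f"{cat}-{i:04d}" for cat in PRIORITY for i in range(last_index[cat] + 1)]


def claim_next(share: Path, files: Iterable[str], worker: str) -> Optional[str]:
    """Claim the first unclaimed file by creating '<name>.lock' with O_EXCL.
    Returns the claimed name, or None when every file is already taken."""
    for name in files:
        lock = share / (name + ".lock")
        try:
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # another PC got there first
        with os.fdopen(fd, "w") as f:
            f.write(worker)  # record which PC is rolling this file out
        return name
    return None
```

Each PC would loop on `claim_next`, roll out the file it gets, and stop when it returns None.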
GNUBG'S ODD-EVEN EFFECT
It has been observed on numerous occasions that gnubg's even ply evaluations
agree with each other more than they agree with the interleaved odd-ply
evaluations. That is, 0- and 2-ply tend to agree with each other, as do 1- and
3-ply.
This is caused by the evaluation function always looking from the point of view
of the player about to play. At even plies, it tries to maximise the player's
equity, whilst at odd plies it tries to maximise the opponent's equity, thus
minimising the equity of the original player. Since gnubg tries to maximise the
equity at each ply, it will tend to pick moves that are overvalued at that
depth, leading to the swings we see between odd and even plies.
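The selection effect described above can be shown with a toy model. This is an illustration under stated assumptions, not gnubg's evaluator or search. Each candidate move has a random "true" equity, and the evaluator adds zero-mean Gaussian noise. Whoever is choosing picks the best-looking move. Even with unbiased noise, the chosen move's estimate is too high on average. When the opponent is the one maximising (odd plies), that error counts against the original player.

```python
import random
import statistics


def selection_bias(n_moves=20, noise_sd=0.1, trials=5000, seed=1):
    """Average (estimated - true) equity of the move a maximiser picks when
    every candidate's estimate is its true equity plus Gaussian noise."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        true_eq = [rng.uniform(-1.0, 1.0) for _ in range(n_moves)]
        est_eq = [t + rng.gauss(0.0, noise_sd) for t in true_eq]
        best = max(range(n_moves), key=est_eq.__getitem__)
        errors.append(est_eq[best] - true_eq[best])
    return statistics.fmean(errors)


def bias_seen_by_original_player(ply, **kwargs):
    """At even plies the original player is maximising, so the error is in
    their favour. At odd plies the opponent maximises their own equity,
    which flips the sign from the original player's point of view."""
    b = selection_bias(**kwargs)
    return b if ply % 2 == 0 else -b
```

With `noise_sd=0.0` the bias is exactly zero. Any noise makes it positive, and the sign alternates with ply. This one-level model does not capture how the errors compound through deeper searches.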
I have an idea that might mitigate this tendency. I wonder if it would be
beneficial to invert all the positions and equities in the rollouts. This would
give us the rollout data for each complementary position. We would effectively
double the size of the rollout database for almost no effort.
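The bookkeeping for inverting a rollout result is straightforward. This sketch assumes a five-number output vector ordered as win, win gammon, win backgammon, lose gammon, lose backgammon, with gammon figures including backgammons. I understand that to be gnubg's convention, but treat it as an assumption. The board representation is hypothetical. The sketch only handles the conversion. Like the proposal itself, it takes for granted that the swapped position paired with the inverted outputs is a valid training example.

```python
from typing import Sequence, Tuple

# Assumed order: win, win gammon, win backgammon, lose gammon, lose backgammon.
Outputs = Tuple[float, float, float, float, float]


def invert_outputs(p: Sequence[float]) -> Outputs:
    """Express the same result from the other side's point of view."""
    win, wg, wbg, lg, lbg = p
    return (1.0 - win, lg, lbg, wg, wbg)


def cubeless_equity(p: Sequence[float]) -> float:
    """Cubeless money equity, with gammons counting double and backgammons triple."""
    win, wg, wbg, lg, lbg = p
    return 2.0 * win - 1.0 + (wg - lg) + (wbg - lbg)


def invert_example(board, outputs):
    """Swap the two sides' checkers and invert the outputs
    (hypothetical board representation: a pair of per-side arrays)."""
    return (board[1], board[0]), invert_outputs(outputs)
```

Inverting negates the cubeless equity, and inverting twice returns the original numbers. Both make handy sanity checks on converted rollout files.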
I can think of two potential drawbacks.
1) It would increase the training time. Is training time linearly proportional
to database size, or does it grow faster, for example with the square of the
database size?
2) We would have the same data twice, presented in different formats. This might
encourage the NN to train to "fit" the data in the database, whereas we are
looking to generalize the evaluation function over the entire position class.
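Part of question 1) has a simple answer. With plain backpropagation, one pass (epoch) over the database costs the same per position, so each epoch is linear in database size. What the sketch cannot tell us is whether a doubled database needs more epochs to converge, which is an empirical question. The factor of 3 is a rule of thumb, and the layer sizes in the test are purely illustrative.

```python
def epoch_cost(n_positions: int, n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Rough multiply-add count for one epoch on a one-hidden-layer net:
    a forward pass plus about twice that for the backward pass and weight
    update (a rule of thumb, not a measurement)."""
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    return 3 * weights * n_positions


def training_cost(n_positions: int, n_epochs: int,
                  n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Total cost is linear in database size for a fixed number of epochs."""
    return n_epochs * epoch_cost(n_positions, n_inputs, n_hidden, n_outputs)
```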
Nis Jorgenson and Joseph Heled investigated the idea of combining odd and even
ply evaluations to produce a more accurate evaluation. The results were
positive (see http://lists.gnu.org/archive/html/bug-gnubg/2003-02/msg00218.html),
but the idea was not incorporated into gnubg. I don't know why not; possibly it
was the overhead of combining information from two plies.
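The linked thread describes what Nis and Joseph actually did. The sketch below shows only the simplest form of the idea, a weighted average of an even-ply and an odd-ply equity. The weight is a hypothetical parameter, not a value from their experiments.

```python
def combine_plies(even_eq: float, odd_eq: float, weight: float = 0.5) -> float:
    """Weighted average of equities from adjacent plies. If the two carry
    errors of similar size but opposite sign, weight=0.5 roughly cancels them."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * even_eq + (1.0 - weight) * odd_eq
```

For example, if the true equity were 0.10, a 2-ply estimate of 0.15 and a 1-ply estimate of 0.05 would average back to 0.10. The cost is that both plies must be evaluated.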
I'm wondering if my idea might have some of the benefits of theirs, in that it
considers both sides of a position, but does so at the training stage, where it
is a one-off cost in processor power.
I'd be interested in all comments. I'd particularly like to get some help from
Øystein or Joseph to get me started - I go on holiday in two weeks and I'd like
to leave my PC busy.
Regards,
Ian Shaw
_______________________________________________
Bug-gnubg mailing list
http://lists.gnu.org/mailman/listinfo/bug-gnubg