
[Bug-gnubg] Is it time for Gnubg 0.15? Re-rolling the position database.


From: Ian Shaw
Subject: [Bug-gnubg] Is it time for Gnubg 0.15? Re-rolling the position database.
Date: Mon, 17 Jul 2006 14:46:38 +0100

Astonishingly, it's been about three years since version 0.14 of gnubg was 
released. It has proved to be superior to JellyFish and at least the equal of 
Snowie 4. Since then, BgBlitz has arrived as a serious opponent, and rumours of 
Z-bot's approach persist. If it ever arrives, I'm sure it will be a strong 
player.

I think we've rested on our laurels long enough, and it's about time we started 
trying to improve the playing strength of our favourite bot.

I can think of several ways we might seek to make improvements:

A) Speed up the evaluation function so gnubg can search faster, and maybe 
deeper.
B) Improve the evaluation function by changing the neural net inputs or hidden 
nodes.
C) Retrain the existing net using a new set of training positions.
D) Retrain the existing net using newer rollouts of the current set of training 
positions.

I'm keen to discuss A, B and C, but this post is going to focus on the last 
method, D. If this broadens into a far-reaching discussion, I think it will 
help to keep the themes separate.

Even if A or B prove to offer the biggest benefits, improving the training 
database will be advantageous, so the work won't go to waste.

CURRENT TRAINING DATABASE

I will summarise the current state of play, as far as I understand it. Please 
correct me if I'm wrong.

We have a large set of positions rolled out 1296 times at 0-ply. The positions 
were rolled out using the 0.13 weights. This position database was then used by 
Joseph Heled to train the neural network, leading to the version 0.14 weights 
that we currently use.

The positions were chosen from the following sources:
Games recorded on FIBS
Positions generated by gnubg playing against itself

Positions were included in the database if the 0-ply evaluation disagrees with 
the 2-ply evaluation, indicating that gnubg does not understand the position 
well.
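
As a rough illustration of the selection rule above, here is a minimal Python sketch. The evaluator callables, the position identifiers and the 0.05 equity threshold are all hypothetical; the actual cut-off used to build the database is not stated in this post.

```python
from typing import Callable, Iterable, List

def select_training_positions(
    positions: Iterable[str],
    eval_0ply: Callable[[str], float],
    eval_2ply: Callable[[str], float],
    threshold: float = 0.05,  # hypothetical cut-off, not gnubg's real value
) -> List[str]:
    """Keep positions whose 0-ply and 2-ply equities disagree by more than threshold."""
    return [p for p in positions if abs(eval_0ply(p) - eval_2ply(p)) > threshold]

# Toy stand-ins for the two evaluators (made-up equities, not gnubg output).
zero_ply = {"pos-a": 0.10, "pos-b": 0.10}
two_ply = {"pos-a": 0.12, "pos-b": 0.30}

selected = select_training_positions(zero_ply, zero_ply.get, two_ply.get)
print(selected)
```

Only "pos-b" survives here, because its two evaluations differ by far more than the threshold.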

The position database is divided into the following three categories, and 
subdivided into numbered files to enable the work to be shared:

Race 0000 - 0046: Contact has been broken; both players are simply trying to 
race around the board and bear off as fast as possible. 

Crashed 0000 - 0085: Contact positions where one side has crashed, with several 
men on the first 2 or 3 points.
Grand-Pos 0000 - 0150: More crashed positions.
Doubles: The doubles database includes crashed positions which have a forced 
move or no move (so there cannot be a discrepancy between plies).

Contact 0000 - 0108: The general state of play where there is still contact but 
the position is not crashed.

More information can be found on Joseph Heled's pages, 
http://pages.quicksilver.net.nz/pepe/ngb/index-top.html.

RETRAINING THE EXISTING NET

We used gnubg 0.13 to generate the current database, giving us the training 
data to produce version 0.14. I propose to update this database by re-rolling 
it using version 0.14. This will give us data to enable us to produce version 
0.15.

Since gnubg 0.14 is already very strong, I would expect only a small 
improvement, at best, but I think it's an obvious place to start.

I need some HELP here.

1) Firstly, I need the 0.14 weights translated into a format that the rollout 
program "sagnubg" can understand. It reads a text file of floating point 
numbers, which is not in the same format as the gnubg.wd file. I have 
sagnubg030101, which I assume is the latest version.
2) I don't have all the training database data. I've still got the ones I 
rolled out, but there is a large amount missing. Hopefully Joseph can send me 
the lot, but just in case, please could you send me any data you have if you 
were part of the rollout team.
3) I don't know how to train the NN once the rollout is done. Joseph used his 
own program external to gnubg. I've no idea how much work is involved at this 
stage. Perhaps Joseph is willing to have another go, or teach me what to do.
4) Anyone who wants to help by rolling out positions is more than welcome. 
Summer's here and people are going on holiday, leaving lots of PCs looking for 
something to do. If you have a PC or two that will be idle for a while, why not 
set it to work. If you do have more than one networked PC, I have some DOS 
batch files that (crudely) co-ordinate the work among several PCs.
5) What order should these be attacked in? I propose to start with the Contact 
positions. The Race net is already very strong, and I think Joseph struggled to 
improve the Crashed net performance.
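
To make the work-sharing in point 4 concrete, here is a minimal Python sketch of dealing the numbered files out to several PCs. It is not the DOS batch approach mentioned above; the file-name pattern ("contact-0000") and the machine names are assumptions made purely for illustration.

```python
from typing import Dict, List

def assign_files(category: str, first: int, last: int,
                 machines: List[str]) -> Dict[str, List[str]]:
    """Deal the numbered files of one category out to machines, round-robin."""
    plan: Dict[str, List[str]] = {m: [] for m in machines}
    for i, number in enumerate(range(first, last + 1)):
        # Hypothetical naming scheme: category name plus a four-digit file number.
        plan[machines[i % len(machines)]].append(f"{category}-{number:04d}")
    return plan

# Contact files 0000 - 0108, shared among three hypothetical PCs.
plan = assign_files("contact", 0, 108, ["pc1", "pc2", "pc3"])
for machine, files in plan.items():
    print(machine, len(files), files[0], "...", files[-1])
```

Round-robin keeps the shares within one file of each other, which seems good enough when the PCs are of similar speed.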

GNUBG'S ODD-EVEN EFFECT

It has been observed on numerous occasions that gnubg's even ply evaluations 
agree with each other more than they agree with the interleaved odd-ply 
evaluations. That is, 0- and 2-ply tend to agree with each other, as do 1- and 
3-ply.

This is caused by the evaluation function always looking from the point of view 
of the player about to play. At even plies, it tries to maximize the player's 
equity, whilst at odd plies it tries to maximize the opponent's equity - thus 
minimizing the equity of the original player. Since gnubg tries to maximise the 
equity at each ply, it will tend to pick moves that are overvalued at that 
depth, leading to the swings we see between odd and even plies.
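
The overvaluation argument can be shown with a small simulation. This is a toy model, not gnubg's evaluator: every candidate move is assumed to have the same true equity (zero), and each estimate is the true value plus Gaussian noise. The function name, noise level and move count are all made up for the illustration.

```python
import random

def mean_overvaluation(n_moves: int = 20, noise: float = 0.05,
                       trials: int = 10000, seed: int = 1) -> float:
    """Average estimated equity of the move picked by maximising noisy estimates.

    All moves have true equity 0, so any positive result is pure overvaluation.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimates = [rng.gauss(0.0, noise) for _ in range(n_moves)]
        total += max(estimates)  # the chosen move's estimate; its true equity is 0
    return total / trials

print("20 candidate moves:", mean_overvaluation())
print("1 candidate move:  ", mean_overvaluation(n_moves=1))
```

With a single candidate the average error is close to zero, while with twenty candidates the chosen move looks clearly better than it is. Whoever does the choosing at the final ply gets the benefit of that bias, which would be consistent with the swing between odd and even plies.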

I have an idea that might mitigate this tendency. I wonder if it would be 
beneficial to invert all the positions and equities in the rollouts. This would 
give us the rollout data for each complementary position. We would effectively 
double the size of the rollout database for almost no effort.
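
A minimal Python sketch of what such an inversion could look like follows. The record layout is an assumption: two tuples of checker counts (side on roll, then opponent) and five rollout outputs in the order win, win gammon, win backgammon, lose gammon, lose backgammon. Whether the mirrored record is a sound training sample, given that the side to move changes, is exactly the open question here and is not settled by the code.

```python
from typing import List, NamedTuple, Tuple

class Rollout(NamedTuple):
    on_roll: Tuple[int, ...]    # checker counts for the side on roll (assumed layout)
    opponent: Tuple[int, ...]   # checker counts for the other side
    outputs: Tuple[float, float, float, float, float]  # win, wg, wbg, lg, lbg

def invert(r: Rollout) -> Rollout:
    """View the same rollout result from the other player's side."""
    win, wg, wbg, lg, lbg = r.outputs
    # The opponent wins when we lose; gammons and backgammons swap sides.
    return Rollout(r.opponent, r.on_roll, (1.0 - win, lg, lbg, wg, wbg))

def augment(database: List[Rollout]) -> List[Rollout]:
    """Return the database with each record followed by its inverted twin."""
    doubled: List[Rollout] = []
    for record in database:
        doubled.append(record)
        doubled.append(invert(record))
    return doubled

# Toy record with a shortened three-point board (made-up numbers).
sample = Rollout((2, 0, 3), (0, 5, 0), (0.75, 0.25, 0.0, 0.125, 0.0))
doubled = augment([sample])
print(doubled[1])
```

Inverting twice returns the original record, so the transformation loses no information.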

I can think of two potential drawbacks.

1) It would increase the training time. Is training time linearly proportional 
to database size, or does it grow faster, for example with the square of the 
database size?
2) We would have the same data twice, presented in different formats. This 
might encourage the NN to train to "fit" the data in the database, whereas we 
are looking to generalize the evaluation function over the entire position 
class.

Nis Jorgenson and Joseph Heled investigated the idea of combining odd and even 
ply evaluations to produce a more accurate evaluation. The results were 
positive, see 
http://lists.gnu.org/archive/html/bug-gnubg/2003-02/msg00218.html, but they 
were not incorporated into gnubg. I don't know why not, possibly due to the 
overhead of combining information from two plies. 

I'm wondering if my idea might have some of the benefits of their idea, in that 
it considers both sides of a position, but does it at the training stage where 
it is a one-off cost in processor power.

I'd be interested in all comments. I'd particularly like to get some help from 
Øystein or Joseph to get me started - I go on holiday in two weeks and I'd like 
to leave my PC busy.

Regards,
Ian Shaw



