[Bug-gnubg] Is it time for Gnubg 0.15? Re-rolling the position database.
From: Ian Shaw
Subject: [Bug-gnubg] Is it time for Gnubg 0.15? Re-rolling the position database.
Date: Mon, 17 Jul 2006 14:46:38 +0100
Astonishingly, it's been about three years since version 0.14 of gnubg was
released. It has proved to be superior to JellyFish and at least the equal of
Snowie 4. Since then, BgBlitz has arrived as a serious opponent, and rumours of
Z-bot's approach persist. If it ever arrives, I'm sure it will be a strong
player.
I think we've rested on our laurels long enough, and it's about time we started
trying to improve the playing strength of our favourite bot.
I can think of several ways we might seek to make improvements:
A) Speed up the evaluation function so gnubg can search faster, and maybe
deeper.
B) Improve the evaluation function by changing the neural net inputs or hidden
nodes.
C) Retrain the existing net using a new set of training positions.
D) Retrain the existing net using newer rollouts of the current set of training
positions.
I'm keen to discuss A, B and C, but this post is going to focus on the last
method. If this broadens into a far-reaching discussion, I think it will help
to keep the themes separate.
Even if A or B prove to offer the biggest benefits, improving the training
database will be advantageous, so the work won't go to waste.
CURRENT TRAINING DATABASE
I will summarise the current state of play, as far as I understand it. Please
correct me if I'm wrong.
We have a large set of positions rolled out 1296 times at 0-ply. The positions
were rolled out using the 0.13 weights. This position database was then used by
Joseph Heled to train the neural network, leading to the version 0.14 weights
that we currently use.
The positions were chosen from the following sources:
Games recorded on FIBS
Positions generated by gnubg playing against itself
Positions were included in the database if the 0-ply evaluation disagrees with
the 2-ply evaluation, indicating that gnubg does not understand the position
well.
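As a rough illustration of that selection rule, here is a minimal Python sketch. It is not the procedure actually used to build the database: the function name, the two evaluator callables and the 0.05 equity threshold are all assumptions made for the example, and I don't know what measure of "disagreement" was really applied.

```python
from typing import Callable, Iterable, List, TypeVar

Position = TypeVar("Position")


def select_training_positions(
    positions: Iterable[Position],
    eval_0ply: Callable[[Position], float],
    eval_2ply: Callable[[Position], float],
    threshold: float = 0.05,  # hypothetical cut-off, not taken from gnubg
) -> List[Position]:
    """Keep positions whose 0-ply and 2-ply equities differ by more than threshold."""
    return [p for p in positions if abs(eval_0ply(p) - eval_2ply(p)) > threshold]


# Toy usage: made-up equities stand in for real 0-ply and 2-ply evaluations.
toy_0ply = {"pos-a": 0.10, "pos-b": -0.30, "pos-c": 0.45}
toy_2ply = {"pos-a": 0.12, "pos-b": -0.10, "pos-c": 0.44}
selected = select_training_positions(toy_0ply, toy_0ply.get, toy_2ply.get)
print(selected)
```

In this toy data only "pos-b" is kept, because it is the only position where the two plies differ by more than the threshold.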
The position database is divided into the following three categories, and
subdivided into numbered files to enable the work to be shared:
Race 0000 - 0046: Contact has been broken; both players are simply trying to
race around the board and bear off as fast as possible.
Crashed 0000 - 0085: Contact positions where one side has crashed, with several
men on the first 2 or 3 points.
  Grand-Pos 0000 - 0150: More crashed positions.
  Doubles: The doubles database includes crashed positions which have a forced
  move or no move (so there cannot be a discrepancy between plies).
Contact 0000 - 0108: The general state of play where there is still contact but
the position is not crashed.
More information can be found on Joseph Heled's pages,
http://pages.quicksilver.net.nz/pepe/ngb/index-top.html.
RETRAINING THE EXISTING NET
We used gnubg 0.13 to generate the current database, giving us the training
data to produce version 0.14. I propose to update this database by re-rolling
it using version 0.14. This will give us data to enable us to produce version
0.15.
Since gnubg 0.14 is already very strong, I would expect only a small
improvement, at best, but I think it's an obvious place to start.
I need some HELP here.
1) Firstly, I need the 0.14 weights translated into a format that the rollout
programme "sagnubg" can understand. This is a text file of floating point
numbers, and is not in the same format as the gnubg.wd file. I have
sagnubg030101, which I assume is the latest version.
2) I don't have all the training database data. I've still got the ones I
rolled out, but there is a large amount missing. Hopefully Joseph can send me
the lot, but just in case, please could you send me any data you have if you
were part of the rollout team.
3) I don't know how to train the NN once the rollout is done. Joseph used his
own program external to gnubg. I've no idea how much work is involved at this
stage. Perhaps Joseph is willing to have another go, or teach me what to do.
4) Anyone who wants to help by rolling out positions is more than welcome.
Summer's here and people are going on holiday, leaving lots of PCs looking for
something to do. If you have a PC or two that will be idle for a while, why not
set it to work. If you do have more than one networked PC, I have some DOS
batch files that (crudely) co-ordinate the work among several PCs.
5) What order should these be attacked in? I propose to start with the Contact
positions. The Race net is already very strong, and I think Joseph struggled to
improve the Crashed net performance.
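For point 4, one simple way to share the numbered files among several PCs is a fixed round-robin split. The Python sketch below is only an illustration of that idea and is not what my batch files do; the machine names and the function are hypothetical, and the file numbers are the Contact range listed above.

```python
from typing import Dict, List


def split_files(first: int, last: int, machines: List[str]) -> Dict[str, List[str]]:
    """Deal file numbers first..last (inclusive) round-robin to the machines."""
    shares: Dict[str, List[str]] = {m: [] for m in machines}
    for i, number in enumerate(range(first, last + 1)):
        # Zero-pad to four digits to match the 0000-style file numbering.
        shares[machines[i % len(machines)]].append(f"{number:04d}")
    return shares


# Hypothetical machine names; Contact files run from 0000 to 0108.
shares = split_files(0, 108, ["pc1", "pc2", "pc3"])
for machine, files in shares.items():
    print(machine, len(files), files[:3])
```

A fixed split needs no communication between the PCs, but it does not rebalance if one machine is much slower than the others.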
GNUBG'S ODD-EVEN EFFECT
It has been observed on numerous occasions that gnubg's even ply evaluations
agree with each other more than they agree with the interleaved odd-ply
evaluations. That is, 0- and 2-ply tend to agree with each other, as do 1- and
3-ply.
This is caused by the evaluation function always looking from the point of view
of the player about to play. At even plies, it tries to maximise the player's
equity, whilst at odd plies it tries to maximise the opponent's equity - thus
minimising the equity of the original player. Since gnubg tries to maximise the
equity at each ply, it will tend to pick moves that are overvalued at that
depth, leading to the swings we see between odd and even plies.
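The overvaluation argument can be seen in a toy model. The Python sketch below is not gnubg's search: it assumes every candidate move is truly worth 0.0 and that the evaluator adds Gaussian noise (the move count, noise level and trial count are arbitrary choices). Picking the maximum of the noisy values then gives a result that is biased upwards for whoever is choosing, and the sign flips when that is the opponent.

```python
import random


def mean_picked_value(n_moves: int, noise: float, trials: int, seed: int = 0) -> float:
    """Average evaluation of the move picked by maximising noisy estimates.

    Every move is truly worth 0.0, so any non-zero result is pure selection bias.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, noise) for _ in range(n_moves))
    return total / trials


bias = mean_picked_value(n_moves=20, noise=0.05, trials=2000)

# If the side doing the maximising is the original player, the bias
# inflates that player's equity.
player_maximises = bias
# If it is the opponent, the same bias is negated when seen from the
# original player's point of view.
opponent_maximises = -bias

print(f"original player maximises: {player_maximises:+.3f}")
print(f"opponent maximises:        {opponent_maximises:+.3f}")
```

Both printed values would be zero for a perfect evaluator; the gap between them is a crude stand-in for the swing between adjacent plies.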
I have an idea that might mitigate this tendency. I wonder if it would be
beneficial to invert all the positions and equities in the rollouts. This would
give us the rollout data for each complementary position. We would effectively
double the size of the rollout database for almost no effort.
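To make the inversion concrete, here is a minimal Python sketch. The record layout is hypothetical (it is not sagnubg's file format), and it assumes each rollout result is stored as five outcome probabilities: win, win gammon, win backgammon, lose gammon and lose backgammon. It deliberately leaves aside which side is on roll in the mirrored record, which a real implementation would have to handle.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Record:
    """One rolled-out position, seen from one player's side (hypothetical layout)."""
    player: Tuple[int, ...]    # that player's checkers, point by point
    opponent: Tuple[int, ...]  # the other side's checkers, point by point
    win: float
    win_gammon: float
    win_bg: float
    lose_gammon: float
    lose_bg: float


def equity(r: Record) -> float:
    """Cubeless money equity from the five outcome probabilities."""
    return 2 * r.win - 1 + (r.win_gammon - r.lose_gammon) + (r.win_bg - r.lose_bg)


def invert(r: Record) -> Record:
    """Describe the same position and rollout result from the other side."""
    return Record(
        player=r.opponent,
        opponent=r.player,
        win=1 - r.win,
        win_gammon=r.lose_gammon,
        win_bg=r.lose_bg,
        lose_gammon=r.win_gammon,
        lose_bg=r.win_bg,
    )


# Made-up example record; the board tuples are shortened placeholders.
original = Record((2, 0, 3), (0, 5, 0), 0.625, 0.25, 0.0, 0.125, 0.0)
mirrored = invert(original)
print(equity(original), equity(mirrored))
```

The mirrored record has exactly the negated equity, and inverting twice returns the original record, so no new rollouts are needed to produce it.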
I can think of two potential drawbacks.
1) It would increase the training time. Is training time linearly proportional
to database size, or does it grow faster, for example with the square of the
database size?
2) We would have the same data twice, presented in different formats. This
might encourage the NN to train to "fit" the data in the database, whereas we
are looking to generalize the evaluation function over the entire position
class.
Nis Jorgenson and Joseph Heled investigated the idea of combining odd and even
ply evaluations to produce a more accurate evaluation. The results were
positive, see
http://lists.gnu.org/archive/html/bug-gnubg/2003-02/msg00218.html, but they
were not incorporated into gnubg. I don't know why not, possibly due to the
overhead of combining information from two plies.
I'm wondering if my idea might have some of the benefits of their idea in that it
considers both sides of a position, but does it at the training stage where it
is a one-off cost in processor power.
I'd be interested in all comments. I'd particularly like to get some help from
Øystein or Joseph to get me started - I go on holiday in two weeks and I'd like
to leave my PC busy.
Regards,
Ian Shaw