gnugo-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [gnugo-devel] using MPI for gnugo.


From: Gunnar Farnebäck
Subject: Re: [gnugo-devel] using MPI for gnugo.
Date: Sun, 08 Mar 2009 15:10:14 +0100
User-agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103)

Daniel Bump wrote:
>> There's no particular need to do anything drastic with the existing
>> code. The only thing that matters in the short term for making GNU Go
>> stronger is to catch up with the Monte Carlo development, especially
>> make the Monte Carlo code scale better to larger boards. Key ideas to
>> look into there are RAVE (rapid action value estimation),
>> progressive widening, and improved move ordering.
>
> Improved move ordering may be mainly a tuning matter.
>
> What is the process for tuning the existing Monte Carlo code?

There are two different processes involved here. One is move ordering
for choosing the next unexplored move while building the search tree,
the second is the probability distribution when doing a Monte Carlo
simulation from a leaf node of the tree. The latter can be controlled
by tuning 3x3 patterns and this is fairly well described in
doc/montecarlo.texi. The code related to this is on lines 918-1516 of
montecarlo.c, most of the function mc_generate_random_move() in the
same file, and patterns/mkmcpat.c.

The former is not so well-structured. Let's take a guided tour through
the code. Starting point is the function uct_play_move(). It first
computes uct values for the child nodes (i.e. so far explored
moves). At line 1952 it starts looking for an unexplored move. The
comment is misleading since it doesn't just choose a random move but
there are some move ordering going on. First, if no move at all has
been explored yet, it asks mc_generate_random_move() for a
suggestion. The point of this is that certain reactive moves, such as
capturing a self atari by the opponent, tend to be highly valued in
the tuning of the probability distributions. Otherwise the ordering is
decided by the tree->move_ordering array. This array is managed by the
two functions init_move_ordering() and update_move_ordering(), which
keep track of a numerical score for each move.

The init function is called at the setup in uct_genmove() and sets the
move scores to "(int) (10 * potential_moves[pos]) - 1", i.e. it uses
the move values from the traditional move generator for initial move
ordering (also notice the "FIXME: Quick and dirty experiment"). This
is successively modified by calls to update_move_ordering() when
collecting statistics from completed games in uct_traverse_tree() such
that moves in the tree on the winning side increases their score by
one. This is currently not tunable but there's nothing stopping this
move ordering from being determined by patterns instead of what's done
now.

> How can we see what the code is doing?

Some information can be obtained by setting mc_debug = 1 on line 38 of
montecarlo.c (notice the FIXME) and/or enabling the call to
uct_dump_tree() on line 2187. But it's not really easy to visualize
what the Monte Carlo search is doing.

/Gunnar




reply via email to

[Prev in Thread] Current Thread [Next in Thread]