Re: [Bug-gnubg] An evalutaion of the pruning nets

From: Joseph Heled
Subject: Re: [Bug-gnubg] An evalutaion of the pruning nets
Date: Mon, 25 Oct 2004 08:13:30 +1300
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040616

Thanks for the detailed analysis. I did some tests when developing the pruning, but not on such a scale. But I wonder, since 2,375 is not a big number, can you publish it in some simple text format (say "positionID matchID dice" per line), bzip it and send it to me?
(Same for differing cube decisions, if any)
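
For illustration, a minimal sketch of writing and reading such a file, assuming Python and its standard bz2 module. The function names and the choice to pass each field as a plain string are hypothetical; nothing here is part of gnubg.

```python
import bz2


def write_positions(path, records):
    """Write one 'positionID matchID dice' line per record to a bzip2 file.

    Each record is a (position_id, match_id, dice) tuple of strings.
    The exact spelling of the dice field is an assumption left to the caller.
    """
    with bz2.open(path, "wt", encoding="ascii") as out:
        for position_id, match_id, dice in records:
            out.write(f"{position_id} {match_id} {dice}\n")


def read_positions(path):
    """Read the file back as a list of (position_id, match_id, dice) tuples."""
    with bz2.open(path, "rt", encoding="ascii") as src:
        return [tuple(line.split()) for line in src if line.strip()]
```

Because the fields are separated by single spaces, this only works while none of the fields contains whitespace.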

Thanks, Joseph

Jim Segrave wrote:
I decided to see how much of a change the pruning nets make in gnubg's
behaviour between the version just prior to the pruning net release and
the pruning version. I had available an archive of 297 matches
containing 2509 games. All of the games had been evaluated using
the 'Supremo' setting for the move filters. (Unfortunately, a small
number, I'd guess 20 or 30, were evaluated with a different MET,
although I don't think this has had any large effect.)

I wanted to do two things:
1) see how the evaluations differed between the two versions
2) see how often the two versions differed in their choice of the best move.

My reasoning was that a difference in evaluations affects cube
decisions but not necessarily how gnubg would play a position (the
effect on rollouts would be small unless it produces a drastic change
in cube handling), whereas a difference in choice of best move would
affect play, and hence rollouts.

The first step was to run a script which imported an analysis from an
old version, exported it as a .mat file, imported the .mat file and
analysed it again with pruning. I intend to re-run the entire old
version analysis to compare the time taken. The analysis using pruning
took 26 hours; I expect the old version may take some 72 or more.

I then ran a Perl script over the pairs of .sgf files. For every move,
I extracted gnubg's choice of the best move and the 2-ply cubeful
analysis of every position considered. I discarded any positions where
one version did a 2-ply analysis and the other did a 0-ply, and any
plays where the move was forced. I did not attempt to compare doubling
actions, since I felt that a measure of the differences in evaluation
would indicate how much the doubling behaviour would be expected to
change. In the end, I was comparing 250,124 moves and 569,592
evaluations. 3,776 forced moves had been discarded. For each
evaluation, I compared the win/wing/winbg/loseg/losebg/cubeful equity
values in the two .sgf files. I calculated the average and standard error
of each of these metrics and counted the number of times that the two
versions disagreed on the best move. (For what it's worth, the script
checked that the two files matched at every point to ensure I didn't
get out of sync while comparing them.)
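
The comparison described above can be sketched roughly as follows. This is an illustration in Python, not the Perl script that was actually used. The record layout (dicts with `ply`, `forced` and the six metric keys) is an assumption, and so is the use of the sample standard deviation for the spread, since the post does not say how its "Std err" figure was computed.

```python
from math import sqrt

# The six metrics compared between the two .sgf files.
FIELDS = ("win", "wing", "winbg", "loseg", "losebg", "cubeful")


def comparable(old, new):
    """Keep a pair only if both sides are 2-ply and the move was not forced."""
    return old["ply"] == 2 and new["ply"] == 2 and not old["forced"]


def summarise(pairs):
    """Return {field: (mean_abs_diff, std_dev)} over the comparable pairs.

    `pairs` is an iterable of (old_eval, new_eval) dicts for the same move.
    """
    kept = [(old, new) for old, new in pairs if comparable(old, new)]
    if not kept:
        raise ValueError("no comparable evaluations")
    stats = {}
    for field in FIELDS:
        diffs = [abs(old[field] - new[field]) for old, new in kept]
        count = len(diffs)
        mean = sum(diffs) / count
        # Sample variance (n - 1 denominator); zero when only one pair exists.
        var = sum((d - mean) ** 2 for d in diffs) / (count - 1) if count > 1 else 0.0
        stats[field] = (mean, sqrt(var))
    return stats
```

Filtering before computing the statistics mirrors the order in the description: mismatched-ply and forced-move entries never contribute to the averages.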

The results:
                              win     wing    winbg   loseg   losebg  cubeful
Average absolute difference 0.00123 0.00108 0.00009 0.00157 0.00020 0.00343
Std err                     0.00619 0.00390 0.00051 0.00642 0.00403 0.02100

In 2,375 cases (0.94% of the time), the choice of best move differed.
Of these 2,375 differences, I suspect that many of them represent
points where the outcome of the game is more or less predetermined and
the choice of move is irrelevant or where the evaluations of different
alternative moves are so close to each other that even tiny
differences result in drastic re-ordering.

Here's a breakdown of where the pruning version's best move was ranked
by the non-pruning version.

count  pruning non-pruning
        rank    rank
   2     1,     17
   3     1,     15
   3     1,     16
   4     1,     13
   5     1,     14
   6     1,     12
   7     1,     11
   8     1,     8
  14     1,     10
  20     1,     not ranked
  25     1,     7
  35     1,     6
  44     1,     5
  62     1,     9
  96     1,     4
 287     1,     3
1754     1,     2
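
As a quick consistency check on the table, the counts sum to the 2,375 differing moves, and in roughly 74% of them the pruning version's choice was the non-pruning version's second choice (about 86% fall within its top three). A short Python sketch of that arithmetic, with the table transcribed by hand and the names chosen only for illustration:

```python
# (count, rank assigned by the non-pruning version); None means "not ranked".
BREAKDOWN = [
    (2, 17), (3, 15), (3, 16), (4, 13), (5, 14), (6, 12), (7, 11), (8, 8),
    (14, 10), (20, None), (25, 7), (35, 6), (44, 5), (62, 9), (96, 4),
    (287, 3), (1754, 2),
]

# Total number of moves compared, as given in the post.
TOTAL_MOVES = 250_124


def total_differences():
    """Sum of all counts in the breakdown table."""
    return sum(count for count, _ in BREAKDOWN)


def share_within(max_rank):
    """Fraction of differing moves the old version ranked at max_rank or better."""
    hits = sum(c for c, rank in BREAKDOWN if rank is not None and rank <= max_rank)
    return hits / total_differences()


if __name__ == "__main__":
    print(total_differences())                  # table total
    print(f"{share_within(2):.1%}")             # second choice of old version
    print(f"{share_within(3):.1%}")             # within old version's top three
    print(f"{total_differences() / TOTAL_MOVES:.4%}")  # disagreement rate
```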

The conclusion I draw from all of this is that the pruning version has
no significant effect on either the outcome of evaluations or gnubg's
playing style.

Given the enormous speed increase from the pruning nets, I am inclined
to agree with Joseph Heled that we should simply remove reduced
