

Re: [Bug-gnubg] "Joseph-ID" in benchmark db

From: Mark Higgins
Subject: Re: [Bug-gnubg] "Joseph-ID" in benchmark db
Date: Sun, 12 Feb 2012 11:13:25 -0500

My best player (TD-trained, race & contact networks, a couple of extra inputs beyond the standard Tesauro ones) has an average error of 0.0164 ppg/move on the contact set, so, not surprisingly, it is worse than GNUbg (I assume 1125 means 0.01125 ppg/move?).

I was also curious which benchmark set is most relevant for predicting match score, since a real game is of course a mixture of all these position types. I took a bunch of my players of varying skill, calculated each one's average error rate on the three benchmark sets, and also played each against PubEval for 40k cubeless money games. Then I regressed the score in those games against the benchmark ERs to see which was most important (using R^2 as a proxy for importance).

Turns out the contact benchmark is most relevant, followed by crashed. Race is not that important.
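For anyone wanting to repeat the comparison, here is a minimal sketch of one way to do it. It assumes a separate one-variable least-squares regression of score on each benchmark's error rate (the post doesn't say whether a single multivariate fit was used instead); for a single regressor, R^2 is just the squared Pearson correlation. The function names `r_squared` and `rank_benchmarks` are hypothetical.

```python
from statistics import fmean


def r_squared(xs, ys):
    """R^2 of an ordinary least-squares fit y = a + b*x with one regressor."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # A constant regressor or constant score explains nothing.
        return 0.0
    # With one regressor, R^2 equals the squared Pearson correlation.
    return sxy * sxy / (sxx * syy)


def rank_benchmarks(error_rates, scores):
    """Return (benchmark, R^2) pairs, most predictive first.

    error_rates: dict mapping benchmark name -> per-player average errors
    scores: per-player average ppg vs. PubEval, in the same player order
    """
    ranked = [(name, r_squared(ers, scores)) for name, ers in error_rates.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Call it as `rank_benchmarks({"contact": [...], "crashed": [...], "race": [...]}, scores)` with one entry per player in every list.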

Details here:


On Feb 12, 2012, at 8:51 AM, Øystein Schønning-Johansen wrote:

I've looped through all 'm'-positions the following way:

For each position, I find the best move with my evaluator and check whether that move is among the candidates in the list of moves. If it is not the best move, I add its error to the total. If my evaluator's move is not among the candidates at all, I assign it the same error as the worst move among the candidates.
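The scoring rule above can be sketched as follows. This is an illustration, not the actual benchmark code: it assumes each position has already been parsed into a hypothetical `BenchmarkPosition` holding its candidate moves mapped to their error versus the best move (best move = 0), and that `choose_move` is a hypothetical callback wrapping your own evaluator. Parsing the real benchmark file format is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BenchmarkPosition:
    # Hypothetical container, not the real file format.
    position_id: str
    # Candidate move -> error versus the best move (assumed 0 for the best).
    candidates: Dict[str, float]


def move_error(position: BenchmarkPosition, chosen_move: str) -> float:
    """Error charged for one position under the scheme described above."""
    if chosen_move in position.candidates:
        # Best move costs 0; any other listed candidate costs its own error.
        return position.candidates[chosen_move]
    # Move not among the candidates: charge the worst listed candidate's error.
    return max(position.candidates.values())


def total_error(positions: List[BenchmarkPosition],
                choose_move: Callable[[BenchmarkPosition], str]) -> float:
    """Sum of per-position errors for the evaluator behind choose_move."""
    return sum(move_error(p, choose_move(p)) for p in positions)


def average_error(positions: List[BenchmarkPosition],
                  choose_move: Callable[[BenchmarkPosition], str]) -> float:
    """Average error per move, the figure quoted as ppg/move in this thread."""
    return total_error(positions, choose_move) / len(positions)
```

Charging the worst listed error for an off-list move is a floor, not an exact cost, since the true error of an unlisted move may be larger.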

For all positions in contact.bm, GNU Backgammon will have an error of about 1125 (IIRC).

Please report how your players do.


2012/2/12 Mark Higgins <address@hidden>
Does anyone have the average error stats for 0-ply gnubg on the contact benchmarks?

I see race & crashed results at Joseph's page here:

but can't find the contact result anywhere. (Though I'd guess it's pretty close to the crashed error, i.e. around 0.01 ppg/move.)

On Feb 10, 2012, at 6:26 AM, Øystein Schønning-Johansen wrote:

'r' is the seed used for the rollout, I think

Sounds likely, since there is an 'r'-line for every rollout result. But there are no code lines in perr.py to confirm it. I guess we can trust your memory on that.

'o' is the cube rollout, and the numbers are the rollout values of the outcome probability,


