## Re: [Bug-gnubg] Proposal - Luck rating

From: Robert-Jan Veldhuizen
Subject: Re: [Bug-gnubg] Proposal - Luck rating
Date: Sun, 26 Jul 2009 23:46:12 +0200

Hello David, bit late perhaps, but I felt something was missing in this thread.

On Mon, Jul 6, 2009 at 10:10 PM, David Levy wrote:

Both players begin play at 50% to win a match. One of them wins, going to
100%, the other loses going to 0%. It seems to me that as a matter of
definition, the “net luck” plus the “net skill” (difference in MWC of the
two players’ errors) should be exactly 50%. Correct?

Only with a perfect neural net. As it is, GnuBG's error analysis is not perfect, and its luck analysis is probably worse.

gnubg often reports results quite close to 50%. For example:

| | Opponent | Me |
|---|---|---|
| Error total EMG (MWC) | -0.6945 (-12.422%) | -1.3522 (-14.624%) |
| Luck total EMG (MWC) | +1.0533 (+35.787%) | -1.1830 (-8.598%) |

Net luck was 44% against me and net errors were another 2%. Close to 50%,
but not exact.
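As a quick check of that arithmetic, here is a minimal sketch using only the MWC figures quoted above. The variable names are made up for illustration; nothing here comes from GnuBG itself.

```python
# MWC figures (in percent) copied from the analysis output quoted above.
error_mwc = {"opponent": -12.422, "me": -14.624}
luck_mwc = {"opponent": 35.787, "me": -8.598}

# "Against me" means the opponent's figure minus mine.
net_luck_against_me = luck_mwc["opponent"] - luck_mwc["me"]
net_error_against_me = error_mwc["opponent"] - error_mwc["me"]
total = net_luck_against_me + net_error_against_me

print(f"net luck against me:   {net_luck_against_me:.3f}%")
print(f"net errors against me: {net_error_against_me:.3f}%")
print(f"sum:                   {total:.3f}%")
```

This gives roughly 44.4% and 2.2%, for a sum of about 46.6%, so this match falls about 3.4% short of the 50% swing.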

But sometimes the results are substantially different. Can anyone explain
this?

My thoughts:

(1) Perhaps gnubg uses 2-ply for checker play, but 0-ply for calculating
luck.

True, but not the real reason. You can analyse your match at 0-ply to test this.

Note that measuring luck at 0-ply is much like a 1-ply evaluation, in time taken as well: it compares the 0-ply equity after the actual roll (assuming the best move) with the average equity over all 21 distinct rolls (again assuming the best moves).
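The comparison described above can be sketched as follows. This is an illustration, not GnuBG's code: `best_equity` is a hypothetical callback standing in for "0-ply equity after the best move for this roll", `toy_equity` is a made-up stand-in evaluator, and the sketch assumes the average is weighted by roll probability (1/36 for each double, 2/36 for each non-double).

```python
from itertools import combinations_with_replacement

# The 21 distinct rolls: 6 doubles and 15 non-doubles.
ROLLS = list(combinations_with_replacement(range(1, 7), 2))


def roll_probability(roll):
    """Doubles occur with probability 1/36, every other roll with 2/36."""
    a, b = roll
    return 1 / 36 if a == b else 2 / 36


def luck_of_roll(position, actual_roll, best_equity):
    """Equity after the actual roll minus the weighted average over all rolls.

    `best_equity(position, roll)` is a hypothetical callback returning the
    evaluator's equity after the best move for `roll`.
    """
    expected = sum(roll_probability(r) * best_equity(position, r) for r in ROLLS)
    return best_equity(position, actual_roll) - expected


def toy_equity(position, roll):
    """Toy stand-in evaluator: more pips is simply better (not a real one)."""
    a, b = roll
    pips = 4 * a if a == b else a + b
    return pips / 24


print(luck_of_roll(None, (6, 6), toy_equity))  # a lucky roll: positive
print(luck_of_roll(None, (1, 2), toy_equity))  # an unlucky roll: negative
```

Each call needs the evaluation of all 21 rolls, which is why the cost is comparable to a 1-ply evaluation.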

(2) Perhaps gnubg uses the bearoff database for checker play, but some-ply
analysis for luck and the two won't be 100% consistent.

Not that I know of.

(3) When you rollout a position, errors are updated. Luck is not.

Yes, this is because the two require different procedures to calculate. In this case, you'd need to roll out the best moves for all 20 other possible rolls to update luck from a rollout.

(4) Amusingly, when an opponent resigns in a position that is not a 100%
loss, that is an error not reflected in the error total!

GnuBG still seems to have a bit of a problem with resigns, especially resigns when it's not the player's turn (also a problem in the game data structure, it seems), and resigns when a player has already rolled the dice.

(5) Other ideas?

Yes. Luck analysis is a different process from error analysis. With imperfect NNs, the two won't give the same answers overall. Estimating luck is harder for the bot in general, and therefore usually less accurate than estimating error sizes. However, the luck analysis has the great benefit of being unbiased, meaning that in the long run, on average, it will give the correct numbers, whereas the error analysis will always show the bias in a bot's neural net.
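The unbiasedness can be illustrated with a toy example. The sketch below assumes a made-up one-player dice game with no checker play (win if the pips of two rolls total at least 15) and a deliberately wrong evaluator, `biased_eval`; none of it is GnuBG code. Averaged exactly over every possible dice sequence, the luck-adjusted result (outcome minus total luck) still equals the true winning chance, even though the evaluator itself is well off.

```python
from itertools import product

DICE = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered rolls
N_ROLLS, TARGET = 2, 15  # toy game: win if total pips over 2 rolls reach 15


def outcome(pips):
    return 1.0 if pips >= TARGET else 0.0


def biased_eval(rolls_left, pips):
    """Deliberately wrong evaluator; exact only once the game is over."""
    if rolls_left == 0:
        return outcome(pips)
    return min(1.0, 0.3 + pips / 20)  # arbitrary and systematically off


def luck(rolls_left, pips, roll):
    """Evaluation after the actual roll minus the average over all rolls."""
    def after(r):
        return biased_eval(rolls_left - 1, pips + sum(r))
    return after(roll) - sum(after(r) for r in DICE) / len(DICE)


true_win, adjusted = 0.0, 0.0
sequences = list(product(DICE, repeat=N_ROLLS))
for seq in sequences:
    pips, total_luck = 0, 0.0
    for i, roll in enumerate(seq):
        total_luck += luck(N_ROLLS - i, pips, roll)
        pips += sum(roll)
    true_win += outcome(pips)
    adjusted += outcome(pips) - total_luck

true_win /= len(sequences)
adjusted /= len(sequences)
print(f"true win chance:           {true_win:.4f}")
print(f"mean luck-adjusted result: {adjusted:.4f}")
print(f"biased evaluator at start: {biased_eval(N_ROLLS, 0):.4f}")
```

The reason is that each roll's luck is measured against the evaluator's own average over all rolls, so its expected value is zero whatever the evaluator believes. A poor evaluator makes the luck figures noisier for a single match, but not biased.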

Bonus question. Can anyone come up with an EMG transformation that preserves
the idea that net luck and net skill expressed in EMG, exactly and
consistently replicate the match outcome (as it should, I contend, when
expressed in MWC)?

I think this is already the case in theory, except that error and luck analysis are two different things, and both imperfect, so you can't get exact results with imperfect NNs.

Anyhow, I don't see that the EMG transformation is the culprit here.

The suggestion for reporting "net luck" is interesting as an addition. But I
would not like to lose the reporting of raw luck in MWC. You can have a +20%
net luck, but it FEELS very different as a player when that is +80% for you
and +60% for the opponent versus -40% for you and -60% for the opponent!

I strongly agree. The total luck rates for both players can actually be an interesting indicator of the type of match. For instance, you can have very interesting and tough matches when both players get very unlucky, and spectacular matches with wild swings when both players are very lucky.

So I'd like to see the two luck rates kept separate. However, it would be convenient if GnuBG also provided the net luck (which is what's being used for the luck-adjusted result, BTW).

For the rest, I agree with Massimiliano that the luck rates expressed in EMG/move make very little sense, as I wrote in a different post already.

Greetings,

--
Robert-Jan Veldhuizen