Re: [Bug-gnubg] Re: Strange FIBS ratings


From: Christopher D. Yep
Subject: Re: [Bug-gnubg] Re: Strange FIBS ratings
Date: Wed, 10 Sep 2003 07:03:45 -0400

At 03:34 PM 9/9/2003 -0400, Douglas Zare wrote:
Quoting "Christopher D. Yep" <address@hidden>:

> I think this phenomenon has been known for many years now.  Kees'
> experiments and Douglas Zare's research on Gammonvillage are just the
> latest examples supporting this conclusion.

Some have known it, others have not. I have been arguing that

As a very rough guess, I'd say that 3% to 30% of all backgammon players know this fact today (that checker errors give up more equity than cube errors). Those who own gnubg (or Snowie) and regularly use the Player Records (or Account Manager) should know this fact, assuming they care enough about their stats to review them regularly. Of the 70%-97% who don't know the fact, some players grossly overestimate the importance of the cube. One casual player told me that "it's easy to move the checkers around, but that cube errors account for 98%-99% of the total equity lost"!

Humans have been wondering which is more costly (checker errors, cube errors) for a long time, even before the concept of EMG was invented.

There are two different questions:

(1) Which gives up more equity in ppg/mwc (points per game for a money game, match winning chances for a match), checker errors or cube errors?

(2a) Do players have higher checker error rates or higher cube error rates, with error rates measured using Snowie methodology?

(2b) Same as 2a, but using gnubg methodology?

#2b is significantly different from #2a. #2a uses the same denominator for both checker and cube error rates, so the ratio of (checker error rate) to (cube error rate) is the same as the ratio of (total EMG given up by checker errors) to (total EMG given up by cube errors). If I remember correctly, the gnubg checker error rate is the total EMG given up (checker) divided by the total number of unforced checker plays, while the gnubg cube error rate is the total EMG given up (cube) divided by the number of cube decisions that were either actual or "close" (based on some threshold).
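To make the denominator point concrete, here is a minimal sketch in Python. It follows the definitions as I recalled them above, so treat it as an illustration rather than what either program actually computes. The function names and all the numbers are hypothetical, and the shared Snowie-style denominator is simply passed in as a count, since I have not specified what it is.

```python
def snowie_style_rates(checker_emg_lost, cube_emg_lost, shared_denominator):
    """Both rates use the same denominator (whatever count that is)."""
    return (checker_emg_lost / shared_denominator,
            cube_emg_lost / shared_denominator)


def gnubg_style_rates(checker_emg_lost, cube_emg_lost,
                      unforced_checker_plays, close_or_actual_cube_decisions):
    """Each rate uses its own denominator, as recalled in the text above."""
    return (checker_emg_lost / unforced_checker_plays,
            cube_emg_lost / close_or_actual_cube_decisions)


# Hypothetical totals for one player over a set of matches.
checker_lost, cube_lost = 3.0, 1.0  # total EMG given up

s_checker, s_cube = snowie_style_rates(checker_lost, cube_lost, 400)
g_checker, g_cube = gnubg_style_rates(checker_lost, cube_lost, 300, 25)

# With a shared denominator the rate ratio equals the ratio of totals.
snowie_ratio = s_checker / s_cube
# With separate denominators the rate ratio can look very different.
gnubg_ratio = g_checker / g_cube

print(f"ratio of totals:          {checker_lost / cube_lost:.2f}")
print(f"Snowie-style rate ratio:  {snowie_ratio:.2f}")
print(f"gnubg-style rate ratio:   {gnubg_ratio:.2f}")
```

With these made-up numbers the player gives up three times as much EMG on checker play as on the cube, yet the gnubg-style cube error rate comes out higher than the checker error rate, because there are far fewer cube decisions to divide by.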

I don't know the entire history of this discussion (partly because it spans multiple threads; I haven't read all the e-mails). #1 interests me much more, so I haven't commented yet on #2b, but I'm guessing the thread was initially inspired by #2b.

The casual player doesn't have Snowie or GNU and is more concerned with #1.

1) Humans give up more equity through checker play.
2) Using EMG overstates the amount given up through cube play.

Many people have not been convinced (mainly weaker players), and I hope that my
column will convince them.

Question #1 has interested me since I started playing in the early 1990s. When I bought Snowie in 1999 I checked my own errors. I was surprised that my checker errors gave up much more equity than my cube errors, but I used the intuitive arguments I gave earlier (mainly that there are many more difficult checker decisions than difficult cube decisions each game) to convince myself of the fact. I also downloaded 9 analyzed matches (all from 1999, analyzed by Snowie 3) from Oasya.com (now bgsnowie.com). I see that these matches have been taken down (except Ballard vs. Meyburg at the Nordic Open 1999), but they've put up 13 new ones in their place (http://bgsnowie.com/backgammon/matches.dhtml). If you have time, you may wish to review these. I'd guess that these matches are more reliable than those on Johanni's list, since presumably the decision to record/display each match was made before the actual match was played (I could be wrong though). Johanni's list includes only self-selected matches, which may introduce a bias. If there is a bias, I don't know in which direction it runs, but I'd guess that Johanni's list is more likely to exclude matches with large cube errors: after a match a player may check a particular cube decision (but few or no difficult checker problems), and if he was grossly wrong on the cube decision he may be too embarrassed to send the match to Johanni. Additionally, I think that Johanni's methodology is to roll out cube blunders but not checker blunders (someone correct me if I'm wrong). That last point is definitely a bias. A countering bias is that Snowie (at least Snowie 3) does not include checker errors in non-contact races but does include cube errors in non-contact races.

The second point is closer to what KvdD's experiments show. My data says that
human cube errors happen when less mwc is at stake, on average, than checker
play errors. His says that when gnu is told to play stupidly, its cube errors
happen when less mwc is at stake.

I thought that Kees' study centered around trying to estimate FIBS rating based on two variables: (1) gnubg checker error rate, (2) gnubg cube error rate. This is more than simply concluding that cube errors happen when less mwc (or ppg) is at stake. His overall conclusion is quite valuable in my opinion, but only if the results are trustworthy. The most important issue that needs further study is whether (gnubg with noise) is sufficient to model humans. The advantage of using (gnubg with noise) is that we can quickly develop a huge sample size. I appreciate both bot and human work (with your investigation being the latter). Kees is now studying human data, which is the next logical step. Hopefully work can continue in this area.

BTW, here are two intuitive arguments that cube errors happen when less ppg is at stake in a money game (similar results apply in matches with respect to mwc):

Suppose that a player's average (total cube errors in ppg) is X% of his (total checker errors in ppg).

1. If every game ended in double/pass then we can partition each game into periods based on the cube value:

1. Centered cube
2. 2-cube
3. 4-cube
4. 8-cube
Etc.

Each period ends when the cube is accepted (or passed in the case of the final cube). We have assumed that the player's average (total cube errors in ppg) is X% of his (total checker errors in ppg). It's reasonable to assume that this ratio applies across each period above (note that the final period ends with a double/pass). Thus a player's normalized cube error rate (Snowie methodology, not gnubg's methodology) will also be X% of his normalized checker error rate.

In actuality, though, not every game ends in double/pass. For games that don't, the final period will involve difficult checker decisions but not very many difficult cube decisions, whereas a game that ended in double/pass could be expected to have a representative number of difficult cube decisions in its final period. This is because the player holding the cube at the end of the game is usually an underdog throughout the final period, so his cube decisions are easy. This is not always the case: sometimes he has to decide whether he is too good to double, and if he decides he is too good the cube will not be turned on that roll.

The overall effect of the above paragraph is that cube errors are more likely to be made on smaller cubes than checker errors.
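Argument 1 can be written out as a rough sketch. The notation is mine and not taken from Snowie or gnubg: v_k is the cube value in period k, C_k and D_k are the ppg given up in that period through checker and cube errors, and "normalized" is assumed to mean dividing each loss by the cube value in force. The two-period numbers are hypothetical and only illustrate the direction of the effect.

```latex
% Assumed notation (not from any program's documentation):
%   v_k = cube value in period k
%   C_k, D_k = ppg lost to checker / cube errors in period k
%   "normalized" = loss divided by v_k
\[
\frac{\sum_k D_k}{\sum_k C_k} = x
\qquad \text{(raw ratio in ppg, with } x = X/100\text{)}
\]
% If the same ratio holds in every period, normalizing changes nothing:
\[
D_k = x\,C_k \ \text{ for all } k
\;\Longrightarrow\;
\frac{\sum_k D_k / v_k}{\sum_k C_k / v_k} = x
\]
% Hypothetical example with cube errors concentrated on the smaller cube:
%   v = (1, 2), C = (1, 1), D = (0.6, 0.2)
\[
\frac{0.6 + 0.2}{1 + 1} = 0.40,
\qquad
\frac{0.6/1 + 0.2/2}{1/1 + 1/2} = \frac{0.7}{1.5} \approx 0.47
\]
```

In the example the raw cube-to-checker ratio is 40%, but the normalized ratio is about 47%, because dividing by the cube value gives more weight to the period in which the cube errors are concentrated.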

2. The above does not consider that on large cubes (cubes >= 2), the player on roll has to (roughly) consider doubling only when he is both the favorite and the owner of the cube, while on very small cubes (centered cube) he has to consider doubling on *every* move when he is the favorite. This further amplifies the effect that cube errors are more likely to be made on smaller cubes.

Overall conclusion: the reported Snowie normalized error rate (equivalently: EMG error rate in the case of matches) exaggerates the effect of cube errors on total error in ppg. This agrees with your conclusion. There are some minor modelling flaws in #1, but these intuitive arguments were enough to convince me when I first thought about it a few years ago.

Thanks for the work. While it sounds like you don't want to be mentioned in the same note as Kees, I think both your contributions are valuable and I hope this is taken as a compliment.

Chris




