[Bug-gnubg] Empirical validation of rating prediction formula

From: kvandoel
Subject: [Bug-gnubg] Empirical validation of rating prediction formula
Date: Sun, 5 Oct 2003 21:32:22 +0200 (CEST)

As  a reminder:  a while  ago I  did some  simulations to  correlate ELO
rating  with checker  and cube  error rates,  resulting in  a predictive
formula for the rating of a player from a GNUBG error analysis.

The writeup is on http://www.cs.ubc.ca/~kvdoel/tmp/ratings.

The question was: does it work on human play?

To determine this I have collected as many match sets as possible from
players with known ratings.  While it would be nice to have more data, I
think the results validate the formula, and the speculation that human
error can't be modeled by noise can be put to rest.

The results are in the table below.  It lists the name of the player,
the site where the matches were played, the rating offset used (which is
the rating of GNUBG on that site), the number of matches analysed, the
player's actual rating, and the rating estimated from the GNUBG 0-ply
analysis using the bilinear fit to the error rates.

Player          Site          Rating offset  # matches  Actual rating  Estimated rating
Albert Silver   FIBS          2050           129        1789           1852 +-
Nardy           FIBS          2050           140        1769           1749 +-
RJ Veldhuizen   FIBS          2050           346        1805           1819 +-
csg             FIBS          2050           24         1577           1540 +-
Holger          FIBS          2050           94         1750           1753 +-
kvandoel        Gamesite2000  2200           261        1920           1876 +-
quax            Gamesite2000  2200           101        1894           1891 +-
slork           Gamesite2000  2200           33         1863           1918 +-
cloots          GamesGrid     2000           76         1770           1758 +-
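
As a rough check on how closely the estimates track the actual ratings,
the differences in the table can be summarised with a few lines of
Python.  This is only a sketch: the numbers are copied from the table
above, the names ROWS, residuals and summarize are made up for
illustration, and every player is weighted equally regardless of the
number of matches analysed.

```python
# (player, site, actual rating, estimated rating), copied from the table.
ROWS = [
    ("Albert Silver", "FIBS", 1789, 1852),
    ("Nardy", "FIBS", 1769, 1749),
    ("RJ Veldhuizen", "FIBS", 1805, 1819),
    ("csg", "FIBS", 1577, 1540),
    ("Holger", "FIBS", 1750, 1753),
    ("kvandoel", "Gamesite2000", 1920, 1876),
    ("quax", "Gamesite2000", 1894, 1891),
    ("slork", "Gamesite2000", 1863, 1918),
    ("cloots", "GamesGrid", 1770, 1758),
]


def residuals(rows):
    """Return estimated minus actual rating for each player."""
    return {player: est - actual for player, _site, actual, est in rows}


def summarize(rows):
    """Return (mean signed error, mean absolute error, largest error).

    Unweighted: the number of matches per player is ignored (assumption).
    """
    diffs = list(residuals(rows).values())
    n = len(diffs)
    mean_signed = sum(diffs) / n
    mean_abs = sum(abs(d) for d in diffs) / n
    worst = max(diffs, key=abs)
    return mean_signed, mean_abs, worst


if __name__ == "__main__":
    for player, diff in residuals(ROWS).items():
        print(f"{player:15s} {diff:+d}")
    mean_signed, mean_abs, worst = summarize(ROWS)
    print(f"mean signed error:   {mean_signed:+.1f}")
    print(f"mean absolute error: {mean_abs:.1f}")
    print(f"largest error:       {worst:+d}")
```

For these nine players the mean signed error comes out at about +2
rating points and the mean absolute error at about 28, with the largest
difference being +63 (Albert Silver).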

An action item would now be to change the verbal playing level indicator
("beginner", etc.) to also be based on the estimated rating, as it
currently gives inconsistent results.

