[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Bug-gnubg] more on bearoff databases

From: Joern Thyssen
Subject: [Bug-gnubg] more on bearoff databases
Date: Sun, 27 Oct 2002 10:06:58 +0000
User-agent: Mutt/1.4i

Hi all,

I'm in the process of generating one-sided bearoff databases for gnubg.
The bearoff databases generated contains probabilities for bearing off
and the probabilities for bearing more than one chequer off (used for
calculating gammon probs). 

The main problem is the size. My latest version stores only non-zero
elements. However, in order for easy retrieval of the probabilities I
store an index at the head of the file:

pos in file (4 bytes) + non zero probs (1 byte) + idx into aray(1 bytes)
+ non zero gammon probs (1 byte) + idx into gammon array(1 byte)

so the size of the file is

#positions * 8 bytes + number of non zero elements

The results so far are:

#pts    #positions      size (unc)     size comp      size comp bzip2 -9
 6          54,264         6,945,792       1,467,518       959,264
 7         170,544        21,829,632       4,949,046     3,156,522
 8         490,314        62,760,192      15,264,056     9,643,342
 9       1,307,504       167,360,512      44,096,390    27,775,598
10       3,268,870       418,401,280     118,913,118    73,642,861
11       7,726,160       988,948,480      in progress

The column "size(unc)" is the uncompressed size: #positions * 2 * 32 * 2
bytes". The "size comp" is the size when only non-zero elements are
stored + the index. I tried to bzip -9 the file; the size of the
resuting file can be seen in the last column. 

In general, storing only non-zero elements gives a saving of a factor of
4. Also, additional compression with bzip2 -9 gives at most an
additional factor of 2, so it appears that the entropy of the resulting
compressed files are very high. If I bzip -9 the uncompressed
418,401,280 10-pt file I get a file with size 65,771,118, that is, only
slightly smaller than bzip'ing the comp. file.

Extrapolating the table above we get:

#pts    #positions      size(unc)       estimated size comp
11        7,726,160       988,948,480    250 MB
12       17,383,860     2,225,134,080    550 MB
13       37,442,160     4,792,596,480   1200 MB
14       77,558,760     9,927,521,280   2500 MB

So the practiacal limit appears to be the 12-pt or perhaps the 13-pt
databases. So if we want to go further we have to do something else:

1 eliminate "ridicolous" positions, e.g., 15 chequers on the 13 point.
  For example, we may opt to store only positions with fewer than, say,
  10 chequers on each point. However, I did some simulations and in
  general we can't save anything here. For example, for the 13 pt
  database there are only 30k positions with more than 10 chequers on a
   point compared to the total of 37M positions. Similar for the other dbs.

  So this idea is DEAD!

2 approximate the roll distribution with something that requires 
  less parameters, e.g., a normal distribution. Any comments on this?

3 settle with the 13 point database and combine it with one-sided

4 think of another compression scheme. 

(2) and (3) are probably the most promosing ones.


Joern Thyssen, PhD
Vendsysselgade 3, 3., DK-9000 Aalborg, Denmark
+45 9813 2791 (private) / +45 2818 0183 (mobile) / +45 9633 7036 (work)
Note: new mobile number!

reply via email to

[Prev in Thread] Current Thread [Next in Thread]