[Bug-gnubg] Re: more on bearoff databases
Sun, 27 Oct 2002 11:46:09 -0600
Embedded comments inside are prefaced by (NK).
----- Original Message -----
From: "Joern Thyssen" <address@hidden>
To: "GNU Backgammon Bugs" <address@hidden>
Cc: "Neil Kaz" <address@hidden>
Sent: Sunday, October 27, 2002 4:06 AM
Subject: more on bearoff databases
> Hi all,
> I'm in the process of generating one-sided bearoff databases for gnubg.
> The bearoff databases generated contain probabilities for bearing off
> and the probabilities for bearing more than one chequer off (used for
> calculating gammon probs).
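(For readers following along: the point of storing the full roll-count
distributions is that two one-sided distributions can be combined into a
race winning chance. Below is a minimal C sketch of the standard
combination; the names are mine, not gnubg's actual API, and it assumes
aarProb[side][i] holds P(side bears off its last chequer on roll i) and
that the player on roll wins a pure race iff his roll count is <= his
opponent's.)

#define MAX_ROLLS 32

/* Hypothetical helper (not gnubg's actual API): return the pure-race
 * winning chance for side 0, who is on roll and therefore wins
 * whenever his roll count is <= side 1's. */
static float
RaceWinProb( const float aarProb[ 2 ][ MAX_ROLLS ] )
{
    float arTail[ MAX_ROLLS + 1 ]; /* arTail[i] = P(side 1 needs >= i rolls) */
    float rWin = 0.0f;
    int i;

    arTail[ MAX_ROLLS ] = 0.0f;
    for ( i = MAX_ROLLS - 1; i >= 0; i-- )
        arTail[ i ] = arTail[ i + 1 ] + aarProb[ 1 ][ i ];

    /* side 0 bears off on roll i and wins iff side 1 needs >= i rolls */
    for ( i = 0; i < MAX_ROLLS; i++ )
        rWin += aarProb[ 0 ][ i ] * arTail[ i ];

    return rWin;
}

(The tail sums are accumulated once, so the combination is linear in
MAX_ROLLS.)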
> The main problem is the size. My latest version stores only non-zero
> elements. However, to allow easy retrieval of the probabilities I
> store an index at the head of the file:
> pos in file (4 bytes) + non-zero probs (1 byte) + idx into array (1 byte)
> + non-zero gammon probs (1 byte) + idx into gammon array (1 byte)
> so the size of the file is
> #positions * 8 bytes + 2 bytes per non-zero element
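(Spelled out as a C struct, each index record packs into exactly 8
bytes. This is a sketch with my own field names, not the actual gnubg
source, and it assumes a 32-bit unsigned int:)

typedef struct _bearoffindex {
    unsigned int  iOffset;       /* pos in file (4 bytes)             */
    unsigned char nProbs;        /* # of non-zero probs (1 byte)      */
    unsigned char iProb;         /* idx of first non-zero prob        */
    unsigned char nGammonProbs;  /* # of non-zero gammon probs        */
    unsigned char iGammonProb;   /* idx of first non-zero gammon prob */
} bearoffindex;                  /* 4+1+1+1+1 = 8 bytes               */

/* The data section then holds, per position, nProbs + nGammonProbs
 * two-byte values starting at iOffset; 2 bytes per element matches
 * the "size (unc)" formula further down. */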
> The results so far are:
> #pts  #positions   size (unc)    size comp    size comp bzip2 -9
>    6      54,264    6,945,792    1,467,518       959,264
>    7     170,544   21,829,632    4,949,046     3,156,522
>    8     490,314   62,760,192   15,264,056     9,643,342
>    9   1,307,504  167,360,512   44,096,390    27,775,598
>   10   3,268,870  418,401,280  118,913,118    73,642,861
>   11   7,726,160  988,948,480  in progress
> The column "size (unc)" is the uncompressed size: #positions * 2 * 32
> * 2 bytes. The "size comp" is the size when only the non-zero elements
> are stored, plus the index. I tried to bzip2 -9 the file; the size of
> the resulting file can be seen in the last column.
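(Sanity-checking that formula against the first row: 2 arrays * 32
entries * 2 bytes = 128 bytes per position, and 54,264 * 128 =
6,945,792, which is exactly the 6-pt "size (unc)" entry above.)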
> In general, storing only non-zero elements gives a saving of a factor
> of 4. Additional compression with bzip2 -9 gives at most another
> factor of 2, so it appears that the entropy of the resulting
> compressed files is very high. If I bzip2 -9 the uncompressed
> 418,401,280-byte 10-pt file I get a file of size 65,771,118, that is,
> only slightly smaller than bzip2'ing the comp. file.
> Extrapolating the table above we get:
>
> #pts  #positions     size (unc)   estimated size comp
>   11   7,726,160    988,948,480    250 MB
>   12  17,383,860  2,225,134,080    550 MB
>   13  37,442,160  4,792,596,480   1200 MB
>   14  77,558,760  9,927,521,280   2500 MB
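(The #positions column in both tables is simply the number of ways to
distribute 15 chequers over n points plus the off tray, i.e.
C(n+15,15). A small self-contained C check -- my own sketch, not gnubg
code -- that reproduces those counts:)

#include <stdio.h>

/* Number of one-sided bearoff positions for nPoints points: the ways
 * to distribute 15 chequers over nPoints points plus "off". */
static unsigned long long
PositionCount( int nPoints )
{
    unsigned long long c = 1;
    int i;

    /* multiplicative binomial: after step i, c == C(nPoints+i, i),
     * so each division is exact */
    for ( i = 1; i <= 15; i++ )
        c = c * ( nPoints + i ) / i;

    return c;
}

int
main( void )
{
    int n;

    for ( n = 6; n <= 14; n++ )
        printf( "%2d points: %12llu positions\n", n, PositionCount( n ) );

    return 0;
}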
> So the practical limit appears to be the 12-pt or perhaps the 13-pt
> database. So if we want to go further we have to do something else:
(NK) I'd stop at the 13 point, since this is where so many BG positions
become straight races. Even most midpoint-to-midpoint contact is
essentially a straight race, but we may wish to ignore the database for
playing with any midpoint contact. However, if both sides have a couple
of other checkers in the outfield, the database evaluation of racing
chances with midpoint contact should be fine for cube evaluation
purposes.
I'd stop the database at the 13 point (mid point) and add only the
following expansion, which would continue the database further out with
1 or 2 checkers only on points higher than the midpoint. This would be
used only in non-contact positions and should not require that many
more positions (or will it? See the rough count below). The
two-checkers-on-higher-points db can be stopped at the 18 point, while
the one-checker-back db can likely be stopped at the 21 point, and this
will pick up almost every race position that occurs in actual play.
If we simply decide to stop at the 13 point, that is fine!
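(A rough count of what the one-checker extension costs, using the same
combinatorics as above -- my own arithmetic: fixing a single checker on
one point above the 13, the remaining 14 checkers spread over the 13
points plus off in C(27,13) = 20,058,300 ways. So each additional point
adds roughly 20M positions, and extending one checker out to the 21
point would add on the order of 8 * 20M = 160M positions -- more than
the 37M of the 13-pt database itself, though still far less than a full
21-pt database.)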
> 1 eliminate "ridiculous" positions, e.g., 15 chequers on the 13 point.
> For example, we may opt to store only positions with fewer than, say,
> 10 chequers on each point. However, I did some simulations and in
> general we can't save anything here. For example, for the 13 pt
> database there are only 30k positions with more than 10 chequers on a
> point compared to the total of 37M positions. Similarly for the other
> databases. So this idea is DEAD!
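(That 30k figure also checks out combinatorially: at most one point can
hold more than 10 of the 15 chequers, and fixing 11 chequers on a given
point leaves 4 to distribute over the 13 points plus off, i.e.
C(17,4) = 2,380 ways; 13 points * 2,380 = 30,940 positions, a
negligible fraction of 37M.)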
> 2 approximate the roll distribution with something that requires
> fewer parameters, e.g., a normal distribution. Any comments on this?
(NK) I don't see how a normal distribution can pick up on wastage effects
or tell GNU how to properly play during the bear-in.
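(For concreteness, here is a sketch of what item 2 could look like in
C -- my own naming and entirely hypothetical, not proposed gnubg code:
compress each 32-entry roll distribution down to a mean and standard
deviation, two floats instead of up to 64 bytes, and rebuild a
discretised normal on demand. NK's objection above is precisely that
the tails and wastage effects may not survive this round trip.)

#include <math.h>

#define MAX_ROLLS 32

/* Reduce a roll-count distribution to (mean, sd). */
static void
CompressNormal( const float arProb[ MAX_ROLLS ],
                float *prMu, float *prSigma )
{
    float rMu = 0.0f, rVar = 0.0f;
    int i;

    for ( i = 0; i < MAX_ROLLS; i++ )
        rMu += i * arProb[ i ];
    for ( i = 0; i < MAX_ROLLS; i++ )
        rVar += ( i - rMu ) * ( i - rMu ) * arProb[ i ];

    *prMu = rMu;
    *prSigma = sqrtf( rVar );
}

/* Rebuild an approximate distribution from (mean, sd). */
static void
ExpandNormal( float rMu, float rSigma, float arProb[ MAX_ROLLS ] )
{
    float rSum = 0.0f;
    int i;

    if ( rSigma < 1e-6f ) {
        /* (near-)deterministic position: all mass on one roll count */
        for ( i = 0; i < MAX_ROLLS; i++ )
            arProb[ i ] = 0.0f;
        arProb[ (int) ( rMu + 0.5f ) ] = 1.0f;
        return;
    }

    /* unnormalised normal density at each integer roll count... */
    for ( i = 0; i < MAX_ROLLS; i++ ) {
        float z = ( i - rMu ) / rSigma;
        arProb[ i ] = expf( -0.5f * z * z );
        rSum += arProb[ i ];
    }
    /* ...renormalised so the 32 entries sum to one */
    for ( i = 0; i < MAX_ROLLS; i++ )
        arProb[ i ] /= rSum;
}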
> 3 settle with the 13 point database and combine it with one-sided
>   rollouts.
(NK) GNU would need to be better trained to bear around and into the home
board, or the OSR won't be accurate.
> 4 think of another compression scheme.
> (2) and (3) are probably the most promising ones.
(NK) 5. Try training GNU further in bear-in situations and long races,
and check the results against the large db that is being created. It
might be possible that GNU's training can make it play and evaluate
these long races nearly perfectly without any large db.
> Joern Thyssen, PhD
> Vendsysselgade 3, 3., DK-9000 Aalborg, Denmark
> +45 9813 2791 (private) / +45 2818 0183 (mobile) / +45 9633 7036 (work)
> Note: new mobile number!