bug-gnubg
[Top][All Lists]

## Re: [Bug-gnubg] Re: Race database and bearoff DB

 From: Jim Segrave Subject: Re: [Bug-gnubg] Re: Race database and bearoff DB Date: Wed, 2 Oct 2002 16:37:50 +0200 User-agent: Mutt/1.4i

```On Tue 01 Oct 2002 (16:19 +0000), Joern Thyssen wrote:
> On Tue, Oct 01, 2002 at 05:20:15PM +0200, Jim Segrave wrote
>
> Basically it's 32 unsigned short ints for each position. The current
> one-sided bear-off database is for 15 chequers on the 1-6 points, a
> total of N(21,6)=54264.
>
> Each of the 32 integers represent the probability that we bear off in i
> moves (multiplied by 255).
>
>                                 #Pos     #Short ints      #bytes
> 15 chrs on p1-6 : N(21,6)       54264     1736448        3.4 MB
> 15 chrs on p1-7 : N(22,7)      170544     5457408        5.4 MB
> 15 chrs on p1-8 : N(23,8)      490314    15690048       31.3 MB
> 15 chrs on p1-9 : N(24,9)     1307504    41840128       83.7 MB
> 15 chrs on p1-10: N(25,10)    3268760   104600320      209.2 MB
> 15 chrs on p1-11: N(26,11)    7726160   247237120      494.5 MB
> 15 chrs on p1-12: N(27,12)   17383860   556283520     1112.6 MB
>
> We currently use 15/1-6 which easily fits into memory and can be
> referenced as:
>
> aaBearoff1 ( position, i )

I just tried a little crude perl script to see how many values are
unneeded in the database.

My plan was:

the probability of clearing in n rolls is 0, if the number of chequers
is more than 4*n. In these situations, you could omit the beginning of
the values for a position, since they are known to be 0.

the probability of having cleared after n rolls in a position is 1 if
(n * 3 >= pip count) and (n * 2) >= number of chequers since you'd
still clear everything even with repeated 21 rolls.

Both the above are cheap to evaluate, so if the database can easily
handle variable length entries, there's a lot of savings to be made:

For a 6 point rolloff database, this means that:

Positions done:                        54264
Entries with zero probability:        141014
Entries with  probability one:        904326
Entries with fractional  probability: 691108

That gives a database with only 39.8% as many entries as currently.

For the gross case of a 12 point DB, things are less promising:

Positions done:                        17383860
Entries with zero probability:         49319633
Entries with  probability one:         28973060
Entries with fractional  probability: 477990827

For a reduction to 85% of the original.

There's a question in my mind if having 32 entries/position is
actually sensible. For a 6 point DB, the longest bearoff is 30
rolls. But a realistic appraisal, even for 15 checkers on the 6 point,
is going to have 30 rolls as a huge outlier - it requires 29
consecutive 2/1 rolls, which I put as being (1/18)**29, or around
4*10**-30.

There is probably some much more realistic threshold after which the
probability of clearing all checkers from the 1 through 6 points is
effectivly 1, particularly given that the database can't give a
probability less than 10**-5 in a short.

Now, if I'd paid more attention in statistics classes years ago, I'd
be able to answer this myself, but I didn't and I can't.

Shouldn't we be able to say:

Assume the worst case bear-off is 15 chequers on point m (currently we
have m = 6). This requires 15*m pips, so to clear all the chequers in
r rolls, we need to average 15*m/r pips/roll. Dice rolling gives a
normal distribution and the average of a set of r dice rolls is also
normally distributed around 8.17. We should be able to come up with a
number of rolls r such that the probability that (total pips for r
rolls is .ge. 15*m) > 1-1/65536 (the limit of our resolution for a
unsigned short). Then for an m point rollout DB, there's no point
having more than that number of rolls per position. I'm willing to bet
that for a 6 point rollout DB, it's somewhere around 15 to 20 rolls,
not the 32 we have now.

A side note for the mathematically inclined (totally irrelevant to
gnubg):

I was interested in the N(i, j) in the above list. In fact I foolishly
misread it as being about the number of combinations of i things taken
j at a time and was going to send an email pointing out how wrong this
was, but then I realised that it had nothing to do with C(i,j) and
that the figures are right. I'd already written a little perl script
to work out the number of positions for various numbers of points,
which not surprisingly, matches the above. But the series has an
interesting property. If you list the elements of the series and the
multiplier to get the next element you get the following pleasing
pattern (p[i.j] is the number of distinct positions of i checkers over
j points (including some checkers possibly being borne off):

multiplier to get next term
p[15, 1]  =         1     16/1
p[15, 2]  =        16     17/2
p[15, 3]  =       136     18/3
p[15, 4]  =       816     19/4
p[15, 5]  =      3876     20/5
p[15, 6]  =     15504     21/6
p[15, 7]  =     54264     22/7
p[15, 8]  =    170544     23/8
p[15, 9]  =    490314     24/9
p[15, 10] =   1307504     25/10
p[15, 11] =   3268760     26/11
p[15, 12] =   7726160     27/12
p[15, 13] =  17383860     28/13
p[15, 14] =  37442160

I'm sure there's a reason for this, but I can't see it.

--