I would like to download a copy of GNU Backgammon to my iMac. I have tried downloading files as indicated on your site and other sites. I am not familiar with programming and special requirements like Quartz, and I have not been able to correctly download the GNU program and get it up and running. I would greatly appreciate help in doing so and would be willing to compensate anyone who could help me accomplish this successfully.

Please contact me at melhathome@iCloud.com or preferably my cell (845) 453-2553.

Thanks,

Mel Handelsman

From MAILER-DAEMON Sun Nov 10 07:20:50 2019

Hi team,

I am just finishing a project that has taken me many months (years): the creation of a new backgammon MET. My nearly finished MET is called 'PR2', and it is a combination of rollouts and a theoretical MET. It uses rollout trials (500k+) of all match scores less than 9a9a, and then a specially developed theoretical MET I call the 'Variable MET' to extend the rollout results to 31a31a. The Variable MET could generate all of the 9a9a MET probabilities in its own right; however, I use it in my MET as an extrapolation tool. A lot of time and effort has gone into the accuracy of both the 9a9a rollouts and the development of the Variable MET, more so the latter.

I could go into a lot more detail if you wish; however, what I would like to do now is test my new PR2 MET, and your help with 1) below is what I care most about. Tony Lezard (of Dueller renown) suggested I contact the Gnubg team after I asked him for help with testing.

1) What I would like to do is test my PR2 MET by playing a series of 5pt matches where one Gnubg player uses the PR2 MET and the other uses the now-standard Kazaross-XG2 MET (in particular). My 'PR2' player faces the other Gnubg player in 500 x 5pt matches at a time, and the results are recorded. I don't care about the moves; who won the 500-match series is all that is important. I know that Gnubg can play itself now, but not with different METs loaded and not without a lot of human input (every game end requires a manual prompt from the user for the next game to begin). That way of doing things is unworkable for me. What I need is a set-and-forget solution: something I can start overnight so that in the morning the match wins are reported as something like (say) 257-243.

I am only guessing how long 500 x 5pt matches would take, even if fully automated. Additionally, I do not know how many sets of 500 x 5pt matches I would need to play to see a significant difference between METs. Maybe 5000, 50000 or 500000. After seeing the difference in equity the PR2 MET can sometimes produce, I am hoping for the former.
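Whatever form the automation takes, an overnight tally such as 257-243 can be checked against chance with the normal approximation to the binomial. This is a minimal sketch, assuming independent matches and a null hypothesis of evenly matched players; `series_z_score` is a hypothetical helper, not part of Gnubg:

```python
import math


def series_z_score(wins: int, losses: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for a match series.

    Null hypothesis: both players win 50% of matches, and matches are
    independent. Uses the normal approximation to the binomial.
    """
    n = wins + losses
    expected = n / 2
    sd = math.sqrt(n) / 2  # binomial SD with p = 0.5: sqrt(n * 0.5 * 0.5)
    z = (wins - expected) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


z, p = series_z_score(257, 243)
print(f"257-243: z = {z:.2f}, two-sided p = {p:.2f}")
```

A 257-243 result is only about 0.63 standard deviations above an even split, so on its own it says little; 5100-4900 over 10000 matches is exactly 2 standard deviations.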

I have a friend who is a lot more computer savvy than I am, and he has started playing around with different sockets/ports and instances of Gnubg. He tells me "You actually need 3 instances of gnubg running - I run all three without the graphical interface, only pure terminal versions". However, before he goes to too much trouble, I thought it best to contact the Gnubg team and see if you can help.

Maybe you only have to change "a couple of lines of programming", as someone on my forum suggested (lol). It won't be that easy, I know!

2) Jim Segrave thought this issue might be of interest to the team.

https://i.postimg.cc/brQf7sVw/4a1a-C-seed-6987657-1036800.png

There should not be any difference between the cubeless and cubeful results; however, there is. I think the cubeless results are right and the cubeful discrepancy is due to some drift in the cubeful calculation. This particular rollout shows the discrepancy near the 5th dp (decimal place). In other rollouts I did, I believe the discrepancy crept into the 4th dp.

My PR2 MET aims for accuracy to the second dp (in %) in all of the 9a9a entries I rolled out. E.g. I have 1a2aC as 68.36% after compiling over a million trials, and that should be accurate to 2dp (%).

Here is a further example. When I first rolled out 8a1a over 1 million times in a single rollout, I got a final cubeful result of ~0.10705. However, I happened to be at my computer to watch the result at ~93% completion, and I saw the equity climb steadily from 0.10688 for over an hour to reach 0.10705. So what, you may ask? Well, I have watched enough rollouts to expect that the 3rd decimal place, if not the 4th, should be set in stone at nearly a million trials. Additionally, rollouts normally have the equity jumping up and down a bit due to variance; this rollout was not doing that, the equity just went up and up.

I was very suspicious, so I then checked my 8a1a result by choosing 5 new seeds and doing 5x12960-trial rollouts using the same Gnubg settings. I got:

0.1067726
0.1068255
0.1066774
0.1066116
0.1067552

The mean of these means is ~0.10673 (0.1067285).
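The comparison with the million-trial rollout can be made quantitative by treating the five rollouts as independent batches and using their spread to estimate a standard error. A sketch under those assumptions; it ignores the million-trial rollout's own standard error, and with only five batches the SE estimate itself is rough:

```python
import math
import statistics

# The five 5x12960-trial rollout results for 8a1a quoted above.
batch_means = [0.1067726, 0.1068255, 0.1066774, 0.1066116, 0.1067552]

mean = statistics.fmean(batch_means)
sd = statistics.stdev(batch_means)      # sample SD, n - 1 denominator
se = sd / math.sqrt(len(batch_means))   # standard error of the mean of means

million_trial_result = 0.10705          # the single 1M+ trial rollout
distance = (million_trial_result - mean) / se

print(f"mean of means = {mean:.7f}, standard error = {se:.7f}")
print(f"1M+ result lies {distance:.1f} standard errors above the mean")
```

A gap of several standard errors would be hard to attribute to variance alone, though with five batches a t-distribution (4 degrees of freedom) is more appropriate than the normal for judging it.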

In terms of a MET entry, that would be 10.67% vs 10.71% for the million+ rollout. 5x12960 = 64800 trials is not really a lot; however, I have done enough rollouts to know something is probably wrong here. I repeated this exercise with another million+ trial rollout vs 5x12960 trials. In this second case, the 5x12960 results were all close to their mean of 89.70%, while the million+ rollout gave 89.45%. Again, very different, and in my opinion the million+ trial result is inaccurate.

I am guessing that there is some problem with the cubeful algorithm that first creeps in at the 7th significant figure (sf), then migrates to the 6th, 5th, 4th sf, etc., all governed by the number of trials. An average user will never see a problem at 5184 trials or even 51840 trials. However, I saw a problem at 518400 trials and above. When I first saw this issue, I abandoned the 25 x 1M+ trial rollouts I had done for my MET project and started again. My way around this problem was to do sets of 46656 trials and tabulate them carefully.
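Tabulating fixed-size batches, as described above, amounts to pooling them. This sketch assumes each batch records its trial count, mean equity and the standard error of that mean (the record layout is hypothetical), and that batches use different seeds and are therefore independent:

```python
import math


def pool_batches(batches):
    """Combine independent rollout batches into one estimate.

    batches: iterable of (trials, mean, se) tuples, where se is the standard
    error of that batch's mean (a hypothetical record layout).
    The pooled mean weights each batch by its trial count. For independent
    batches, Var(sum(n_i * m_i) / N) = sum(n_i**2 * se_i**2) / N**2.
    """
    batches = list(batches)
    total = sum(n for n, _, _ in batches)
    pooled = math.fsum(n * m for n, m, _ in batches) / total
    pooled_se = math.sqrt(math.fsum((n * se) ** 2 for n, _, se in batches)) / total
    return pooled, pooled_se


# Hypothetical batch records, for illustration only.
records = [(46656, 0.1068, 0.0003), (46656, 0.1066, 0.0003), (46656, 0.1067, 0.0003)]
print(pool_batches(records))
```

With equal batch sizes the pooled mean is just the average of the batch means, and the standard error shrinks like 1/sqrt(number of batches).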

An esoteric problem for sure, and one that might be nearly irrelevant to everyone except me. However, there might be an easy remedy that has to do with increasing the number of sf used in Gnubg's cubeful algorithm(s).
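One mechanism that would produce an error growing with the number of trials is accumulating a running sum in low precision. Whether Gnubg's rollout code accumulates in single precision is an assumption, not something established in this thread. The sketch below only emulates float32 arithmetic with the standard `struct` module, using a constant hypothetical per-trial value (the simplest case, where every addition in a binade rounds the same way so the error accumulates instead of averaging out), and shows how Kahan (compensated) summation, a swapped-in technique, avoids the drift without widening the storage type:

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Running sum where the accumulator is stored as float32 after each add."""
    s = 0.0
    for v in values:
        s = f32(s + v)
    return s


def kahan_sum_f32(values):
    """Kahan (compensated) summation, still storing everything as float32."""
    s = 0.0
    c = 0.0  # running compensation for low-order bits lost from s
    for v in values:
        y = f32(v - c)
        t = f32(s + y)
        c = f32(f32(t - s) - y)
        s = t
    return s


# Hypothetical per-trial result, stored as float32, repeated 100000 times.
values = [f32(0.1)] * 100_000

exact = math.fsum(values) / len(values)  # double-precision reference
naive_mean = naive_sum_f32(values) / len(values)
kahan_mean = kahan_sum_f32(values) / len(values)

print(f"exact mean       : {exact:.9f}")
print(f"naive float32 sum: {naive_mean:.9f}")
print(f"Kahan float32 sum: {kahan_mean:.9f}")
```

Storing the accumulator in double precision would also remove most of the drift; Kahan summation is shown because it fixes the problem even when the storage type cannot change.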

3) Lastly, here is a small display problem to consider.

While building a 31a31a MET, I checked its extremities quite regularly to see if I had the right PR2 MET version loaded, and I noticed a problem. There is a display problem at 23a31a, where the equity for 25a31a is shown instead. Incidentally, the 31a23a equity is correct in the Gnubg table. You will not see a problem in the display of most of the METs you have loaded (probably none of the default ones), since a calculation internally extrapolates results from ~15a15a (mec.c perhaps). My PR2 MET is different: the extrapolation calculations Gnubg does for other METs do not start until after 31a31a. I think you have a small addressing problem to fix.

Kind regards,

Ian Dunstan

(Australian Backgammon Federation Director)

Hi Ian,

Date: Mon, 11 Nov 2019 21:17:33 -0500 (EST)
From: "Timothy Y. Chow"
Reply-To: tchow@alum.mit.edu
To: bug-gnubg@gnu.org
Subject: Re: [gnubg] Help with a new MET
Ian,
Thanks for putting all this effort into a new MET!
I don't know too much about the innards of GNU Backgammon, but I do know
something about math and statistics.
In terms of how many matches you would have to play between GNU-old-MET
and GNU-new-MET, that depends on how much stronger GNU-new-MET is.
Suppose that GNU-new-MET has a 51%/49% edge over GNU-old-MET. That means
that if you played 1000 matches, then you would expect a score of 510 to
490. The problem is that if GNU-old-MET were playing against itself, the
standard deviation would be about 15.8. So a 510 to 490 result would be
far from statistically significant. You'd need about 10000 trials to
barely reach statistical significance: The expected score would be 5100 to
4900 and the standard deviation would be 50, so 5100 would be two standard
deviations away. In general the formula for the standard deviation is
sqrt(n)/2 where n is the number of matches.
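Solving the two-standard-deviation condition above for n gives a quick planning formula. A sketch using the same null-hypothesis standard deviation sqrt(n)/2; `matches_needed` is a hypothetical helper:

```python
import math


def matches_needed(edge: float, k_sd: float = 2.0) -> int:
    """Matches needed for the expected surplus to reach k_sd null SDs.

    edge: win probability above 50%, e.g. 0.01 for a 51%/49% split.
    Expected surplus of wins over n/2 is edge * n; the null SD is sqrt(n) / 2.
    Solving edge * n = k_sd * sqrt(n) / 2 gives n = (k_sd / (2 * edge)) ** 2.
    """
    n = (k_sd / (2 * edge)) ** 2
    return math.ceil(n - 1e-9)  # tolerate floating-point noise before rounding up


for edge in (0.01, 0.02, 0.05):
    print(f"{50 + 100 * edge:.0f}%/{50 - 100 * edge:.0f}%: about {matches_needed(edge)} matches")
```

This reproduces the 10000-match figure for a 51%/49% edge; halving the edge quadruples the number of matches needed.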
There's another point to be cognizant of, which is that there is a
distinction between statistically significant evidence of the bare-bones
claim that "the new MET is better," and a good estimate of *how* much
stronger GNU-new-MET is than GNU-old-MET. Let's say you played 10000
matches and the score was 5100 to 4900. You could then claim that the new
MET is better, and say that this claim is significant at the two standard
deviation level. But you *couldn't* claim that you are 95% confident that
the new MET gives you a 51%/49% edge over the old MET. To get a good
estimate of the edge requires more trials. How many trials you need would
depend on how sharp an estimate you want.
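To see why 5100-4900 does not pin the edge down, one can compute an approximate interval for the win rate. This sketch uses the simple Wald (normal) interval, which is one common choice rather than anything prescribed above:

```python
import math


def win_rate_interval(wins: int, n: int, k_sd: float = 2.0) -> tuple[float, float]:
    """Approximate interval for the true match-winning probability.

    Wald interval: p_hat +/- k_sd * sqrt(p_hat * (1 - p_hat) / n).
    With k_sd = 2 this is roughly a 95% interval under the normal approximation.
    """
    p_hat = wins / n
    half_width = k_sd * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lo, hi = win_rate_interval(5100, 10000)
print(f"5100 of 10000: [{lo:.4f}, {hi:.4f}]")
```

For 5100 wins in 10000 matches the interval runs from roughly 50.0% to 52.0%: compatible with an edge, but too wide to say the edge is 51%. Because the width shrinks like 1/sqrt(n), halving it takes about four times as many matches.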
I don't have as much insight into what might be going wrong with the
cubeful calculations. It does sound to me that there might be a problem
with floating-point precision, but someone with knowledge of the code will
have to comment on that.
Tim
From: Joseph Heled
Date: Tue, 12 Nov 2019 16:04:34 +1300
Subject: Re: [gnubg] Help with a new MET
To: tchow@alum.mit.edu
Cc: "bug-gnubg@gnu.org"
Hi Timothy,
Here is a stats question I encounter from time to time.
Suppose I run N BG games and collect the average win rates and gammon
rates: 4 estimates, which are dependent since they sum to 1.
How do I determine the confidence intervals for each? This is a 4-d vector
and it seems like a non-trivial question, but I assume this crops up a lot and
must have a standard answer.
What is your take?
Thanks, Joseph
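One standard way to handle the dependence is to model the four outcome counts as multinomial (independent games, fixed outcome probabilities). Each marginal proportion then has variance p(1-p)/N, and the sum-to-1 constraint shows up as negative covariances. A sketch with hypothetical counts:

```python
import math


def multinomial_se(counts):
    """Proportions, standard errors and covariance matrix for outcome counts.

    counts: games ending in each outcome, e.g. (win single, win gammon,
    lose single, lose gammon). Assuming independent games with fixed outcome
    probabilities (a multinomial model):
        Var(p_i)      =  p_i * (1 - p_i) / N
        Cov(p_i, p_j) = -p_i * p_j / N          for i != j
    Probabilities are replaced by their estimates (plug-in).
    """
    n = sum(counts)
    p = [c / n for c in counts]
    k = len(p)
    cov = [[(p[i] * (1 - p[i]) if i == j else -p[i] * p[j]) / n for j in range(k)]
           for i in range(k)]
    se = [math.sqrt(cov[i][i]) for i in range(k)]
    return p, se, cov


# Hypothetical counts from N = 10000 games, for illustration only.
p, se, cov = multinomial_se([4000, 1300, 3500, 1200])
for name, pi, si in zip(["win single", "win gammon", "lose single", "lose gammon"], p, se):
    print(f"{name:12s} {pi:.4f} +/- {2 * si:.4f}")

# Combining outcomes needs the covariances: overall win rate = p[0] + p[1].
win_se = math.sqrt(cov[0][0] + cov[1][1] + 2 * cov[0][1])
print(f"overall win rate {p[0] + p[1]:.4f} +/- {2 * win_se:.4f}")
```

Each marginal standard error is the familiar sqrt(pq/N); the dependence lives in the negative covariances, and every row of the covariance matrix sums to zero because the proportions sum to 1.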
Date: Tue, 12 Nov 2019 12:04:09 -0500 (EST)
From: "Timothy Y. Chow"
Reply-To: tchow@alum.mit.edu
To: "bug-gnubg@gnu.org"
Subject: Re: [gnubg] Help with a new MET
On Tue, 12 Nov 2019, Joseph Heled wrote:
> Hi Timothy,
> Here is a stats question I encounter from time to time.
>
> Suppose I run N BG games and collect the average win rates and gammon
> rates.
> 4 estimates which are dependent as they sum to 1. How do I determine
> the confidence intervals for each? This is a 4d vector and it seems
> like a non trivial Q, but I assume this crops up a lot and must have a
> standard answer. what is your take?
>
> Thanks, Joseph
Joseph,
I'm guessing that what you're really interested in is some measure of the
variation or dispersion of your sample dataset. In that case, you can
simply compute the sample standard deviation for each parameter of
interest. The fact that each sample consists of 4 numbers that satisfy
the equation that their sum equals 1 just means that your 4 estimated
standard deviations aren't independent estimates, but for most practical
purposes this is an irrelevant technicality.
On the other hand, if you really want to compute a confidence interval for
the purposes of hypothesis testing, then you need to be explicit about
what your null hypothesis and alternative hypotheses are. If you're not
sure what your null and alternative hypotheses are, then to me that
confirms that what you're interested in is not hypothesis testing but some
sense of how good an estimate your averages are.
It's important to realize that a 95% confidence interval does *not* mean
that there is a 95% probability that the quantity you're trying to
estimate lies in your interval. This is a common misconception about what
confidence intervals are.
https://en.wikipedia.org/wiki/Confidence_interval#Misunderstandings
If you really want to make statements of the form "there is a 95%
probability that the win rate is in such-and-such an interval" then you
need to adopt a Bayesian rather than a frequentist framework. In
particular you'll need to choose some prior probability distribution and
compute the posterior probability distribution by applying Bayes's rule to
your data.
Tim
From: Joseph Heled
Date: Wed, 13 Nov 2019 06:16:35 +1300
Subject: Re: [gnubg] Help with a new MET
To: tchow@alum.mit.edu
Cc: "bug-gnubg@gnu.org"
Hi Tim,
"but for most practical purposes this is an irrelevant technicality"
Are you saying that I can treat each of the 4 estimates independently? That
is, use sqrt(pq/N) as the std for each? Seems problematic to me :)
Yes, a Bayesian approach would be better, but this probably involves things
like contour integration or other horrors. I hoped for something simpler.
-Joseph
Date: Tue, 12 Nov 2019 16:39:05 -0500 (EST)
From: "Timothy Y. Chow"
Reply-To: tchow@alum.mit.edu
To: "bug-gnubg@gnu.org"
Subject: Re: [gnubg] Help with a new MET
On Wed, 13 Nov 2019, Joseph Heled wrote:
> "but for most practical purposes this is an irrelevant technicality"
>
> Are you saying that I can treat each of the 4 estimates independently?
> That is, use sqrt(pq/N) as the std for each? seems problematic to me :)
No, I didn't say that. As I said, the 4 estimates are not independent.
What I recommended was for you to compute the sample standard deviation
for each parameter of interest. So for example, if you have 100 samples
and you're interested in the gammon rate, then first compute the mean
gammon rate over all your samples. Call that mu. Then for each sample
value g_i, compute (g_i - mu)^2. Sum these up, divide by 100, and take
the square root. This will give you some indication of the dispersion of
your sample set.
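For per-game 0/1 outcomes, the recipe above divided by sqrt(N) gives the standard error of the mean, and it coincides with sqrt(pq/N), so the two formulas in this exchange are closely related. A sketch assuming per-game gammon indicators (the "samples" could equally be per-session rates); the data are hypothetical:

```python
import math


def dispersion(samples):
    """Mean and root-mean-square deviation, following the recipe above.

    Dividing by len(samples) matches the description; the usual "sample"
    standard deviation (statistics.stdev) divides by len(samples) - 1,
    about 0.5% larger for 100 samples.
    """
    mu = sum(samples) / len(samples)
    sd = math.sqrt(sum((g - mu) ** 2 for g in samples) / len(samples))
    return mu, sd


# Hypothetical data: 100 games, 1 if the game ended in a gammon, else 0.
games = [1] * 20 + [0] * 80
mu, sd = dispersion(games)
se = sd / math.sqrt(len(games))  # standard error of the mean gammon rate

print(f"gammon rate {mu:.2f}, dispersion {sd:.3f}, standard error {se:.3f}")
print(f"sqrt(pq/N) = {math.sqrt(mu * (1 - mu) / len(games)):.3f}")
```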
The formula sqrt(pq/N) arises when you're doing hypothesis testing. It's
the standard deviation under the null hypothesis. But so far, you haven't
specified a null hypothesis.
> Yes, a Bayesian approach would be better, but this probably involves
> things like contour integration or other horrors.
No, it doesn't. But you do need to specify a prior distribution.
Suppose you're interested in the win rate, and your prior distribution is
uniform on the interval [0,1]. For illustration purposes, let's say
you're satisfied with accuracy to 1 decimal place, so each of the
probabilities in the set {0.1, 0.2, ..., 0.9} has prior probability 1/9.
Now you start to collect data. Say the first data point is a win. Then
using Bayes's rule, you find that the posterior probability of a win rate
of j/10 is obtained by multiplying the prior probability by j/10, and then
normalizing so that everything sums to 1. So the posterior probabilities
work out to be
[1/45, 2/45, 3/45, 4/45, 5/45, 6/45, 7/45, 8/45, 9/45]
Similarly, if you observe a loss, then you adjust by multiplying the prior
probability by 1 - j/10 and normalizing. Repeat for every observation in
your sample.
Tim
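The grid update described above is short enough to sketch directly. This version uses exact fractions so the one-win case reproduces [1/45, ..., 9/45]; the helper name and grid are illustrative:

```python
from fractions import Fraction


def grid_posterior(observations, grid=None):
    """Discrete Bayesian update of a win rate over candidate values.

    Starts from a uniform prior over the grid (1/9 each for 0.1, ..., 0.9),
    multiplies by q for each win and by 1 - q for each loss, and
    normalizes so the probabilities sum to 1.
    """
    if grid is None:
        grid = [Fraction(j, 10) for j in range(1, 10)]
    post = [Fraction(1, len(grid))] * len(grid)
    for won in observations:
        post = [w * (q if won else 1 - q) for w, q in zip(post, grid)]
        total = sum(post)
        post = [w / total for w in post]
    return grid, post


grid, post = grid_posterior([True])  # a single observed win
print([str(w) for w in post])        # Fractions print in lowest terms (3/45 as 1/15)
```

For thousands of games, floating-point (or log) weights on a finer grid would be more practical than exact fractions; an interval can then be read off by accumulating posterior mass.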
From: Joseph Heled
Date: Wed, 13 Nov 2019 11:10:43 +1300
Message-ID:
Subject: Re: [gnubg] Help with a new MET
To: tchow@alum.mit.edu
Cc: "bug-gnubg@gnu.org"
Content-Type: multipart/alternative; boundary="000000000000aa7ae605972d85f4"
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
recognized.
X-Received-From: 2a00:1450:4864:20::135
X-BeenThere: bug-gnubg@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Bug reports for and general discussion about GNU Backgammon."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 12 Nov 2019 22:11:00 -0000
--000000000000aa7ae605972d85f4
Content-Type: text/plain; charset="UTF-8"
I thought it was clear that what we want is to establish a difference between
two players (say X and Y) by running games and testing whether gammon-rate(X)
!= gammon-rate(Y).
-Joseph
On Wed, 13 Nov 2019 at 10:39, Timothy Y. Chow
wrote:
> On Wed, 13 Nov 2019, Joseph Heled wrote:
> > "but for most practical purposes this is an irrelevant technicality"
> >
> > Are you saying that I can treat each of the 4 estimates independently?
> > That is, use sqrt(pq/N) as the std for each? seems problematic to me :)
>
> No, I didn't say that. As I said, the 4 estimates are not independent.
> What I recommended was for you to compute the sample standard deviation
> for each parameter of interest. So for example, if you have 100 samples
> and you're interested in the gammon rate, then first compute the mean
> gammon rate over all your samples. Call that mu. Then for each sample
> value g_i, compute (g_i - mu)^2. Sum these up, divide by 100, and take
> the square root. This will give you some indication of the dispersion of
> your sample set.
>
> The formula sqrt(pq/N) arises when you're doing hypothesis testing. It's
> the standard deviation under the null hypothesis. But so far, you haven't
> specified a null hypothesis.
>
> > Yes, a Bayesian approach would be better, but this probably involves
> > things like contour integration or other horrors.
>
> No, it doesn't. But you do need to specify a prior distribution.
> Suppose you're interested in the win rate, and your prior distribution is
> uniform on the interval [0,1]. For illustration purposes, let's say
> you're satisfied with accuracy to 1 decimal place, so each of the
> probabilities in the set {0.1, 0.2, ..., 0.9} has prior probability 1/9.
> Now you start to collect data. Say the first data point is a win. Then
> using Bayes's rule, you find that the posterior probability of a win rate
> of j/10 is obtained by multiplying the prior probability by j/10, and then
> normalizing so that everything sums to 1. So the posterior probabilities
> work out to be
>
> [1/45, 2/45, 3/45, 4/45, 5/45, 6/45, 7/45, 8/45, 9/45]
>
> Similarly, if you observe a loss, then you adjust by multiplying the prior
> probability by 1 - j/10 and normalizing. Repeat for every observation in
> your sample.
>
> Tim
>
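Tim's dispersion recipe (mean, squared deviations, divide by the number of samples, square root) as a short Python sketch. The batch gammon rates are invented placeholder numbers, and `dispersion` is a hypothetical helper. Dividing by N matches `statistics.pstdev`, while `statistics.stdev` divides by N - 1.

```python
import math
import statistics


def dispersion(values):
    """Tim's recipe: mean, squared deviations from it, divide by N, square root."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((g - mu) ** 2 for g in values) / len(values))


# Hypothetical per-batch gammon rates (made-up numbers, for illustration only).
gammon_rates = [0.21, 0.24, 0.19, 0.22]

sd = dispersion(gammon_rates)
# statistics.pstdev uses the same divide-by-N definition, so the two should agree.
assert math.isclose(sd, statistics.pstdev(gammon_rates))
print(f"mean = {statistics.mean(gammon_rates):.4f}, sd = {sd:.4f}")
```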
--000000000000aa7ae605972d85f4
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

You obviously put a lot of thought and effort into this project. Here are some initial reactions (more might come later).

1. RE rollouts, you probably want to use the Python interface. Unfortunately I have been out of the loop for a long time, and can't remember the details.

2. I suggest you contact a statistician. I suspect the number of trials you need is quite large. The most important thing will be to establish a confidence interval for your equities.

3. I generated the early MET files for GNUBG, and dabbled a little in testing the effect of different tables. At the time I could not see a real difference in chequer play. So, while MET tables are theoretically interesting, personally I find it hard to believe it will make GNUBG stronger in practice. I would be delighted to be proven wrong.

4. So, if you are (say) confident with the results up to 5 (or 7), I would start by establishing the difference between the Variable MET and mec26 and/or XG for 5 or 7 point matches. This might be a good way to explore the issues.

5. I would like to see a short description of the Variable MET method, if you are willing to share it at this stage.

Cheers, Joseph

On Mon, 11 Nov 2019 at 01:20, Ian Dunstan <ian.dunstan@yahoo.com> wrote:

The mean of these means is ~0.10672.

Hi team,

I am just finishing a project that has taken me many months (years), the creation of a new backgammon MET. My nearly finished MET is called 'PR2' and it is a combination of both rollouts and a theoretical MET. It uses rollout trials (+500k) of all match scores less than 9a9a and then a specially developed theoretical MET I call the 'Variable MET' to extend the rollout results to 31a31a. The Variable MET could generate all of the 9a9a MET probabilities in its own right; however, I use it in my MET as an extrapolation tool. A lot of time and effort has gone into the accuracy of both the 9a9a rollouts and the development of the Variable MET, more so the latter.

I could go into a lot more detail if you wish; however, what I would like to do now is test my new PR2 MET, and your help with 1) below is what I care most about. Tony Lezard (of Dueller renown) suggested I contact the Gnubg team after I asked him for help on testing.

1) What I would like to do is test my PR2 MET by doing a series of 5pt matches where one Gnubg player uses the PR2 MET and the other the now standard Kazaross-XG2 MET (in particular). My 'PR2' player faces himself in 500 by 5pt matches at a time and the results are recorded. I don't care about the moves; who won that 500-match series is all that is important. I know that Gnubg can play itself now, but not with different METs loaded and not without a lot of human input (every game end requires a manual prompt from the user for the next game to begin). That way of doing things is unworkable for me. What I need is a set-and-forget solution, something I can start overnight, and in the morning the match wins are reported as something like (say) 257-243.

I am only guessing how long 500 by 5pt matches would take me, even if fully automated. Additionally, I do not know how many sets of 500 by 5pt matches I would need to do to see a significant difference in METs. Maybe 5000, 50000 or 500000. After seeing the difference in equity the PR2 MET can sometimes produce, I am hoping for the former.

I have a friend who is a lot more computer savvy than I am and he has started playing around with different sockets/ports and instances of Gnubg. He tells me "You actually need 3 instances of gnubg running - I run all three without the graphical interface, only pure terminal versions". However, before he goes to too much trouble I thought it best to contact the Gnubg team and see if you can help.

Maybe you only have to change "a couple of lines of programming" as someone on my forum suggested (lol). It won't be that easy, I know!

2) Jim Segrave thought this issue might be of interest to the team.

There should not be any difference in the cubeless and cubeful results; however, there is. I think the cubeless results are right and the cubeful result discrepancy is due to some cubeful calculation drift. This particular rollout shows the discrepancy near the 5th dp. In other rollouts I did, I believe the discrepancy crept into the 4th dp.

My PR2 MET tries for accuracy to the second dp(%) in all of the 9a9a entries I rolled. E.g. I have 1a2aC as 68.36% after compiling over a million trials, and that should be accurate to 2dp(%).

Here is a further example. When I first rolled out 8a1a over 1 million times in a single rollout I got a final cubeful result of ~0.10705. However, I happened to be around my computer to watch the result at ~93% completion and saw the equity climb steadily from 0.10688 for over an hour to reach 0.10705. So what, you may ask? Well, I have watched enough rollouts to suppose that the 3rd decimal place, if not the fourth, should be set in stone at nearly a million trials. Additionally, rollouts will have the equity jumping up and down a bit due to variance; this rollout was not doing that, and the equity just went up and up in this case.

I was very suspicious so I then checked my 8a1a result by choosing 5 new seeds and doing 5x12960 trial rollouts using the same Gnubg settings. I got:

0.1067726
0.1068255
0.1066774
0.1066116
0.1067552

In terms of a MET entry that would be 10.67% vs 10.71% for the million+ rollout. 5x12960=64800 trials is not really a lot; however, I have done enough rollouts to know something is probably wrong here. I repeated this exercise with another million+ trial rollout vs 5x12960 trials. In this second case, the 5x12960 results were all close to the mean 89.70% while the million+ rollout was 89.45%. Again, very different, and the million+ trials are inaccurate in my opinion.
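One rough way to put a number on the 8a1a discrepancy, offered only as a sketch: assume the five 12960-trial runs are independent, roughly normal estimates of the same quantity, and ignore the statistical error of the million-trial rollout itself. The code computes the mean and standard error of the five runs and expresses the gap to 0.10705 in standard errors. It says nothing about *why* the results differ.

```python
import math
import statistics

# Cubeful 8a1a results of the five independent 12960-trial rollouts quoted above.
runs = [0.1067726, 0.1068255, 0.1066774, 0.1066116, 0.1067552]
million_trial_result = 0.10705  # the single 1M+ trial rollout

mean = statistics.mean(runs)
# Sample standard deviation across runs (N - 1 denominator), then standard error of the mean.
sem = statistics.stdev(runs) / math.sqrt(len(runs))
gap_in_sem = (million_trial_result - mean) / sem

print(f"mean of runs   = {mean:.6f}")
print(f"standard error = {sem:.2e}")
print(f"gap            = {gap_in_sem:.1f} standard errors")
```

With these five values the gap comes out at roughly 8.5 standard errors, far larger than sampling noise alone would usually produce under the assumptions above.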

I am guessing that there is some problem with the cubeful algorithm that first creeps in at the 7th significant figure (sf), then migrates to the 6th, 5th, 4th sf etc., all governed by the number of trials. An average user won't ever see a problem at 5184 trials or even 51840 trials. However, I saw a problem with 518400 trials and above. At the time of first seeing this issue, I abandoned the 25 x 1M+ trials I had done for my MET project and started again. The way around this problem for me was to do sets of 46656 trials and tabulate them carefully.

An esoteric problem for sure, and one that might be nearly irrelevant to everyone except me. However, there might be an easy remedy that has to do with increasing the number of sf used in Gnubg's cubeful algorithm(s).

3) Lastly, this is a small display problem to consider.

While building a 31a31a MET I would check its extremities quite regularly to see if I had the right PR2 MET version loaded, and I noticed a problem. There is a display problem at 23a31a where the equity for 25a31a is shown instead. Incidentally, the 31a23a equity is correct in the Gnubg table. You will not see a problem in the display of most of the METs you have loaded (probably all the default ones you have) since a calculation will internally extrapolate results from ~15a15a (mec.c perhaps). My PR2 MET is different; the extrapolation calculations Gnubg does for other METs do not start until after 31a31a. I think you have a small address problem to fix.

Kind regards,

Ian Dunstan

(Australian Backgammon Federation Director)

Hi Timothy,

Here is a stats question I encounter from time to time.

Suppose I run N BG games and collect the average win rates and gammon rates: 4 estimates which are dependent as they sum to 1.

How do I determine the confidence intervals for each? This is a 4d vector and it seems like a non-trivial Q, but I assume this crops up a lot and must have a standard answer.

What is your take?

Thanks, Joseph

On Tue, 12 Nov 2019 at 15:17, Timothy Y. Chow <tchow@math.princeton.edu> wrote:

--000000000000b5d33805971d82b3--
From MAILER-DAEMON Tue Nov 12 12:04:25 2019
Received: from list by lists.gnu.org with archive (Exim 4.90_1)
id 1iUZab-0007Fb-4B
for mharc-bug-gnubg@gnu.org; Tue, 12 Nov 2019 12:04:25 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:57774)
by lists.gnu.org with esmtp (Exim 4.90_1)
(envelope-from [...])

Ian,

Thanks for putting all this effort into a new MET!

I don't know too much about the innards of GNU Backgammon, but I do know something about math and statistics.

In terms of how many matches you would have to play between GNU-old-MET and GNU-new-MET, that depends on how much stronger GNU-new-MET is.

Suppose that GNU-new-MET has a 51%/49% edge over GNU-old-MET. That means that if you played 1000 matches, then you would expect a score of 510 to 490. The problem is that if GNU-old-MET were playing against itself, the standard deviation would be about 15.8. So a 510 to 490 result would be far from statistically significant. You'd need about 10000 trials to barely reach statistical significance: the expected score would be 5100 to 4900 and the standard deviation would be 50, so 5100 would be two standard deviations away. In general the formula for the standard deviation is sqrt(n)/2 where n is the number of matches.
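Tim's arithmetic can be checked with a few lines of Python. `score_sd`, `z_score` and `matches_for_two_sd` are hypothetical helper names. The last one just solves n(p - 0.5) = 2 * sqrt(n)/2 for n, which assumes the equal-strength standard deviation sqrt(n)/2 is a good enough approximation when one side has only a small edge p.

```python
import math


def score_sd(n):
    """Standard deviation of one side's match wins over n matches when both
    sides are equally strong (p = 0.5): sqrt(n * 0.5 * 0.5) = sqrt(n) / 2."""
    return math.sqrt(n) / 2


def z_score(wins, n):
    """Number of standard deviations by which `wins` out of `n` exceeds n / 2."""
    return (wins - n / 2) / score_sd(n)


def matches_for_two_sd(p):
    """Matches needed for the expected excess n * (p - 0.5) to reach two standard
    deviations: n * (p - 0.5) = 2 * sqrt(n) / 2 gives n = 1 / (p - 0.5) ** 2."""
    return 1 / (p - 0.5) ** 2


print(round(score_sd(1000), 1))          # 15.8
print(score_sd(10000))                   # 50.0
print(z_score(5100, 10000))              # 2.0
print(round(matches_for_two_sd(0.51)))   # 10000
```

For Ian's hypothetical overnight score of 257-243 over 500 matches, `z_score(257, 500)` is about 0.63, well short of two standard deviations.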

There's another point to be cognizant of, which is that there is a distinction between statistically significant evidence of the bare-bones claim that "the new MET is better," and a good estimate of *how* much stronger GNU-new-MET is than GNU-old-MET. Let's say you played 10000 matches and the score was 5100 to 4900. You could then claim that the new MET is better, and say that this claim is significant at the two standard deviation level. But you *couldn't* claim that you are 95% confident that the new MET gives you a 51%/49% edge over the old MET. To get a good estimate of the edge requires more trials. How many trials you need would depend on how sharp an estimate you want.
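To make "how sharp an estimate" concrete, one common frequentist choice (an assumption of this sketch, not something Tim specified) is the normal-approximation or Wald interval p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n) for the win probability. The helper names are hypothetical. Keep in mind Tim's later caution that a 95% confidence interval is not a statement that the true edge lies in the interval with 95% probability.

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal (rounded)


def wald_interval(wins, n, z=Z95):
    """Normal-approximation (Wald) interval for the win probability."""
    p_hat = wins / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def matches_for_half_width(half_width, p=0.5, z=Z95):
    """Matches needed for the interval half-width to shrink to `half_width`.
    p = 0.5 is the worst case and is close to the truth for near-equal players."""
    return math.ceil((z / half_width) ** 2 * p * (1 - p))


low, high = wald_interval(5100, 10000)
print(f"5100 of 10000: [{low:.4f}, {high:.4f}]")
print(matches_for_half_width(0.005))  # matches for roughly +/- 0.5 percentage points
```

For 5100 wins out of 10000 the interval is roughly [0.5002, 0.5198]: it just excludes 0.5, but it is about two percentage points wide, so it pins down the size of the edge only loosely. Narrowing it to about +/- 0.5 percentage points takes on the order of 38,400 matches.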

I don't have as much insight into what might be going wrong with the cubeful calculations. It does sound to me that there might be a problem with floating-point precision, but someone with knowledge of the code will have to comment on that.

Tim

Hi Tim,

"but for most practical purposes this is an irrelevant technicality"

Are you saying that I can treat each of the 4 estimates independently? That is, use sqrt(pq/N) as the std for each? Seems problematic to me :)

Yes, a Bayesian approach would be better, but this probably involves things like contour integration or other horrors. I hoped for something simpler.

-Joseph

On Wed, 13 Nov 2019 at 06:04, Timothy Y. Chow <tchow@math.princeton.edu> wrote:

On Tue, 12 Nov 2019, Joseph Heled wrote:

> Hi Timothy,
> Here is a stats question I encounter from time to time.
>
> Suppose I run N BG games and collect the average win rates and gammon
> rates.
> 4 estimates which are dependent as they sum to 1.  How do I determine
> the confidence intervals for each? This is a 4d vector and it seems
> like a non trivial Q, but I assume this crops up a lot and must have a
> standard answer.  what is your take?
>
> Thanks, Joseph

Joseph,

I'm guessing that what you're really interested in is some measure of the variation or dispersion of your sample dataset. In that case, you can simply compute the sample standard deviation for each parameter of interest. The fact that each sample consists of 4 numbers that satisfy the equation that their sum equals 1 just means that your 4 estimated standard deviations aren't independent estimates, but for most practical purposes this is an irrelevant technicality.

On the other hand, if you really want to compute a confidence interval for the purposes of hypothesis testing, then you need to be explicit about what your null hypothesis and alternative hypotheses are. If you're not sure what your null and alternative hypotheses are, then to me that confirms that what you're interested in is not hypothesis testing but some sense of how good an estimate your averages are.

It's important to realize that a 95% confidence interval does *not* mean that there is a 95% probability that the quantity you're trying to estimate lies in your interval. This is a common misconception about what confidence intervals are.

https://en.wikipedia.org/wiki/Confidence_interval#Misunderstandings

If you really want to make statements of the form "there is a 95% probability that the win rate is in such-and-such an interval" then you need to adopt a Bayesian rather than a frequentist framework. In particular you'll need to choose some prior probability distribution and compute the posterior probability distribution by applying Bayes's rule to your data.

Tim


--000000000000aa7ae605972d85f4--
From MAILER-DAEMON Tue Nov 12 17:53:05 2019
Received: from list by lists.gnu.org with archive (Exim 4.90_1)
id 1iUf20-0002wZ-Bv
for mharc-bug-gnubg@gnu.org; Tue, 12 Nov 2019 17:53:04 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:43979)
by lists.gnu.org with esmtp (Exim 4.90_1)
(envelope-from ) id 1iUf1x-0002wI-2Q
for bug-gnubg@gnu.org; Tue, 12 Nov 2019 17:53:02 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from ) id 1iUf1u-0007XI-5y
for bug-gnubg@gnu.org; Tue, 12 Nov 2019 17:52:59 -0500
Received: from sentinel2.math.princeton.edu ([128.112.16.197]:58356)
by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
(Exim 4.71) (envelope-from )
id 1iUf1u-0007Wx-33
for bug-gnubg@gnu.org; Tue, 12 Nov 2019 17:52:58 -0500
Received: from math.princeton.edu ([128.112.18.16])
by sentinel2.math.Princeton.EDU with esmtp (Exim 4.92.3)
(envelope-from )
id 1iUf1q-0001an-4H; Tue, 12 Nov 2019 17:52:55 -0500
Received: from tchow (helo=localhost)
by math.princeton.edu with local-esmtp (Exim 4.92.3)
(envelope-from )
id 1iUf1n-0000ol-7Z; Tue, 12 Nov 2019 17:52:51 -0500
Date: Tue, 12 Nov 2019 17:52:51 -0500 (EST)
From: "Timothy Y. Chow"
Reply-To: tchow@alum.mit.edu
To: "bug-gnubg@gnu.org"
Subject: Re: [gnubg] Help with a new MET
In-Reply-To:
Message-ID:
References:
User-Agent: Alpine 2.21 (LRH 202 2017-01-01)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy]
X-Received-From: 128.112.16.197
X-BeenThere: bug-gnubg@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Bug reports for and general discussion about GNU Backgammon."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 12 Nov 2019 22:53:02 -0000
On Wed, 13 Nov 2019, Joseph Heled wrote:
> I thought it was clear that what we want to establish a difference
> between two (say players X and Y) by running games and testing that
> gammon-rate(X) != gammon-rate(Y).
Thanks for the clarification.
If you're pitting X against Y and your null hypothesis is that their
playing abilities are identical, then what I'd recommend is something like
this. Collect two sets of samples. In the first sample, play a bunch of
games and form a vector of counts like this:
(X wins BG, X wins G, X wins S, Y wins S, Y wins G, Y wins BG)
In the second sample, play the same number of games, and form a similar
vector but reverse the order of the counts:
(Y wins BG, Y wins G, Y wins S, X wins S, X wins G, X wins BG)
Now you've reduced your problem to testing whether these two sets of
sample vectors come from the same distribution. There are various ways
one can do this, a standard one being a chi squared two-sample test.
https://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/chi2samp.htm
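A stdlib-only Python sketch of the two-sample chi-square test suggested above, written from our reading of the NIST Dataplot page: the statistic is the sum over bins of (K1*R_i - K2*S_i)^2 / (R_i + S_i) with K1 = sqrt(S/R) and K2 = sqrt(R/S), where R and S are the sample totals, and there are k - 1 degrees of freedom for k non-empty bins when the two samples have the same size (k otherwise). Treat those details as assumptions to verify against that page. The counts below are invented, and `chi2_two_sample` and `chi2_sf` are hypothetical helper names.

```python
import math


def chi2_two_sample(r, s):
    """Two-sample chi-square statistic for binned counts r and s, following our
    reading of the NIST Dataplot description. Bins empty in both samples are
    skipped. Assumes both samples are non-empty."""
    R, S = sum(r), sum(s)
    k1, k2 = math.sqrt(S / R), math.sqrt(R / S)
    stat, bins = 0.0, 0
    for ri, si in zip(r, s):
        if ri + si == 0:
            continue
        stat += (k1 * ri - k2 * si) ** 2 / (ri + si)
        bins += 1
    dof = bins - 1 if R == S else bins
    return stat, dof


def chi2_sf(x, dof):
    """P(X > x) for a chi-square variable with integer dof >= 1. Starts from the
    closed forms for dof = 1 (erfc) or dof = 2 (exp) and steps up two degrees of
    freedom at a time with Q(k + 2) = Q(k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)."""
    if dof % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < dof:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


# Invented counts. First sample ordered
# (X wins BG, X wins G, X wins S, Y wins S, Y wins G, Y wins BG);
# second sample, same number of games, ordered
# (Y wins BG, Y wins G, Y wins S, X wins S, X wins G, X wins BG).
first = [12, 130, 370, 350, 125, 13]
second = [10, 128, 362, 365, 122, 13]

stat, dof = chi2_two_sample(first, second)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {chi2_sf(stat, dof):.3f}")
```

With equal sample sizes this statistic coincides with Pearson's chi-square on the 2 x 6 contingency table, so if SciPy is available, `scipy.stats.chi2_contingency` should give the same result and is a useful cross-check.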
Tim
From MAILER-DAEMON Thu Nov 14 17:53:59 2019
Received: from list by lists.gnu.org with archive (Exim 4.90_1)
id 1iVNzz-0000Wj-Ff
for mharc-bug-gnubg@gnu.org; Thu, 14 Nov 2019 17:53:59 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:33699)
by lists.gnu.org with esmtp (Exim 4.90_1)
(envelope-from ) id 1iVNzw-0000Wb-Iv
for bug-gnubg@gnu.org; Thu, 14 Nov 2019 17:53:57 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from ) id 1iVNzu-0006d0-Kv
for bug-gnubg@gnu.org; Thu, 14 Nov 2019 17:53:56 -0500
Received: from smtp4-g21.free.fr ([2a01:e0c:1:1599::13]:16642)
by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
(Exim 4.71) (envelope-from )
id 1iVNzt-0006cR-TK
for bug-gnubg@gnu.org; Thu, 14 Nov 2019 17:53:54 -0500
Received: from localhost (unknown [37.170.23.84])
(Authenticated sender: philippe.michel7)
by smtp4-g21.free.fr (Postfix) with ESMTPSA id 8694019F4F8;
Thu, 14 Nov 2019 23:53:47 +0100 (CET)
Date: Thu, 14 Nov 2019 23:53:19 +0100
From: Philippe Michel
To: Ian Dunstan
Cc: "bug-gnubg@gnu.org"
Subject: Re: Help with a new MET
Message-ID: <20191114225319.GA18731@genesis>
References: <1955418285.2228355.1573380564381.ref@mail.yahoo.com>
<1955418285.2228355.1573380564381@mail.yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1955418285.2228355.1573380564381@mail.yahoo.com>
User-Agent: Mutt/1.12.2 (2019-09-21)
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
recognized.
X-Received-From: 2a01:e0c:1:1599::13
X-BeenThere: bug-gnubg@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Bug reports for and general discussion about GNU Backgammon."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 14 Nov 2019 22:53:57 -0000
On Sun, Nov 10, 2019 at 10:08:57AM +0000, Ian Dunstan wrote:
> 1) What I would like to do is test my PR2 MET by doing a series of 5pt
> matches where one Gnubg player uses the PR2 MET and the other the now
> standard Kazaross-XG2 MET (in particular).
> [...]
> I have a friend who is a lot more computer savvy than I am and he has
> started playing around with different sockets/ports and instances of
> Gnubg. He tells me "You actually need 3 instances of gnubg running - I
> run all three without the graphical interface, only pure terminal
> versions". However, before he goes to too much trouble I thought it
> best to contact the Gnubg team and see if you can help. Maybe you only
> have to change "a couple of lines of programming" as someone on my
> forum suggested (lol). It won't be that easy, I know!
The matchseries.py script, which should be in your gnubg installation as
an example of using Python and communicating between gnubg instances, is
probably a good starting point and similar to what your friend intends
to do.
It is available at
http://cvs.savannah.gnu.org/viewvc/*checkout*/gnubg/gnubg/scripts/matchseries.py?revision=1.6
as well.
From MAILER-DAEMON Tue Nov 26 04:44:13 2019
Received: from list by lists.gnu.org with archive (Exim 4.90_1)
id 1iZXOH-00012F-7h
for mharc-bug-gnubg@gnu.org; Tue, 26 Nov 2019 04:44:13 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:54074)
by lists.gnu.org with esmtp (Exim 4.90_1)
(envelope-from ) id 1iZXOE-0000rt-Lj
for bug-gnubg@gnu.org; Tue, 26 Nov 2019 04:44:12 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from ) id 1iZXNN-0007md-J9
for bug-gnubg@gnu.org; Tue, 26 Nov 2019 04:43:19 -0500
Received: from sonic306-19.consmr.mail.gq1.yahoo.com ([98.137.68.82]:33369)
by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
(Exim 4.71) (envelope-from )
id 1iZXNM-0007lN-QE
for bug-gnubg@gnu.org; Tue, 26 Nov 2019 04:43:17 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048;
t=1574761394; bh=RKT+kDG6r7Cu/pZtsSF4aoiTeTvNFyqmVQ6eD46dw04=;
h=Date:From:To:Subject:References:From:Subject;
b=NhXbW/0s49x973p23yDJVJVxLvr7ibSOU0x4mknbAbVh+DNE9hcawdnmp5qqfAOyE+tnN9Kr8PQVrrk9A7YbBSdS2wjpvPbqgLok/K/J5t3WLp/zVGjnVEP19qO/hmoFDeI/BBrhSf12klKB9VQ4lJPKWPy6dTqhnAxhiIpzKoPjRQOZJNtKoHwbtrvBXGxf4skWdoWWh+QyiJpSJzb1v1ijRpMw8CO/AsOcNHv2PdMPZ27OwlmCN5d0cvvRvebLRNhew0hJgm5VVOjeR3BVkyCSm+sJnBW0mGYc/cHBAvTk9vPLas7SmNFFTs/2TR7CsZobxFlv50ajPcnI2k3wNg==
X-YMail-OSG: XI0ky74VM1nBiT5ZM4Qtu03HS2I3k4683zekOgi85SV7p8FI5xmC_VzQP4b3B5p
zogRHgH9d5bXvRsLEdCA_tacxVtbzvMmqnQMBFot05efk7my.N4nM5txKAY.VTP74V5bkigtmZhS
pMQQE17oRkUAIT3IQquzW7t2P1X45iSsfWbhFkMrhaNiIBduue67yde2CKNDjPeLU4eoO2WP4zQa
pjZ7o3GPIhknHgkSeBnLuQqooDcedt_dhyr079sb8V97Tj1TNAxNq2LC2uWwr3FCl4HpfiWfq21o
rbKv8AtktWvT._twxe4s0ptuluy.f.3go1uixT54UIjkGeB3iB9FO7nSImik9RSkfFiTY.14LDz9
1dKDgc7szXPM23lzoW44irmYC6Do.q9UfYWLGD4k72iE3NLuHNhyehaqdRuoBedF0bDz0zJX.Tqh
nDvK1OY9GJdrwbL6TMwRP6MqfLG.HkqEfaz0M0_1q6R6We52RawW3dXgk2MbBAT_M1fNhPR.14dr
9kB.c.PBygEQF4MbqW7tBhOSl1.df4JiEnQgYsww0OaBOMTbMVdPADG7DmFltFsKyahoVgcgCqVp
7zFqG94zBCvhDS_XBQwBNZlMSxGVZFfCCiTRjtyRS4CUC7uWPY_3wa87gpGZtBe6VN6lsFk3QnBT
lO9kyHKZESsOPJJt8PaVogsNm6pahixzp2NPB.oy6gj078iFf_cG4p7sXG4TGFsrk2XhVaBsTNSj
I5DF5.SJIqV355LYbiX0cnsDdCmASYq39N0gN7LJCcNKRtHxckRz8ANXRRmDTfwLBQxPBvDD0n66
3b47NIdDiingpH1cCxM0qXHl9T4GkX1ktmx1Q075t7vwuR0Kk0q0vx1JJS7gcMt1XfsBmkN9o7YP
VR.a8wdNEQzfeSox8Wnc5pZxWshBz2CHb..BCwE4Q6QdQz5Zj_uCcMZMMBQi.A7pFztlrBugCnsF
GpY9emtP5tzoIbYzfJsBlriiXxCZpJVU_aasHRfl_pxiCJtRIYuEb1ytgU6I.ngMENRTrD.j.edk
ETNrlDGavqzOHOMprJuvKkFmSMU6JYulGfzoA0_eaQEnPXLMsItvqqaRPYiDnr6IBfhbLWVQiSXr
YP0wt5nDT4vF8SUUr0WLPHoMHa_8nnfMhbKvDVIzoyYANnqy52CkkrdT7jK3EFlx_n_fQ40eOQS2
3FZrFtNGlxii3_p.4Dnn4.oT.xKqfp0Ek4Rp3otd6a41CfKqci4KDU01fG_iD5w6mMXfnOt_JzDB
ZdJQIDQHf3dulZXHfbvZen1oJtA3sMxT_H4qxbEOOIg_B5KLwQLuy4IEqXpmtA2D2WBcW5PDcEw.
4EIBjn3rUJvg1T4CwnhIi0CNcI1_qqQ--
Received: from sonic.gate.mail.ne1.yahoo.com by
sonic306.consmr.mail.gq1.yahoo.com with HTTP; Tue, 26 Nov 2019 09:43:14 +0000
Date: Tue, 26 Nov 2019 09:41:25 +0000 (UTC)
From: Ian Dunstan
To: "bug-gnubg@gnu.org"
Message-ID: <1669084716.125951.1574761286004@mail.yahoo.com>
Subject: [gnubg] Help with a new MET (2)
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_Part_125948_1116788430.1574761285999"
References: <1669084716.125951.1574761286004.ref@mail.yahoo.com>
X-Mailer: WebService/1.1.14822 YMailNorrin Mozilla/5.0 (Windows NT 10.0; Win64;
x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97
Safari/537.36
Content-Length: 13668
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy]
X-Received-From: 98.137.68.82
X-BeenThere: bug-gnubg@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Bug reports for and general discussion about GNU Backgammon."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 26 Nov 2019 09:44:12 -0000
------=_Part_125948_1116788430.1574761285999
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Hi team,
I have tried several subscription attempts to the email list directly, I think, without success. After finding the archives I see that there have been a number of replies to my first post. I didn't read them until very recently; I wasn't ignoring your feedback, I just had not viewed it. Thank you, all, for what I received. Also, I apologise that this post almost certainly starts a new thread, though I considered it more important to make a response now.
I have been in direct email contact with both Joseph and Philippe behind the scenes and this has been most fruitful for me, thank you both. Philippe has found an error in the way the external player handles its "Takes". Gergely Elias (the savvy friend I mentioned before) has made a new Gnubg build to take advantage of this bug-fix; however, it seems we still have problems. Gergely will contact Philippe directly about those problems.
@Timothy:
Thank you for the interest/encouragement in my project! I provide a little background for you, though please feel free to chat with me via email for more detail if you wish.
If we consider an approximation to a near-perfect-player (npp) as being (say) Gnubg Supremo+ or some XG 3-ply, 4-ply... + then I think it is fairly easy to believe that:
A (npp) vs another (npp) should be using a near-perfect MET (npm)
Our current (npm)s are not 'perfect', of course, although we can use the Rockwell-Kazaross MET or Kazaross-XG2 as a reasonable approximation. Note: Despite the rollout bot used and the number of rollout trials involved for each MET, I suspect that the Roc-Kaz MET may be better. A discussion for another day, perhaps.
In my case, the rollouts I performed recently were all originally done solely for the calibration purposes of my theoretical "Variable MET" (VM). After doing the rollout trials and achieving what I consider a very successful calibration, it occurred to me that it would be interesting to find out what PR level the rollouts are related to. It transpired that after analysing an assortment of matches, mainly around (and averaging) the 5pt length, I got a PR of ~2.1. The idea for a 'PR2 MET' was born from this finding.
Hence, the premise I wish to explore is that a top-level-human-player (tlhp) playing another (tlhp) might actually do better not using a (npm); said another way:
A (tlhp) vs another (tlhp) should use a MET for their level of play like the PR2 MET.
Incidentally, I am not being heretical here. There are several posts online sharing this viewpoint, and no, I am not looking for them ;-) However, I did revisit my copy of "CAN A FISH TASTE TWICE AS GOOD" and there is a paragraph, or two, sympathetic to my stated case. I can re-type a couple of sentences if I have to. FWIW, book co-author, Walter Trice, would have been a fascinating man to talk to about this.
As Joseph pointed out to me, there is no theoretical reason that using some 'lesser strength' MET achieves a better result for humans. I agree, though I do wonder why humans are playing on METs based on underlying g+bg rates of around 28.2% that achieve 2a1a equity of ~32.3%, whereas our top humans are more likely to be achieving numbers closer to 27.3% and 31.6% respectively. These last values I determined from my rollouts, and they are inherently used in the PR2 MET.
There is a lot more to it than provided here: the supposed link between
g+bg rates and 2a1aC equity, for one. Both those numbers (along with others)
are important inputs to the (VM) I used to create the PR2 MET.
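A back-of-envelope sketch of why the g+bg rate and 2a1aC equity move together. This is a naive model, not the Variable MET, and it assumes: no cube in the Crawford game; the trailer wins that game 50% of the time; the g+bg rate is the share of the trailer's wins that are gammons or backgammons (with a 50/50 game this equals the share of all games ending in a gammon or backgammon); and a single-game win leads to 1-away/1-away, worth 50%. Under those assumptions the trailer's match-winning chance is 0.5 * (1 - g) * 0.5 + 0.5 * g = 0.25 + 0.25g, about 32.05% for g = 28.2% and about 31.8% for g = 27.3%. That moves in the same direction as the rollout figures above but does not reproduce them, consistent with there being "a lot more to it". The function name is hypothetical.

```python
def trailer_mwc_2a1a_crawford(gbg_rate: float, p_win: float = 0.5) -> float:
    """Naive trailer match-winning chance at 2-away/1-away Crawford.

    Illustrative assumptions only (not the Variable MET): no cube in the
    Crawford game, the trailer wins it with probability p_win, gbg_rate is
    the share of those wins that are gammons or backgammons, and a single
    win leads to 1-away/1-away, which is worth 50% to each side.
    """
    p_single = p_win * (1.0 - gbg_rate)  # score becomes 1-away/1-away
    p_gammon = p_win * gbg_rate          # 2+ points: trailer wins the match
    return 0.5 * p_single + p_gammon


for rate in (0.282, 0.273):
    mwc = trailer_mwc_2a1a_crawford(rate)
    print(f"g+bg rate {rate:.1%}: naive 2a1aC trailer MWC {mwc:.2%}")
```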
Also, it is not as if one human plays another with a MET sitting in 'the
cloud' above each of their heads awaiting exact use (a nice analogy from
Joseph, I thought). The MET we use comes in later, during analysis, and/or
before a match when we attempt to benchmark our play and cube decisions. I
have barely scratched the surface of using the PR2 MET with my own matches.
Are there differences? Yes, and those interest me. How much overall
difference does it make? Very unclear, and I realise that the difference (if
one exists) is probably tiny and will be very difficult to show with
statistical significance.
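To put rough numbers on "very difficult to show": if two Gnubg players that differ only in their MET each win close to half of the matches, the observed win rate over N matches has a standard deviation of about sqrt(pq/N) with p = q = 0.5. The sketch below uses the usual normal approximation (a standard technique, not something proposed in the thread) to estimate how many matches are needed before a true edge equals z standard deviations. The function name and the edges tried are illustrative choices.

```python
import math


def matches_needed(edge: float, z: float = 1.96, p: float = 0.5) -> int:
    """Approximate smallest N with z * sqrt(p * q / N) <= edge.

    edge is the true advantage in match-winning probability, e.g. 0.01 for
    51% against 50%. Statistical power is not taken into account.
    """
    q = 1.0 - p
    return math.ceil((z * math.sqrt(p * q) / edge) ** 2)


for edge in (0.02, 0.01, 0.005):
    print(f"edge {edge:.1%}: about {matches_needed(edge):,} matches")
```

With z = 1.96 this gives roughly 2,400 matches for a 2% edge, 9,600 for 1% and 38,400 for 0.5%. It also ignores statistical power: at exactly that N, a real edge of that size would clear the threshold only about half the time, so a convincing test needs more matches still.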
I am having fun trying though ;-)
Kind regards,
Ian Dunstan
(Australian Backgammon Federation, Secretary)
------=_Part_125948_1116788430.1574761285999--
From MAILER-DAEMON Tue Nov 26 13:14:12 2019
Received: from list by lists.gnu.org with archive (Exim 4.90_1)
id 1iZfLn-0007ce-Si
for mharc-bug-gnubg@gnu.org; Tue, 26 Nov 2019 13:14:11 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:53287)
by lists.gnu.org with esmtp (Exim 4.90_1)
(envelope-from ) id 1iZfLl-0007cY-B4
for bug-gnubg@gnu.org; Tue, 26 Nov 2019 13:14:10 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from ) id 1iZfLj-0003Ks-6T
for bug-gnubg@gnu.org; Tue, 26 Nov 2019 13:14:08 -0500
Received: from sentinel2.math.princeton.edu ([128.112.16.197]:58478)
by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
(Exim 4.71) (envelope-from )
id 1iZfLi-0003JP-Pu
for bug-gnubg@gnu.org; Tue, 26 Nov 2019 13:14:06 -0500
Received: from math.princeton.edu ([128.112.18.16])
by sentinel2.math.Princeton.EDU with esmtp (Exim 4.92.3)
(envelope-from )
id 1iZfLe-0006UA-1T; Tue, 26 Nov 2019 13:14:03 -0500
Received: from tchow (helo=localhost)
by math.princeton.edu with local-esmtp (Exim 4.92.3)
(envelope-from )
id 1iZfLb-00076t-It; Tue, 26 Nov 2019 13:13:59 -0500
Date: Tue, 26 Nov 2019 13:13:59 -0500 (EST)
From: "Timothy Y. Chow"
Reply-To: tchow@alum.mit.edu
To: bug-gnubg@gnu.org
Subject: Re: [gnubg] Help with a new MET (2)
In-Reply-To:
Message-ID:
References:
User-Agent: Alpine 2.21 (LRH 202 2017-01-01)
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=US-ASCII
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy]
X-Received-From: 128.112.16.197
X-BeenThere: bug-gnubg@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Bug reports for and general discussion about GNU Backgammon."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 26 Nov 2019 18:14:10 -0000
Ian,
The topic of "fish METs" is an interesting one. I think that they can be
useful for non-contact race cubes, where humans can make precise
calculations over the board using race formulas. Beyond that, I'm a
little skeptical about how practical they are.
One thing I feel hardly anyone pays attention to is the fact
that two players with the same PR can have very different styles and make
very different kinds of mistakes. In today's PR-crazed world, there is a
strong tendency for most people to oversimplify and use just one number to
capture a player's backgammon ability. In reality, backgammon ability is
a multi-dimensional beast. In my own playing group, I know that one
player plays blitzes much too timidly, while another plays blitzes well
but has a strong tendency to break anchor too riskily. When I'm playing
one of these players and am assessing taking a blitz cube or offering a
holding game cube, I will definitely take into account my knowledge of
their individual playing weaknesses. It's quite possible that both
players have the same PR, in which case a fish MET might not distinguish
between them.
The fact that backgammon ability is multi-dimensional means that it's hard
to create credible fish METs with any degree of accuracy. If one tries to
cripple GNU by forcing it to make occasional random mistakes, one can
measure the PR of "crippled GNU", but this does not mean that crippled GNU
is a credible model of how a human (with the same PR as crippled GNU)
plays.
The Jacobs/Trice fish book is a wonderful book and an interesting proof of
concept, but I think that the above difficulties (among others) mean that
it's hard to actually apply the ideas in real life.
Tim
From MAILER-DAEMON Tue Nov 26 19:38:22 2019
Received: from list by lists.gnu.org with archive (Exim 4.90_1)
id 1iZlLa-0003D8-Ds
for mharc-bug-gnubg@gnu.org; Tue, 26 Nov 2019 19:38:22 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:42929)
by lists.gnu.org with esmtp (Exim 4.90_1)
(envelope-from ) id 1iZlLW-0003Cz-Q2
for Bug-gnubg@gnu.org; Tue, 26 Nov 2019 19:38:20 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from ) id 1iZlLV-0007Rk-Gk
for Bug-gnubg@gnu.org; Tue, 26 Nov 2019 19:38:18 -0500
Received: from mail-qk1-x732.google.com ([2607:f8b0:4864:20::732]:46505)
by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
(Exim 4.71) (envelope-from ) id 1iZlLV-0007QN-By
for Bug-gnubg@gnu.org; Tue, 26 Nov 2019 19:38:17 -0500
Received: by mail-qk1-x732.google.com with SMTP id h15so17901828qka.13
for ; Tue, 26 Nov 2019 16:38:17 -0800 (PST)
X-Received: by 2002:a37:6087:: with SMTP id u129mr1436376qkb.219.1574815096316;
Tue, 26 Nov 2019 16:38:16 -0800 (PST)
MIME-Version: 1.0
References: <1669084716.125951.1574761286004.ref@mail.yahoo.com>
<1669084716.125951.1574761286004@mail.yahoo.com>
<1314958228.522907.1574814338704@mail.yahoo.com>
In-Reply-To: <1314958228.522907.1574814338704@mail.yahoo.com>
From: Joseph Heled
Date: Wed, 27 Nov 2019 13:38:05 +1300
Message-ID:
Subject: Re: [gnubg] Help with a new MET (2)
To: Ian Dunstan ,
"bug-gnubg@gnu.org"
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
recognized.
X-Received-From: 2607:f8b0:4864:20::732
X-BeenThere: bug-gnubg@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Bug reports for and general discussion about GNU Backgammon."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe:

-Joseph

On Wed, 13 Nov 2019 at 10:39, Timothy Y. Chow <tchow@math.princeton.edu> wrote:

> "but for most practical purposes this is an irrelevant technicality"
>
> Are you saying that I can treat each of the 4 estimates independently?
> That is, use sqrt(pq/N) as the std for each? seems problematic to me :)

No, I didn't say that. As I said, the 4 estimates are not independent.

What I recommended was for you to compute the sample standard deviation
for each parameter of interest. So for example, if you have 100 samples
and you're interested in the gammon rate, then first compute the mean
gammon rate over all your samples. Call that mu. Then for each sample
value g_i, compute (g_i - mu)^2. Sum these up, divide by 100, and take
the square root. This will give you some indication of the dispersion of
your sample set.
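Tim's recipe for the dispersion of a set of estimates, as a minimal Python sketch. The input values are made up for illustration (each is assumed to be the gammon rate measured on one batch of games), and, following the text, the sum of squares is divided by N rather than N - 1.

```python
import math


def dispersion(samples: list[float]) -> tuple[float, float]:
    """Return (mu, sd), where sd divides the sum of squares by N."""
    n = len(samples)
    mu = sum(samples) / n                        # mean over all samples
    sq = sum((g - mu) ** 2 for g in samples)     # sum of (g_i - mu)^2
    return mu, math.sqrt(sq / n)                 # divide by N, take the root


# Hypothetical gammon rates from five batches of games (illustration only).
rates = [0.26, 0.29, 0.27, 0.28, 0.30]
mu, sd = dispersion(rates)
print(f"mean {mu:.3f}, sd {sd:.4f}")
```

For these five values the mean is 0.28 and the standard deviation is sqrt(0.0002), about 0.0141. Python's `statistics.pstdev` computes the same divide-by-N quantity.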

The formula sqrt(pq/N) arises when you're doing hypothesis testing. It's
the standard deviation under the null hypothesis. But so far, you haven't
specified a null hypothesis.
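By contrast, here is where sqrt(pq/N) enters: a one-sample z-test of a proportion. The null hypothesis (a true match-winning rate of p0 = 0.5) and the 520-wins-in-1000-matches figure are hypothetical, chosen only to show the calculation.

```python
import math


def z_score(wins: int, n: int, p0: float = 0.5) -> float:
    """z statistic for an observed win rate under H0: true rate = p0.

    sqrt(p0 * q0 / n) is the standard deviation of the observed rate
    when the null hypothesis is true.
    """
    q0 = 1.0 - p0
    sd_null = math.sqrt(p0 * q0 / n)
    return (wins / n - p0) / sd_null


# Hypothetical: 520 wins in 1000 matches.
print(round(z_score(520, 1000), 2))
```

Here z is about 1.26, below the conventional two-sided 5% threshold of 1.96, so such a result would not reject the null hypothesis.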

> Yes, a Bayesian approach would be better, but this probably involves
> things like contour integration or other horrors.

No, it doesn't. But you do need to specify a prior distribution. Suppose
you're interested in the win rate, and your prior distribution is uniform
on the interval [0,1]. For illustration purposes, let's say you're
satisfied with accuracy to 1 decimal place, so each of the probabilities
in the set {0.1, 0.2, ..., 0.9} has prior probability 1/9. If you observe
a win, then using Bayes's rule, you find that the posterior probability of
a win rate of j/10 is obtained by multiplying the prior probability by
j/10, and then normalizing so that everything sums to 1. So the posterior
probabilities work out to be

  [1/45, 2/45, 3/45, 4/45, 5/45, 6/45, 7/45, 8/45, 9/45]

Similarly, if you observe a loss, then you adjust by multiplying the prior
probability by 1 - j/10 and normalizing. Repeat for every observation in
your sample, each time using the current posterior as the new prior.
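Tim's discrete Bayesian update, written out as a minimal sketch. Exact fractions make it easy to check the [1/45, ..., 9/45] result; the function name and the sequence of results in the loop are illustrative assumptions.

```python
from fractions import Fraction


def update(current: list[Fraction], won: bool) -> list[Fraction]:
    """One application of Bayes's rule over the win rates j/10, j = 1..9."""
    weighted = []
    for j, prob in enumerate(current, start=1):
        rate = Fraction(j, 10)
        # Likelihood of the observation given a win rate of j/10.
        likelihood = rate if won else 1 - rate
        weighted.append(prob * likelihood)
    total = sum(weighted)
    # Normalize so that the posterior probabilities sum to 1.
    return [w / total for w in weighted]


# Uniform prior over {0.1, 0.2, ..., 0.9}: each value has probability 1/9.
prior = [Fraction(1, 9)] * 9

after_win = update(prior, won=True)
print([int(p * 45) for p in after_win])  # [1, 2, 3, 4, 5, 6, 7, 8, 9], i.e. j/45

# Repeat for every observation, feeding each posterior back in as the prior.
posterior = prior
for won in (True, True, False):  # hypothetical sequence of match results
    posterior = update(posterior, won)
```

After the hypothetical sequence win, win, loss, the posterior is proportional to j^2 (10 - j), so the most likely win rate becomes 0.7.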

Tim

Hi team,

I have made several attempts to subscribe to the email list directly,
without success, I think. After finding the archives, I see that there have
been a number of replies to my first post. I didn't read them until very
recently. I wasn't ignoring your feedback; I just had not viewed it. Thank
you, all, for what I received. Also, I apologise that this post almost
certainly starts a new thread, though I considered it more important to make
a response now.

I have been in direct email contact with both Joseph and Philippe behind the
scenes, and this has been most fruitful for me, thank you both. Philippe has
found an error in the way the external player handles its "Takes". Gergely
Elias (the savvy friend I mentioned before) has built a new Gnubg build to
take advantage of this bug-fix; however, it seems we still have problems.
Gergely will contact Philippe directly about those problems.
