From: Albert Silver
Subject: RE: [Bug-gnubg] Re: The importance of METs
Date: Tue, 9 Sep 2003 18:50:12 -0300

Although I won't take the blame for advertising the Mec26 as being
better (I only repeated the claim), I will reiterate, one last time, my
apology for misreading the data and misplacing a decimal point. Mea Culpa.

> > Douglas Zare wrote:
> > >
> > > Ok. I'm not sure that I see enough accuracy to say 0.12% rather than
> > > 0.0-0.2%, but I'll trust that someone has gone through that carefully.
> > > However, the Woolsey-Heinrich MET is a straw man. Woolsey says he
> > > doesn't use it (for extreme scores), and there are scores which seem
> > > to be quite wrong, such as for 3-away 4-away. If you have a new MET
> > > that is supposed to be an improvement over what is out there, why not
> > > test it against METs people believe, or at least better ones?


> are other improvements that would make larger differences. However, KvdD's
> experiments (which in this case look better) suggest that using Woolsey's
> MET loses about 1 rating point. The confidence interval is relatively wide,
> though. I would not be surprised if the correct value were 4 elo points.
>
> One elo point. That is an interesting figure. How far will it be spread?
>
> A gain of a single elo point is not worth advertising, and this is why I
> suggested that as a reality check, the improvements be expressed in terms
> of elo. However, if you focus on some match scores, you will find larger,
> more meaningful improvements, even though those match scores are not hit
> in every match.
>
> Douglas Zare
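To make the rating-point figures quoted above more concrete, here is a
minimal sketch that converts between a rating difference and a
match-winning probability. It assumes that "elo" / "rating point" in this
thread means a FIBS-style rating, where the underdog's chance in an
N-point match is 1 / (10^(D*sqrt(N)/2000) + 1). The function names are
hypothetical and are not part of gnubg.

```python
import math


def fibs_win_probability(rating_diff: float, match_length: int) -> float:
    """Chance that the higher-rated player wins an N-point match.

    Assumes the FIBS-style formula:
        P_underdog = 1 / (10 ** (D * sqrt(N) / 2000) + 1)
    """
    x = rating_diff * math.sqrt(match_length) / 2000.0
    underdog = 1.0 / (10.0 ** x + 1.0)
    return 1.0 - underdog


def rating_diff_from_probability(p_win: float, match_length: int) -> float:
    """Invert the formula above: rating difference implied by a win chance."""
    # p / (1 - p) == 10 ** (D * sqrt(N) / 2000), so take log10 and rescale.
    return 2000.0 / math.sqrt(match_length) * math.log10(p_win / (1.0 - p_win))


if __name__ == "__main__":
    p = fibs_win_probability(1.0, 7)
    print(f"1 rating point in a 7-point match: win chance {p:.4%}")
    print(f"round trip: {rating_diff_from_probability(p, 7):.6f} points")
```

Under this assumption, a 1-point rating edge in a 7-point match is worth a
winning chance of only about 50.08%, which gives a feel for how small a
single rating point is.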

Understandably, the MET entries that are already closest to reality will
not yield big differences when those scores are reached. But how about
specifically testing the scores where the METs diverge the most? For
example, instead of letting the results be diluted by the many
relatively accurate MET entries, how about running a large series of
games starting specifically at 4-away 3-away, to see how much of a
change that entry brings compared to the older values? The same could
be done for other scores where desired. Overall the difference may
still be small, since the most affected scores still have to be reached
in play, but this way one could test specific conditions directly.
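One way to read the results of such a focused test is as a comparison of
two win rates. The sketch below assumes that each MET is tested by playing
many matches from the same fixed score (for example 4-away 3-away) and
counting wins; it uses a normal-approximation (Wald) confidence interval.
The function names are hypothetical, not gnubg commands.

```python
import math
from typing import Tuple


def win_rate_ci(wins: int, games: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Win fraction from a fixed starting score, with a Wald confidence interval."""
    p = wins / games
    half = z * math.sqrt(p * (1.0 - p) / games)
    return p, p - half, p + half


def diff_ci(wins_a: int, games_a: int, wins_b: int, games_b: int,
            z: float = 1.96) -> Tuple[float, float, float]:
    """Difference in win fraction between MET A and MET B, with a Wald interval.

    An interval that excludes 0 suggests the two METs really differ at that score.
    """
    pa = wins_a / games_a
    pb = wins_b / games_b
    se = math.sqrt(pa * (1.0 - pa) / games_a + pb * (1.0 - pb) / games_b)
    d = pa - pb
    return d, d - z * se, d + z * se


def games_needed(delta: float, p: float = 0.5, z: float = 1.96) -> int:
    """Matches per MET so the interval half-width on the difference is about delta.

    Solves z * sqrt(2 p (1 - p) / n) <= delta for n; p = 0.5 is the worst case.
    """
    return math.ceil(2.0 * p * (1.0 - p) * (z / delta) ** 2)
```

With p near 0.5, pinning the difference down to about plus or minus one
percentage point at 95% confidence takes roughly 19,200 matches per MET.
That illustrates why concentrating the games on one divergent score,
rather than waiting for it to come up in full matches, is attractive.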

                                                Albert
