[Bug-gnubg] Re: ratings formula, checker play vs. cube
From: kvandoel
Subject: [Bug-gnubg] Re: ratings formula, checker play vs. cube
Date: Sun, 14 Sep 2003 02:46:04 +0200 (CEST)
Sat, 13 Sep 2003 18:47:05, Robert-Jan Veldhuizen:
>>I think you are still missing the point that the contributions of
>>checker play errors and cube play errors are converted to equivalent
>>rating loss INDEPENDENTLY.
>No, you miss my point. The above is obvious.
It's not so obvious, and is a very recent feature.
>Let me repeat. The problem is, cube errors are input into your formula
>AFTER being divided by what GNUBG considers "actual or close
>decisions". As I explained, that is often no good for match play and as
>such this gives inaccurate ratings. Read my other posts to see why.
One of the arguments you brought forward was that, since checker errors
are divided by total unforced moves, cube errors should be divided by
all cube decisions as well, for "consistency" or to make them "easier to
compare". This argument makes no sense, since cube errors and checker
errors contribute independently to the rating estimate. Do we agree then
that this is not a valid argument?
>Dividing the total cube error by all cube decisions gives a better
>estimate of someone's cube skill. Try it if you don't believe me, I'd
>say. I've analysed about 400 matches and I'd say GNUBG's current method
>is worse than the old method of counting all cube decisions.
I don't understand this argument. You've analysed 400 matches and looked
at the cube error numbers. Then you somehow say that
cubeError/totalCubeDecisions is a "better" measure of your cube skill
than cubeError/actualCubeDecisions. How do you make this judgement?
I can say I've analysed over 100,000 matches in the last month and I
think it's better to use the current cubeError/actualCubeDecisions. That
doesn't prove anything.
Your other arguments make sense, but they don't smell right to me.
Let's address them one by one and see if I can poke holes in them.
1)
>One obvious, and not that rare example is when both players are 2-away.
>With winning chances for the player on roll being anywhere between 30%
>and 70%, it is almost always correct to double, but some times it's
>optional (no market losers f.i.), and often it's a very close decision.
>If a player doubles immediately on his first opportunity, he's making
>no error, but it counts for one actual (and close) decision per the
>current scheme. If both players somehow delay doubling (which can have
>its advantages), every turn will count as a close cube decision. So,
>one could have two similar matches where 2-away both is reached, one
>where a player (correctly) doubles at the first opportunity, and one
>where f.i. the cube (again correctly) gets turned after 16 moves. The
>latter will give both players a much lower cube error rate with the
>current scheme, because 8 decisions are added for each player.
If both players somehow delay doubling, this will increase the
denominator and reduce the cube_error, ASSUMING no further mistakes are
made. But if a player delayed, became a gammon favourite and still
doubled (incorrectly), or became an underdog, didn't realize this and
still doubled, then the error rate would go up. So it's not as you
suggest, that by delaying you can just manipulate the cube error with no
risk.
If you delay, you need more skill to avoid errors, so the effect you
object to, that the cube_error gets smaller, is correct.
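
To make the trade-off concrete, here is a minimal sketch of the per-decision cube error rate under the current scheme (total cube error divided by actual-or-close decisions). The function name and all numbers are hypothetical, chosen only for illustration; this is not GNUBG's code.

```python
def cube_error_rate(total_cube_error, close_decisions):
    """Average cube error per actual-or-close cube decision."""
    if close_decisions == 0:
        return 0.0
    return total_cube_error / close_decisions

# Hypothetical state before 2-away/2-away is reached:
# 0.100 total cube error spread over 4 close decisions.
prior_error, prior_decisions = 0.100, 4

# Double at the first opportunity: one more decision, no new error.
immediate = cube_error_rate(prior_error, prior_decisions + 1)

# Delay for 8 turns and get every decision right: rate drops.
delayed_clean = cube_error_rate(prior_error, prior_decisions + 8)

# Delay for 8 turns but make one 0.150 error along the way: rate rises.
delayed_blunder = cube_error_rate(prior_error + 0.150, prior_decisions + 8)

print(immediate, delayed_clean, delayed_blunder)
```

With these made-up numbers, delaying lowers the rate only when all eight extra decisions are correct; a single error during the delay leaves the player worse off than doubling immediately.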
2)
>If it somehow happens that someone plays on for the gammon at 2-away
>both, that is in theory a bad plan (double/take and you just need a
>single win). However GNUBG will now "reward" you for this strategy by
>adding (almost) all moves as actual or close cube decisions, and this
>can instantly add like 25 decisions for BOTH players. In a 5 or 7pt
>match, that is often more than all other decisions in the match added
>up. This certainly does not reflect the skill involved, and is mostly
>an anomaly IMO.
Why? I often delay doubling against weaker players, as they usually
know to take after I open with 3-1 and double on the next move, but
after reaching a somewhat more complicated position they need their
judgement and are more likely to make a mistake.
This is correctly reflected in counting every move as a cube decision
then.
3)
>Another example is when one player gets closed out and is very likely
>to get gammoned when he loses; (re)doubling then is often just a tiny
>error, and this might repeat for some turns as the other side brings
>the checkers around and bears off. Again GNUBG will add relatively many
>to the number of actual or close cube decisions, while it does not
>really reflect any more skill involved. As a result this kind of
>situation will drop one's cube error rate by the current scheme, which
>does not seem to reflect cube skill very well.
I think it does reflect cube skill, similar to the other examples. You
say "doubling then is often just a tiny error", but when it is not a
tiny error, it is a big one. The fact that you don't make this big error
means you have cube skill, which is reflected in this.
All these examples share the same idea: by not forcing a decision now
you can lower the effect of previous cube errors by increasing the
denominator, at the risk of making additional cube errors. That is, if
you succeed in lowering your previous cube errors, you must have enough
skill so the lowering is actually correct.
Now why do you think the "old" way, measuring cube-errors by
totalError/totalCubeDecisions is better? In all your examples the
effect of increasing the denominator with sequences of not doubling when
you could is still there, but I argue that's OK, you NEED this effect as
it involves skill. But in addition to this, the denominator now also
increases when for example you get doubled early, take and remain a huge
underdog for the remainder of the game. Every one of those no-brainer
"don't redouble" decisions now dilutes your cube errors, which is NOT
correct.
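
A rough sketch of that dilution, comparing the two denominators on one game. The helper name and every number here are hypothetical and only illustrate the arithmetic; they are not taken from GNUBG.

```python
def rate(total_error, decisions):
    """Cube error per decision; 0.0 when there were no decisions."""
    return total_error / decisions if decisions else 0.0

# Hypothetical game: you are doubled early and take, making a 0.080 error.
cube_error = 0.080
close_decisions = 1          # the take itself

# You then hold the cube as a huge underdog for 15 turns; each turn is a
# trivial "don't redouble" that only the old scheme counts.
trivial_no_redoubles = 15

per_close = rate(cube_error, close_decisions)
per_total = rate(cube_error, close_decisions + trivial_no_redoubles)

print(per_close, per_total)
```

Under the old totalCubeDecisions denominator, the same take error looks sixteen times smaller here, although none of the fifteen extra decisions required any skill.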
To summarize, I argue that your examples show the opposite of what you
intended.
Kees