Internet engine matches

BertTuyt · Post by **BertTuyt** » Sat Dec 08, 2012 14:33

So, for completeness, the mathematical proof of the pWin = pError * M formula, as used in the previous post.

When playing against a perfect player/program, the game only ends in a draw if the other side never makes an error during the M moves.

So: pDraw = (1 - pError) ^ M

As pError (and therefore pError * M) is small, this can be approximated to first order by:

pDraw ≈ 1 - pError * M

As pLost = 0 (a perfect program never loses), pDraw + pWin = 1, so:

pWin = 1 - pDraw = 1 - (1 - pError * M) = pError * M
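A quick numerical check of how good the first-order approximation is, as a small Python sketch. The function names are illustrative; pError = 0.003 and M = 30 are the values used later in this thread, and the other pairs are arbitrary.

```python
def p_draw_exact(p_error: float, m: int) -> float:
    """Probability that the non-perfect side makes no error in M moves."""
    return (1.0 - p_error) ** m


def p_win_approx(p_error: float, m: int) -> float:
    """First-order approximation pWin ~ pError * M (valid when pError * M << 1)."""
    return p_error * m


for p_error, m in [(0.003, 30), (0.01, 30), (0.003, 100)]:
    exact_win = 1.0 - p_draw_exact(p_error, m)
    approx_win = p_win_approx(p_error, m)
    print(f"pError={p_error}, M={m}: exact pWin={exact_win:.4f}, approx={approx_win:.4f}")
```

For pError = 0.003 and M = 30 the exact pWin is about 0.086 versus 0.09 for the approximation, so the linear formula slightly overestimates pWin, and the gap grows as pError * M grows.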

Bert

BertTuyt · Post by **BertTuyt** » Sat Dec 08, 2012 15:21

Next step: implementing Michel's observation/hypothesis (pError = c1 / time ^ c2).

So, including the hypothesis that for a speed increase of 10 the Kingsrow/Damage error rate drops by a factor of 2, and that pError is around 0.003 (I hope I didn't include too many zeros this time), one can write: pError = 0.003 / 10 ^ 0.3 (0.3 is log10(2), as 10 ^ 0.3 ≈ 2).

Combining this with the previous formula, with F the speed-increase factor and M = 30 the assumed error-move interval:

dELO = 800 * M * pError = 800 * 30 * 0.003 / F ^ 0.3 (F = 1 for a 1 Min/game match).

So finally: dELO = 72 / ( F ^ 0.3 )
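The resulting formula can be tabulated with a short Python sketch. It assumes, as above, M = 30, pError = 0.003 at 1 Min/game and a factor-2 error reduction per 10x speed-up; it uses c2 = log10(2) ≈ 0.301 instead of the rounded 0.3, so every 10-fold increase exactly halves the gap.

```python
import math


def delta_elo(f: float, m: int = 30, p_error_1min: float = 0.003, scale: float = 800.0) -> float:
    """ELO gap to a perfect program: dELO = scale * M * pError / F^c2."""
    c2 = math.log10(2)  # error rate halves for every 10x speed-up (rounded to 0.3 in the post)
    return scale * m * p_error_1min / f ** c2


for f in (1, 10, 100, 1000):
    print(f"F={f:5d}: dELO to perfect play = {delta_elo(f):.1f}")
```

Under these assumptions the gap goes 72 → 36 → 18 → 9 ELO for F = 1, 10, 100, 1000.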

Again, this describes the ELO difference compared with a perfect program.

Bert

BertTuyt · Post by **BertTuyt** » Sat Dec 08, 2012 16:14

So, the next one.

With dELO = 800 * M * pError and pError = 0.003 (Damage/Kingsrow 1 Min - 1 Min match), this gives dELO = 2.4 * M.

If one used a bigger endgame DB, M would decrease, as the search reaches endgame DB positions at an earlier stage.
If all else remained constant (which is not always the case, as accessing a large DB also reduces the number of nodes/second), one can calculate dM (delta Move).

Assume that in 65 moves the 20x20 (40-piece) initial position is reduced to 6 pieces. Then for every additional piece in the DB (6p --> 7p --> 8p), M would be reduced by:

dM = 65 / (40 - 6) = 65 / 34 = 1.9

So every additional piece leads to earlier recognition of a DB position, with a delta Move of around 1.9.

This yields an ELO improvement of dELO = 2.4 * dM = 2.4 * 1.9 = 4.6 ELO points.

This small result seems to be confirmed by actual matches, where the DB impact is not dramatic (and/or is hidden in the statistical noise of small matches).
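The DB estimate as a Python sketch, assuming (as the calculation above implicitly does) that pieces disappear at a constant rate over the 65 moves. Parameter names are illustrative.

```python
def delta_moves(moves_to_db: int = 65, start_pieces: int = 40, db_pieces: int = 6) -> float:
    """Moves saved per extra DB piece, assuming pieces disappear at a constant rate."""
    return moves_to_db / (start_pieces - db_pieces)


def elo_gain_per_db_piece(p_error: float = 0.003, scale: float = 800.0) -> float:
    """dELO = 800 * pError * dM, i.e. 2.4 * dM for pError = 0.003."""
    return scale * p_error * delta_moves()


print(f"dM = {delta_moves():.2f} moves, dELO = {elo_gain_per_db_piece():.2f} per extra DB piece")
```

Without rounding dM to 1.9 this gives about 4.59 ELO per extra DB piece, matching the 4.6 above.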

Bert

BertTuyt · Post by **BertTuyt** » Sun Dec 09, 2012 13:48

Here are the results of 3 matches against Flits (Damage still running on 1 core only).
I expected a better result for Damage, but perhaps this is because I had never played these matches before and mainly optimized Damage based on games against Kingsrow.
Results are from the perspective of Damage.


Match                           W     D      L      ELO
Damage 1 Min  - Flits 1 Min     19    13     126    13
Damage 10 Min - Flits 1 Min     28     5     125    51
Damage 1 Min  - Flits 10 Min    6     28     124    -49

It is interesting that the Flits gain from 1 Min to 10 Min, 62 points (-13 --> 49), is also in the 60 range.
Damage only gains 38 points (51 - 13).

This result could be affected by the fact that pondering was (still) switched on for Flits (although the Flits DXP server has an option to disable it).
This matters especially when Flits has a lot of time to ponder while Damage is thinking.
I might redo this test to see if this assumption is valid.

For those interested, the match files are attached.

Bert

BertTuyt · Post by **BertTuyt** » Sun Dec 09, 2012 14:11

Forgot to mention:

As Flits sometimes has glitches and plays strange moves when there are few pieces on the board, I stopped the game when the root position was a 7P (or fewer) DB position.
The rationale: if the position is a DB win (from Damage's perspective), Damage would most likely win anyway. Only in the case of a DB draw or DB loss could Damage get a better result, because Flits only has a 6P DB and can still make errors.
As I didn't want to include these situations in the equation, I chose to terminate the game and use the DB score in these cases.
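The adjudication rule can be sketched as follows. This is a hypothetical illustration, not Damage's actual code: `db_probe` stands in for whatever 7P endgame-DB lookup the match program uses, and scores are from Damage's perspective.

```python
from typing import Callable, Optional

# Hypothetical DB lookup: +1 = win for Damage, 0 = draw, -1 = loss for Damage.
DbProbe = Callable[[object], int]


def adjudicate(position: object, piece_count: int, db_probe: DbProbe,
               max_db_pieces: int = 7) -> Optional[int]:
    """Return the DB score to end the game with, or None to keep playing."""
    if piece_count <= max_db_pieces:
        # Root position is inside the 7P (or smaller) DB: stop and use the DB score,
        # so Flits' endgame glitches cannot influence the match result.
        return db_probe(position)
    return None


# Example with a stub probe that reports every DB position as a draw.
print(adjudicate("some 6-piece position", 6, lambda pos: 0))    # DB score is used
print(adjudicate("some 12-piece position", 12, lambda pos: 0))  # game continues
```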

Bert

BertTuyt · Post by **BertTuyt** » Tue Dec 11, 2012 19:35

As I did not trust the Flits 1 Min - Damage 10 Min match result (pondering was on, so Flits was able to think far longer than 1 Min), I replayed the match, this time with Flits pondering disabled.
Here is the new table (where I also changed WDL, as it was wrong in the previous post).


Match                           W     D      L      ELO
Damage 1 Min  - Flits 1 Min     19    13     126    13
Damage 10 Min - Flits 1 Min     55     2     101    121 !!
Damage 1 Min  - Flits 10 Min    6     28     124    -49

Now the difference in ELO is much larger: 121 - 13 = 108, much more than the expected 60.
So back to the drawing board....

For those interested, the match file is attached.

Bert

BertTuyt · Post by **BertTuyt** » Tue Dec 11, 2012 19:42

WLD was still wrong in the previous post.

Bert


Match                           W     L      D      ELO
Damage 1 Min  - Flits 1 Min     19    13     126    13
Damage 10 Min - Flits 1 Min     55     2     101    121 !!
Damage 1 Min  - Flits 10 Min    6     28     124    -49
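The ELO column appears to follow the standard logistic ELO model, with a draw counted as half a point. A Python sketch of that conversion (assuming this is the formula behind the table) reproduces the three values:

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Logistic ELO difference implied by a match score (draw = half a point)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return -400.0 * math.log10(1.0 / score - 1.0)


matches = {
    "Damage 1 Min  - Flits 1 Min": (19, 13, 126),
    "Damage 10 Min - Flits 1 Min": (55, 2, 101),
    "Damage 1 Min  - Flits 10 Min": (6, 28, 124),
}
for name, (w, l, d) in matches.items():
    print(f"{name}: {elo_from_wld(w, l, d):+.0f}")
```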

Rein Halbersma · Post by **Rein Halbersma** » Tue Dec 11, 2012 19:43


See http://talkchess.com/forum/viewtopic.php?t=46370 for a related topic

BertTuyt · Post by **BertTuyt** » Tue Dec 11, 2012 21:16

Rein, thanks, I'm also following that forum (and this specific topic).
What are your thoughts so far?

If all assumptions are true, then we might be close to perfect play.
Although it takes infinite computing power to reach perfection asymptotically, at some point the ELO gain is no longer interesting.
If we assume that the ELO gain for a 10-fold speed increase is 60 - 120 points today for several programs (this might depend on the strength of the program), then the formula predicts that only another 60 - 120 points are left until perfect play.
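Why the remaining gap equals the last 10x gain follows from dELO = 72 / F ^ 0.3: since 10 ^ 0.3 ≈ 2, every further 10-fold speed-up halves the remaining gap, so the future gains form a geometric series G/2 + G/4 + ... = G. A minimal Python sketch under that assumption (the halving factor is the hypothesis above, not a measured constant):

```python
def remaining_gap(last_decade_gain: float, shrink_per_decade: float = 2.0,
                  decades: int = 60) -> float:
    """Total ELO still to gain if every further 10x speed-up yields 1/shrink of the previous gain."""
    total = 0.0
    gain = last_decade_gain
    for _ in range(decades):
        gain /= shrink_per_decade  # gain of the next 10x speed-up
        total += gain
    return total


for g in (60.0, 120.0):
    print(f"last 10x gain {g:.0f} -> remaining gap ~ {remaining_gap(g):.1f}")
```

With the unrounded factor 10 ^ 0.3 ≈ 1.995 the series sums to about 1.005 * G, so the conclusion barely changes.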

Bert

Rein Halbersma · Post by **Rein Halbersma** » Tue Dec 11, 2012 21:35


I don't think you can extrapolate to perfect play. The reason is that your scaling experiments are with a fixed search technology (i.e. effective branching factor). If you change that, you might get different scaling behavior. Try e.g. to do your T vs k * T experiments (k = 2 or 10) with plain alpha-beta without iterative deepening, zero-windows, LMR etc. etc. I would be curious what kind of perfect play limit you would deduce from that.

BertTuyt · Post by **BertTuyt** » Tue Dec 11, 2012 21:46

Rein, that's a good suggestion.

So far we have results with Damage, Kingsrow, Flits and Dragon, and all use different search technologies (at least in detail).
All seem to indicate that there are at least diminishing returns.

I hope you agree that there might be a maximum ELO level, and I'm interested in your guess of how far we are from this level today (100, 200, 300 ELO?).

I'm now running a rematch Flits - Damage 1 Min - 1 Min, without pondering.
After that I will run some 1 - 10 Min tests between Kingsrow and Flits.
I will also test how Damage scales without LMR and MCP.

Have you done any tests related to this topic so far, and if so, what are your observations?

Bert

Rein Halbersma · Post by **Rein Halbersma** » Tue Dec 11, 2012 21:51


Bert,

Yes, I think there is an upper limit to playing strength in terms of ELO. However, I think your current experiments are still too inaccurate (statistically speaking: too few games per match) to get a precise estimate of that upper limit. Another issue is that perhaps current programs don't explore the entire search space with their current evals. E.g. it could be that certain middle game positions give strong chances for a win, but are systematically neglected by all programs. Then the scaling experiments only show an upper limit with this restricted playing style. A grandmaster could come along and exploit that. For checkers, this argument doesn't apply, because there Kingsrow/Cake/Chinook get straight into the endgame databases as soon as they are out of their opening books. In draughts, there is a middle game where neither book nor databases are of much use. Perhaps there is still unexplored territory there.
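To make the "too few games per match" point concrete, here is a rough Python sketch of a confidence interval for a single match. It uses a normal approximation and the delta method, an assumed method that was not used in the thread.

```python
import math
from typing import Tuple


def elo_with_error(wins: int, losses: int, draws: int, z: float = 1.96) -> Tuple[float, float]:
    """ELO estimate and approximate half-width of a z-sigma interval (normal approximation)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the points scored (1, 0.5 or 0 per game)
    var = (wins * 1.0 + draws * 0.25) / n - score ** 2
    se_score = math.sqrt(var / n)
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    # Slope d(ELO)/d(score) of the logistic ELO curve at this score
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo, z * slope * se_score


elo, half_width = elo_with_error(19, 13, 126)
print(f"ELO = {elo:.0f} +/- {half_width:.0f} (95%)")
```

For the 158-game Damage 1 Min - Flits 1 Min match (19 W, 13 L, 126 D) this gives roughly 13 ± 24 ELO at 95% confidence, the same order of magnitude as the 10x speed gains being compared.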

Rein

MichelG · Post by **MichelG** » Tue Dec 11, 2012 23:37

One other thing to consider is that if program X makes an error, there is a fair chance that program Y doesn't capitalise on it.

Consider, for instance, a position that has a drawing move, where finding that move requires a 21-ply search.

However, program X searches the position to only 20 ply and plays a losing move.

The position is now lost, but only if program Y finds the right continuation. This requires Y to find the right move (now only 20 ply deep). It seems to me there is a fairly big chance that Y does not find it: it needs to search very deep, and the move may be missed due to lack of allocated time or lack of knowledge in the evaluation function. In short, if it is hard for X to find the right move, it will probably be hard for Y too, and not every losing move will turn into a loss.

I think Bert's estimation of the error rate (0.3%-0.4% per move) is a good lower limit, but the actual rate may be a bit higher than that.
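This argument can be put in numbers with a toy model, under the hypothetical assumption that each error by X is converted into a win by Y with a fixed probability p_exploit. The observed loss rate then only measures pError * p_exploit, so an error rate inferred with p_exploit = 1 is a lower bound. A Python sketch (pError = 0.004 and p_exploit = 0.75 are arbitrary example values):

```python
def observed_loss_rate(p_error: float, moves: int, p_exploit: float) -> float:
    """Probability that X loses: at least one error in `moves` moves that Y converts."""
    p_converted = p_error * p_exploit  # an error only decides the game if Y exploits it
    return 1.0 - (1.0 - p_converted) ** moves


def inferred_p_error(loss_rate: float, moves: int, p_exploit: float) -> float:
    """Invert the toy model: per-move error rate implied by an observed loss rate."""
    p_converted = 1.0 - (1.0 - loss_rate) ** (1.0 / moves)
    return p_converted / p_exploit


rate = observed_loss_rate(p_error=0.004, moves=30, p_exploit=0.75)
print(f"observed loss rate: {rate:.4f}")
print(f"pError inferred assuming every error is exploited: {inferred_p_error(rate, 30, 1.0):.4f}")
print(f"pError inferred with the true p_exploit = 0.75:     {inferred_p_error(rate, 30, 0.75):.4f}")
```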

Add to that potential exploitation of weak points of programs, and there should be room for improvement still.

BertTuyt · Post by **BertTuyt** » Thu Dec 13, 2012 00:09

I also replayed the Flits 1 Min - Damage 1 Min match, as I thought the initially low ELO difference could be related to Flits pondering, which I therefore switched off.
Here are the updated match results in the table, and the match file for those interested.


Match                           W     L      D      ELO
Damage 1 Min  - Flits 1 Min     23    8     127     33
Damage 10 Min - Flits 1 Min     55    2     101     121 !!
Damage 1 Min  - Flits 10 Min    6     28    124     -49

Now Damage gains 121 - 33 = 88 ELO points with 10-fold search time, whereas Flits gains 49 + 33 = 82 points.
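The gains can be checked from the updated W/L/D counts with a self-contained Python sketch, assuming the standard logistic ELO formula (draw = half a point):

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Logistic ELO difference implied by a match score (draw = half a point)."""
    score = (wins + 0.5 * draws) / (wins + losses + draws)
    return -400.0 * math.log10(1.0 / score - 1.0)


base = elo_from_wld(23, 8, 127)        # Damage 1 Min  - Flits 1 Min
damage_10x = elo_from_wld(55, 2, 101)  # Damage 10 Min - Flits 1 Min
flits_10x = elo_from_wld(6, 28, 124)   # Damage 1 Min  - Flits 10 Min (Damage's perspective)

damage_gain = damage_10x - base
flits_gain = base - flits_10x  # Flits goes from -base to -flits_10x from its own side
print(f"Damage gain: {damage_gain:.0f}, Flits gain: {flits_gain:.0f}")
```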

Bert

Krzychumag · Post by **Krzychumag** » Sat Dec 15, 2012 09:43

Bert, test the program against people.

World Draughts Forum
