Internet engine matches
Re: Internet engine matches
So, for completeness, the mathematical proof of the pWin = pError * M formula, as used in the previous post.
When playing against a perfect player/program, the other side must never make an error during the M moves in order to draw.
So: pDraw = (1 - pError) ^ M
As pError is small, this can be approximated by:
pDraw ≈ 1 - pError * M
As pLost = 0 and pDraw + pWin = 1:
pWin = 1 - pDraw ≈ 1 - (1 - pError * M) = pError * M
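A quick numerical check of the approximation, as a minimal Python sketch. It assumes, as the derivation does, independent errors at a constant per-move rate; the function names are illustrative only.

```python
def p_win_exact(p_error: float, m: int) -> float:
    """Chance that the imperfect side errs at least once in m moves,
    i.e. 1 - pDraw with pDraw = (1 - pError) ** M."""
    return 1.0 - (1.0 - p_error) ** m


def p_win_linear(p_error: float, m: int) -> float:
    """First-order approximation pWin = pError * M (valid while pError * M << 1)."""
    return p_error * m


if __name__ == "__main__":
    for p_error, m in [(0.003, 30), (0.003, 65), (0.01, 65)]:
        exact = p_win_exact(p_error, m)
        approx = p_win_linear(p_error, m)
        print(f"pError={p_error}, M={m}: exact={exact:.4f}, linear={approx:.4f}")
```

The linear form always slightly overestimates pWin (Bernoulli's inequality). For pError = 0.003 and M = 30 the difference is about 0.4 percentage points, but it grows quickly as pError * M increases.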
Bert
Re: Internet engine matches
Next, an implementation of Michel's observation/hypothesis (pError = c1 / time ^ c2).
Including the hypothesis that for a speed increase of 10 the Kingsrow/Damage error rate drops by a factor of 2, and that pError is around 0.003 (hopefully this time I don't include too many zeros), one can write for a tenfold speed increase: pError = 0.003 / 10 ^ 0.3 (0.3 ≈ log10(2), so 10 ^ 0.3 ≈ 2).
Combining this with the previous formula, with F the speed-increase factor and M = 30 the assumed error-move interval:
dELO = 800 * M * pError = 800 * 30 * 0.003 / F ^ 0.3 (F = 1 for a 1 min/game match).
So finally: dELO = 72 / F ^ 0.3
Again, this describes the ELO difference compared with a perfect program.
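The final formula can be tabulated for a few speed-up factors. This is a minimal Python sketch using the post's assumptions (M = 30 error-prone moves, pError = 0.003 at 1 min/game, the factor 800 and an exponent of 0.3); `delo_to_perfect` is an illustrative name.

```python
def delo_to_perfect(f: float, m: int = 30, p_error_1min: float = 0.003,
                    k: float = 800.0, c2: float = 0.3) -> float:
    """ELO gap to a perfect program: dELO = k * M * pError(F),
    with pError(F) = pError_1min / F ** c2 (F = speed-up relative to 1 min/game)."""
    p_error = p_error_1min / f ** c2
    return k * m * p_error


if __name__ == "__main__":
    for f in (1, 10, 100):
        print(f"F={f:>3}: dELO = {delo_to_perfect(f):5.1f}")
```

Under these assumptions each tenfold speed-up roughly halves the remaining gap (72, then about 36, then about 18).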
Bert
Re: Internet engine matches
So, the next one...
dELO = 800 * M * pError; with pError = 0.003 (Damage/Kingsrow, 1 min - 1 min match), dELO = 2.4 * M.
If one used a bigger endgame DB, M would decrease, as the search would reach endgame DB positions at an earlier stage.
If all else remained constant (which is not always the case, as heavy DB access also reduces the number of nodes per second), one can calculate dM (delta moves).
Assume that in 65 moves the 20x20 initial position is reduced to 6 pieces; then for every additional piece in the DB (so 6p --> 7p --> 8p) one would reduce M by:
dM = 65 / (40 - 6) = 65 / 34 = 1.9
So every additional piece gives an earlier transition into the DB, with a delta of around 1.9 moves.
This yields an ELO improvement of dELO = 2.4 * dM = 2.4 * 1.9 = 4.6 ELO points.
This small result seems to be confirmed by actual matches, where the DB impact is not dramatic (and/or, in small matches, is hidden in the statistical noise).
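The same arithmetic as a small Python sketch. It assumes, as above, 65 moves to go from 40 pieces down to a 6-piece DB, a constant rate of piece reduction, and dELO = 2.4 * M; the function names are illustrative.

```python
def moves_saved_per_db_piece(moves_to_db: float = 65, start_pieces: int = 40,
                             db_pieces: int = 6) -> float:
    """Average number of moves spent per captured piece before DB entry."""
    return moves_to_db / (start_pieces - db_pieces)


def delo_per_db_piece(delo_per_move: float = 2.4) -> float:
    """ELO gained per extra DB piece, using dELO = 2.4 * dM (800 * 0.003 = 2.4)."""
    return delo_per_move * moves_saved_per_db_piece()


if __name__ == "__main__":
    print(f"dM = {moves_saved_per_db_piece():.2f} moves")
    print(f"dELO = {delo_per_db_piece():.2f} points per extra DB piece")
```

Without rounding dM first, the result is about 4.59 points, in line with the 4.6 quoted.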
Bert
Re: Internet engine matches
Herewith the results from 3 matches against Flits (Damage still working on 1 core only).
I expected a better result for Damage, but it could be that I never played these matches before, so I mainly optimized Damage based on games against Kingsrow.
Results are from the perspective of Damage.
Match                         W    D    L    ELO
Damage 1 Min - Flits 1 Min    19   13   126  13
Damage 10 Min - Flits 1 Min   28   5    125  51
Damage 1 Min - Flits 10 Min   6    28   124  -49
It is interesting that the Flits gain from 1 min to 10 min, 62 points (from -13 to 49), is also in the 60 range.
Damage only gains 38 points (51 - 13).
This result could have been affected by pondering, which was (still) switched on for Flits (although the Flits DXP server has an option to disable it), especially when Damage thinks for a long time, which Flits can then use for pondering.
I might redo this test to see whether this assumption is valid.
For those interested, the match files are attached.
Bert
Attachments:
- dxpgames-Dec-2012-v4.pdn (150.27 KiB)
- dxpgames-Dec-2012-v3.pdn (152.7 KiB)
- dxpgames-Dec-2012-v2.pdn (152.67 KiB)
Re: Internet engine matches
Forgot to mention...
As Flits sometimes has glitches and plays strange moves when there are few pieces on the board, I stopped the search when the root position was a 7P (or less) DB position.
The background: if the position is a DB win (from the perspective of Damage), Damage should win anyway. Only in the case of a DB draw or DB loss could Damage profit, due to the fact that Flits only has a 6P DB and can still make errors.
As I didn't want to include these situations in the equation, I chose to terminate the game and use the DB score in these cases.
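The adjudication rule can be sketched as follows. This is a hypothetical Python illustration, not Damage's actual code: `db_lookup` stands in for whatever endgame-DB probe the engine uses, and only the 7-piece threshold comes from the post.

```python
from enum import Enum
from typing import Callable, Optional


class DbResult(Enum):
    """Endgame-DB value of a position from Damage's perspective (value = game score)."""
    WIN = 1.0
    DRAW = 0.5
    LOSS = 0.0


MAX_ADJUDICATION_PIECES = 7  # stop once the root position has 7 pieces or fewer


def adjudicate(piece_count: int, position: object,
               db_lookup: Callable[[object], DbResult]) -> Optional[DbResult]:
    """Return the DB result if the game should be terminated, else None.

    db_lookup is a hypothetical stand-in for the engine's DB probe.
    """
    if piece_count <= MAX_ADJUDICATION_PIECES:
        return db_lookup(position)
    return None


if __name__ == "__main__":
    def fake_db(position: object) -> DbResult:
        return DbResult.DRAW  # hypothetical probe, for the demo only

    for pieces in (9, 7):
        result = adjudicate(pieces, None, fake_db)
        if result is None:
            print(f"{pieces} pieces: play on")
        else:
            print(f"{pieces} pieces: adjudicated as {result.name}, score {result.value}")
```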
Bert
Re: Internet engine matches
As I did not trust the Flits 1 Min - Damage 10 Min match result (pondering was on, so Flits was able to think far longer than the 1 min), I replayed the match, this time with Flits pondering disabled.
Herewith the new table (where I also changed WDL, as it was wrong in the previous post).
Match                         W    D    L    ELO
Damage 1 Min - Flits 1 Min    19   13   126  13
Damage 10 Min - Flits 1 Min   55   2    101  121 !!
Damage 1 Min - Flits 10 Min   6    28   124  -49
Now the difference in ELO is much larger: 121 - 13 = 108, much more than the expected 60.
So back to the drawing board...
For those interested, the match file is attached.
Bert
Attachments:
- dxpgames-Dec-2012-v5.pdn (156.07 KiB)
Re: Internet engine matches
The W/L/D order was still wrong in the previous post; here is the corrected table.
Match                         W    L    D    ELO
Damage 1 Min - Flits 1 Min    19   13   126  13
Damage 10 Min - Flits 1 Min   55   2    101  121 !!
Damage 1 Min - Flits 10 Min   6    28   124  -49
Bert
Re: Internet engine matches
See http://talkchess.com/forum/viewtopic.php?t=46370 for a related topic.
Re: Internet engine matches
Rein, thanks, I'm also following this forum (and this specific topic).
What are your 5 cents so far?
If all assumptions are true, then we might be close to near-perfect play.
Although it would take infinite computing power to reach perfection asymptotically, at some point the ELO gain is no longer interesting.
If we assume that the ELO gain for a tenfold speed increase is 60 - 120 points today for several programs (this might depend on the strength of the program), then the formula predicts that there are only another 60 - 120 points left to perfect play...
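The step from "tenfold gain" to "points left" follows from the earlier model dELO(F) = C / F ^ 0.3: the gain of the last tenfold speed-up is C / (F/10) ^ 0.3 - C / F ^ 0.3 = dELO(F) * (10 ^ 0.3 - 1), and since 10 ^ 0.3 ≈ 2, the remaining gap is roughly equal to that gain. A minimal Python sketch of this inversion (the function name is illustrative, and it inherits all assumptions of the model):

```python
def remaining_gap(tenfold_gain: float, c2: float = 0.3) -> float:
    """ELO still left to perfect play, given the gain of the last tenfold
    speed-up, under the model dELO(F) = C / F ** c2."""
    return tenfold_gain / (10 ** c2 - 1)


if __name__ == "__main__":
    for gain in (60, 120):
        print(f"tenfold gain {gain} -> about {remaining_gap(gain):.0f} points left")
```

With c2 = log10(2) exactly, the denominator is 1 and the remaining gap equals the tenfold gain.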
Bert
Re: Internet engine matches
I don't think you can extrapolate to perfect play. The reason is that your scaling experiments are with a fixed search technology (i.e. effective branching factor). If you change that, you might get different scaling behavior. Try e.g. to do your T vs k * T experiments (k = 2 or 10) with plain alpha-beta without iterative deepening, zero-windows, LMR etc. etc. I would be curious what kind of perfect play limit you would deduce from that.
Re: Internet engine matches
Rein, that's a good suggestion...
So far we have results with Damage, Kingsrow, Flits and Dragon, and all use different search technologies (at least in detail).
And all seem to indicate that there are at least diminishing returns.
I hope you can agree that there might be a maximum ELO level, and I'm interested in your guess of how far we are from this level today (100 - 200 - 300 ELO?).
I'm now running a rematch Flits - Damage 1 Min - 1 Min, without pondering.
After that I will do some 1 - 10 Min tests between Kingsrow and Flits.
And I will also test how Damage scales without LMR and MCP.
Did you do any tests related to this topic so far, and if so, what are your learnings/observations?
Bert
Re: Internet engine matches
Bert,
Yes, I think there is an upper limit to playing strength in terms of ELO. However, I think your current experiments are still too inaccurate (statistically speaking: too few games per match) to get a precise estimate of that upper limit. Another issue is that perhaps current programs don't explore the entire search space with their current evals. E.g. it could be that certain middle game positions give strong chances for a win, but are systematically neglected by all programs. Then the scaling experiments only show an upper limit with this restricted playing style. A grandmaster could come along and exploit that. For checkers, this argument doesn't apply, because there Kingsrow/Cake/Chinook get straight into the endgame databases as soon as they are out of their opening books. In draughts, there is a middle game where neither book nor databases are of much use. Perhaps there is still unexplored territory there.
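To put a number on the statistical point, here is a minimal Python sketch of an approximate 95% confidence interval for a match ELO, assuming a normal approximation to the mean per-game score. It is applied to the 19/13/126 (W/L/D) 1 min - 1 min result from earlier in the thread; the function names are illustrative.

```python
import math


def elo_from_score(s: float) -> float:
    """Logistic ELO difference for a score fraction s (0 < s < 1)."""
    return 400.0 * math.log10(s / (1.0 - s))


def elo_confidence_interval(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, low, high) using a normal approximation of the mean score."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    mean_sq = (wins + 0.25 * draws) / n  # E[x^2] for game outcomes 1, 0.5, 0
    std_err = math.sqrt((mean_sq - score ** 2) / n)
    low, high = score - z * std_err, score + z * std_err
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    elo, low, high = elo_confidence_interval(19, 13, 126)
    print(f"ELO {elo:.0f}, 95% interval roughly [{low:.0f}, {high:.0f}]")
```

For this 158-game match the interval spans roughly 50 ELO points, which is wide compared with the differences of a few tens of points being discussed.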
Rein
Re: Internet engine matches
One other thing to consider is that if program X makes an error, there is a fair chance that program Y doesn't capitalise on it.
Consider, for instance, a certain position that has a drawing move. Finding the move that leads to a draw needs a 21-ply search.
However, program X searches the position only up to 20 ply and plays a losing move.
The position is now lost, but only if program Y finds the right continuation. This requires Y to find the right move (now only 20 ply deep). It seems to me there is a fairly big chance that Y does not find the right move: after all, it needs to search very deep to find it, and it may be missed due to lack of allocated time or lack of knowledge in the evaluation function. In short, if it is hard for X to find the right move, it will probably be hard for Y too, and not every single losing move will turn into a loss.
I think Bert's estimate of the error rate (0.3%-0.4% per move) is a good lower limit, but the actual rate may be a bit higher than that.
Add to that the potential exploitation of weak points of programs, and there should still be room for improvement.
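This argument can be folded into the earlier model with one extra, hypothetical parameter: if an error is converted into a win only with probability pExploit, then pWin = pError * pExploit * M, so the error rate inferred from match results scales with 1 / pExploit. Taking pExploit = 1, as the original estimate implicitly does, gives the lower limit. A minimal Python sketch; the names and the example pWin = 0.09 (that is, 0.003 * 30) are illustrative.

```python
def inferred_error_rate(p_win: float, m: int, p_exploit: float = 1.0) -> float:
    """Per-move error rate implied by pWin = pError * pExploit * M.

    p_exploit is a hypothetical probability that the opponent actually
    converts an error into a win; p_exploit = 1 reproduces pError = pWin / M.
    """
    if not 0.0 < p_exploit <= 1.0:
        raise ValueError("p_exploit must be in (0, 1]")
    return p_win / (p_exploit * m)


if __name__ == "__main__":
    for p_exploit in (1.0, 0.75, 0.5):
        rate = inferred_error_rate(0.09, 30, p_exploit)
        print(f"pExploit={p_exploit}: pError = {rate:.4f}")
```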
Re: Internet engine matches
I also replayed the Flits 1 Min - Damage 1 Min match, as I thought that the initially low ELO difference could be related to Flits pondering, which I therefore switched off.
Herewith the updated match results in the table, and the match file for those interested.
Match                         W    L    D    ELO
Damage 1 Min - Flits 1 Min    23   8    127  33
Damage 10 Min - Flits 1 Min   55   2    101  121 !!
Damage 1 Min - Flits 10 Min   6    28   124  -49
Now Damage has a 121 - 33 = 88 ELO point gain with 10-fold search time, whereas Flits gains 49 + 33 = 82 points.
Bert
Attachments:
- dxpgames-Dec-2012-v6.pdn (152.12 KiB)
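The ELO column matches the standard logistic formula, ELO = 400 * log10(s / (1 - s)), with s the score fraction (draws counting as half a point). A short Python sketch that reproduces the final table (columns W L D) and the 88 / 82 point gains; the helper name is illustrative.

```python
import math


def match_elo(wins: int, losses: int, draws: int) -> float:
    """Logistic ELO difference from a W/L/D result (draws count as half a point)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    return 400.0 * math.log10(score / (1.0 - score))


# Final table, columns W L D, from Damage's perspective.
results = {
    "Damage 1 Min - Flits 1 Min": (23, 8, 127),
    "Damage 10 Min - Flits 1 Min": (55, 2, 101),
    "Damage 1 Min - Flits 10 Min": (6, 28, 124),
}

if __name__ == "__main__":
    elo = {name: round(match_elo(*wld)) for name, wld in results.items()}
    for name, value in elo.items():
        print(f"{name:<28} {value:>4}")
    base = elo["Damage 1 Min - Flits 1 Min"]
    print("Damage gain:", elo["Damage 10 Min - Flits 1 Min"] - base)
    print("Flits gain:", base - elo["Damage 1 Min - Flits 10 Min"])
```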
Real name: Krzysztof Grzelak
Re: Internet engine matches
Bert, test the program against people.