Ed Gilbert wrote:
Bert,
Here is the output from bayeselo.
ResultSet>addplayer kingsrow
ResultSet>addplayer damage
ResultSet>addwld 0 1 21 5 132
ResultSet>addwld 0 1 16 3 139
ResultSet>elo
ResultSet-EloRating>advantage 0
0
ResultSet-EloRating>mm
00:00:00,00
ResultSet-EloRating>exactdist
00:00:00,00
ResultSet-EloRating>ratings
Rank Name      Elo    +    - games score oppo. draws
   1 kingsrow   11   14   14   316   55%   -11   86%
   2 damage    -11   14   14   316   45%    11   86%

Somehow the normalization of the rating scale is not quite right. E.g. http://www.pradu.us/old/Nov27_2008/Buzz/elotable.html gives a 35 Elo advantage for a 55% score.
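For reference, the 35 Elo figure in the linked table is what the conventional logistic Elo curve gives for a 55% score. A minimal sketch, assuming that curve (expected score = 1 / (1 + 10^(-d/400))); the function names are only illustrative, and this says nothing about how bayeselo itself computes its ratings:

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of a player who is elo_diff points stronger (logistic Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Invert the curve: Elo difference implied by a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


print(round(elo_diff_from_score(0.55)))  # 35
```

bayeselo reports +11 and -11 for the same 55% score, i.e. a gap of 22 points, which is the discrepancy in question.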
Internet engine matches
-
- Posts: 1720
- Joined: Wed Apr 14, 2004 16:04
- Contact:
Re: Internet engine matches
-
- Posts: 854
- Joined: Sat Apr 28, 2007 14:53
- Real name: Ed Gilbert
- Location: Morristown, NJ USA
- Contact:
Re: Internet engine matches
Rein Halbersma wrote:
Somehow the normalization of the rating scale is not quite right. E.g. http://www.pradu.us/old/Nov27_2008/Buzz/elotable.html gives a 35 Elo advantage for a 55% score.

I don't know how bayeselo calculates Elo, but it seems to be affected by draws. Here is another 55% score with no draws, yielding a much different Elo result.
ResultSet>addplayer kingsrow
ResultSet>addplayer damage
ResultSet>addwld 0 1 55 45 0
ResultSet>elo
ResultSet-EloRating>advantage 0
0
ResultSet-EloRating>mm
00:00:00,00
ResultSet-EloRating>exactdist
00:00:00,00
ResultSet-EloRating>ratings
Rank Name      Elo    +    - games score oppo. draws
   1 kingsrow   21   33   32   100   55%   -21    0%
   2 damage    -21   32   33   100   45%    21    0%
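One thing that can be checked without knowing bayeselo's internals: both result sets round to a 55% score, but the draw ratio changes how much individual games scatter around that score. A small sketch, with `match_score` and `score_std_error` as illustrative names; it uses the ordinary standard error of the per-game score (1, 0.5 or 0), which is an assumption of this example and not bayeselo's model:

```python
import math


def match_score(wins: int, losses: int, draws: int) -> float:
    """Score fraction from the first player's point of view (a draw counts as half a point)."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def score_std_error(wins: int, losses: int, draws: int) -> float:
    """Standard error of the mean per-game score, using the population variance."""
    games = wins + losses + draws
    s = match_score(wins, losses, draws)
    variance = (wins * (1.0 - s) ** 2
                + draws * (0.5 - s) ** 2
                + losses * (0.0 - s) ** 2) / games
    return math.sqrt(variance / games)


# The two result sets from the bayeselo sessions above (wins, losses, draws).
results = {
    "kingsrow vs damage, 316 games": (21 + 16, 5 + 3, 132 + 139),
    "no-draw example, 100 games": (55, 45, 0),
}
for label, (w, l, d) in results.items():
    print(f"{label}: score {match_score(w, l, d):.4f}, "
          f"std. error {score_std_error(w, l, d):.4f}")
```

With these inputs the first set comes out at roughly 54.6% with a standard error of about 1.0%, the second at 55.0% with about 5.0%, so the heavily drawn match pins the score down much more tightly. Whether this is what drives the different Elo values in bayeselo is not something this sketch can show.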
-
- Posts: 190
- Joined: Sun Sep 13, 2009 23:33
- Real name: Jan-Jaap van Horssen
- Location: Zeist, Netherlands
Re: Internet engine matches
BertTuyt wrote:
Match Results, from the perspective of KingsRow.
Win: 16, Loss: 3, Draw: 139
...
Win: 21, Loss: 5, Draw: 132
I assume this is as close as any other program has come so far (but Ed, you most likely know best).
I'm curious whether Jan-Jaap has recent results against Kingsrow; if I remember well, his match results were slightly worse.

Bert, that is a very nice score indeed! Is this with the Damage evaluation function or the Horizon evaluation function?

BertTuyt wrote:
A pity that we don't have Damy info.
As I believe that Damy is very good, I'm not sure if Damage comes second, but third is not that bad.

Yes, it is a pity we don't have Damy (and Dragon) info. Maybe because Gérard is halfway between old and new Damy?
I don't have more recent results of Maximus against Kingsrow. I did however play a test match before the Computer Olympiad:
Maximus "Tilburg" vs. Maximus "Culemborg": 26 wins, 3 losses, 129 draws (57.3%)
The (big) difference is mainly due to reconsidering the KVO pattern, after observing that Maximus lost (too) many KVO games against Flits and Kingsrow. It turned out earlier "improvements" were in fact changes for the worse. I don't plan on playing a new set of matches shortly as I want to make some more improvements first.
Jan-Jaap
www.maximusdraughts.org
Re: Internet engine matches
Jan-Jaap, this was with the Damage Evaluation function.
I'm testing some new and old ideas, which so far have not revealed a significant difference.
But as I didn't test in this way before, they might show some small improvements.
I'll keep you posted,
Bert
Re: Internet engine matches
Jan-Jaap,
I assume/hope that Michael (Dragon) also reads this forum.
It would be nice if he were able to play a 158-game match against Kingsrow (but I'm not sure whether he has this program).
Anyway, what is really missing is Damy!
Gérard, as a regular reader and contributor, could you provide some insight into Damy in relation to KingsRow?
I still believe that Kingsrow is the best, Damy a good second, followed by Maximus/Damage.
But the proof of the pudding...
By the way, the Dutch newspaper Telegraaf today has an article by Harm Wiersma about KingsRow (the best program in the world according to Harm), Maximus and Truus.
Speaking of Truus (and Flits), is Maximus now beating both programs in a match? (I thought you mentioned that Flits is Maximus's bogey opponent, "angstgegner", or was it the other way around?)
In the meantime I re-implemented a number of older ideas that I had switched off in the past.
As I tested Damage against Truus manually (long ago), these ideas did not appear to make a major difference.
I also had the habit of implementing brilliant ideas the day before the yearly tournament.
After many disappointments, crashes, and strange behavior, I switched off many options.
I believe more and more that there isn't a magic button that will create an Elo explosion, so you need to test all these ideas step by step (via zillions of test games).
With Xmas approaching I'm preparing a list of things I want to change/improve/whatever, so I will keep you posted.
Bert
Re: Internet engine matches
Hi JJ,
"The (big) difference is mainly due to reconsidering the KVO pattern, after observing that Maximus lost (too) many KVO games against Flits and Kingsrow."
I have forgotten what KVO means. Could you refresh my memory? Thanks.
-- Ed
Re: Internet engine matches
Ed,
KVO = Korte Vleugel Opsluiting.
A direct English translation would be Short Wing Lock (or something like that).
Bert
Re: Internet engine matches
Hi Bert,
Of all of us, I am certainly the most impatient to see what Damy will do against Kingsrow!
As you all know, building a new version is a long journey, and I still have to finalize some major developments. The major differences from my previous version are the following:
1) A multithreaded engine (the previous version handled only 2 threads)
2) A search based more on the MTD-f-best procedure than on the MTD-f procedure
3) A brand-new pruning technique to avoid answering a bad move with another bad move (a rather difficult item, but it now seems to work properly)
4) A new algorithm to handle loops (when there is at least one king on each side)
5) A completely new approach to the evaluation function. In my previous version the evaluation function was rather complicated because it had to take into account both static considerations (control of the center, active or inactive men, presence of an arrow, overload of a pattern, etc.) and dynamic considerations (weakness of an outpost against a repeated attack, breakthrough, etc.). In my new version, static and dynamic evaluations are completely decoupled.
6) The 8-piece egdb
7) The DamExchange protocol (!)
8) A new GUI
…
I still have some work to do on points 3 and 5 above. After that I will begin matches against Kingsrow for debugging and tuning.
BTW, I also intend to come back to official tournaments in 2012.
Gérard
Re: Internet engine matches
Ed, KVO is called right wing lock in the Course in draughts. JJ
www.maximusdraughts.org
Re: Internet engine matches
Gérard, that sounds impressive! It would also be very interesting (for you) to see the result of old Damy vs. new Damy.
Jan-Jaap
www.maximusdraughts.org
Re: Internet engine matches
I do not intend to play games between the old and the new Damy versions because only the new version will have the DamExchange protocol. That was the major weakness of my old versions: I was never able to test an improvement via a match between versions.
BTW, I built into my new version an interesting feature for using Kingsrow in parallel with Damy: each time I change the board position in Damy, I force Kingsrow to analyse the new position. That way I can more easily debug Damy's evaluation function.
Gérard
Re: Internet engine matches
Gérard, looking forward to seeing you back in 2012.
At least I will (most likely) join the Dutch Open 2012.
Regarding your activities, which make a lot of sense to me, here are my questions/remarks:
1) Do you (as I do) use the YBWC algorithm, or did you implement a Taille special?
2) So far I rely on alpha-beta, with all the enhancements/improvements that you can find in the computer games literature. Do you have a quantitative comparison between alpha-beta and the MTD-f family?
In the computer chess blogs I see conflicting messages, and some people (like Bob Hyatt, Crafty) don't see any difference.
3) I have also (so far) used domain-independent techniques. Most recently I re-implemented (and am now examining) a technique based on a specific draughts rule, namely that captures are forced, so that the signature of bad moves sometimes becomes very transparent. Do you have any idea to what extent domain-specific enhancements outperform the techniques used by the computer games community?
4) So far I recognize the need, but this does not (yet) have a high priority.
5) Fully agree. I also experimented with this in early 1990, when I wanted to beat Truus with machines searching 6-8 ply (or something like that). In the end I relied on a huge eval function, which nowadays is impossible to read or understand, and where maintenance is almost impossible. But I agree that the better approach is direct communication between search and eval, where the eval can guide the search more than the static approach based on a return value and backtracking of scores.
Very much interested in your results.
In the past I experimented with the concept of a "super"-move, which I also abandoned (but I found some amazing results with it!).
6) 8P is something I will do when my budget can handle a dual-processor machine (8 cores each, Ivy Bridge) and a 500 GB SSD, so most likely 2013.
7) Already implemented, and a great help.
With kind regards,
Bert
Re: Internet engine matches
1) Yes, I use a YBWC-based algorithm with no major modifications.
2) I know this “conflict” between the alpha-beta and MTD-f fans perfectly well. My view is the following: MTD-f is far better than a pure alpha-beta procedure. But if you implement a large part of the MTD-f procedure in your alpha-beta procedure (I mean, if you make intensive use of null windows), then I do not see major differences. Without any proof, I prefer the MTD-f procedure for the following two reasons:
a. I can easily take advantage of an MTD-f-best procedure, especially in positions where a move is almost forced.
b. During a search at a given depth with a given test value, if I encounter a position already seen, I am pretty sure to find the position already resolved in the hash table for this depth and test value. I am not sure it is a great advantage in a search with few threads, but I guess it could be very interesting with many search threads.
3) As we all know, a lot of “domain-independent” techniques can be found in the literature. Seeing that some are good for chess and probably bad for draughts (e.g. the famous “null move”), I wonder whether such techniques can really be called “domain-independent”. LMR is another well-known and efficient technique for chess. In draughts, however, the tree is narrower than in chess and LMR seems less efficient. In addition, due to the highly tactical nature of draughts, LMR may compare badly with other techniques. For these reasons I do not use LMR and have built another technique based on the recognition of (probably) bad moves.
4) The technique I use is draughts-dependent.
5) I understand what you mean when you write “eval can guide the search”, but that is not exactly what I did. I just decoupled static and dynamic evaluation without any impact on the search. Maybe what you mention can be future work!
6) In 2013 I guess I will be able to build the 9p db!
7) No additional comments
Gérard
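For readers who have not met MTD-f, the idea of converging on the minimax value through a series of null-window searches can be sketched as below. This is the textbook MTD(f) driver on a toy tree, not Damy's implementation: the nested-list tree, the integer leaf scores and the names `alphabeta` and `mtdf` are assumptions made for the example, and it leaves out both the hash table mentioned in point b and the MTD-f-best variant.

```python
INF = 10**9  # larger than any leaf score in the toy trees


def alphabeta(node, alpha, beta, maximizing):
    """Fail-soft alpha-beta on a nested-list tree; integer leaves are scores for the maximizer."""
    if isinstance(node, int):
        return node
    if maximizing:
        best = -INF
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # cutoff: the minimizer will avoid this line
        return best
    best = INF
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:
            break  # cutoff: the maximizer will avoid this line
    return best


def mtdf(root, first_guess):
    """Narrow [lower, upper] around the minimax value using only null-window searches."""
    g = first_guess
    lower, upper = -INF, INF
    while lower < upper:
        # Test value for this pass: probe just above a known lower bound.
        beta = g + 1 if g == lower else g
        g = alphabeta(root, beta - 1, beta, True)
        if g < beta:
            upper = g  # search failed low: g is an upper bound
        else:
            lower = g  # search failed high: g is a lower bound
    return g


tree = [[3, 5], [2, 9], [0, 7]]  # root maximizes, its children minimize
print(mtdf(tree, first_guess=0))  # 3
```

In a real engine the repeated passes are only affordable because every pass stores its bounds in the hash table, which is the property point b relies on; without that table each pass would re-search the tree from scratch.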
Re: Internet engine matches
Hi,
I have just begun to play thematic games against Kingsrow (1.52) using the following initial position in a one-move ballot match (18 blitz games). Both programs are configured to use all their egdb (9 pieces for Kingsrow and 8 pieces for Damy) and to use 2 threads without pondering.
Each match allows me to debug and tune my search algorithm and my eval function.
I promised to give you my very first results. Here they are:
Date 11/03/2012 Damy/Kingsrow: 0 wins, 5 losses, 11 draws
Date 12/03/2012 Damy/Kingsrow: 0 wins, 4 losses, 12 draws
Date 14/03/2012 Damy/Kingsrow: 3 wins, 5 losses, 10 draws
Very encouraging indeed!
Gérard
Re: Internet engine matches
Gerard,
any news regarding the (thematic) Damy matches against KingsRow?
Nowadays I play mini-matches (first 30 ballots), and I aim to reach (at least) 30 draws.
The last match had 1 lost game, but it gave a clue for improvement, and that's the good news!
Bert