Internet engine matches

Discussion about the development of draughts in the age of computers and the Internet.
Rein Halbersma
Posts: 1720
Joined: Wed Apr 14, 2004 16:04

Re: Internet engine matches

Post by Rein Halbersma » Tue Dec 06, 2011 08:03

Ed Gilbert wrote:Bert,

Here is the output from bayeselo.

ResultSet>addplayer kingsrow
ResultSet>addplayer damage
ResultSet>addwld 0 1 21 5 132
ResultSet>addwld 0 1 16 3 139
ResultSet>elo
ResultSet-EloRating>advantage 0
0
ResultSet-EloRating>mm
00:00:00,00
ResultSet-EloRating>exactdist
00:00:00,00
ResultSet-EloRating>ratings
Rank Name Elo + - games score oppo. draws
1 kingsrow 11 14 14 316 55% -11 86%
2 damage -11 14 14 316 45% 11 86%
Somehow the normalization of the rating scale is not quite right. E.g. http://www.pradu.us/old/Nov27_2008/Buzz/elotable.html gives a 35 Elo advantage for a 55% score.
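
For reference, the 35 Elo figure is what the standard logistic Elo formula, D = 400 * log10(p / (1 - p)), gives for p = 0.55. A minimal sketch, assuming the linked table uses that formula (the function name is illustrative and not part of bayeselo):

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# A 55% score corresponds to roughly 35 Elo under this formula.
print(round(elo_diff_from_score(0.55), 1))  # 34.9
```

bayeselo reports ratings of +11 and -11, i.e. a 22-point gap, so the discrepancy with this simple conversion is real and not a rounding effect.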

Ed Gilbert
Posts: 854
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA

Re: Internet engine matches

Post by Ed Gilbert » Tue Dec 06, 2011 13:45

Rein Halbersma wrote:Somehow the normalization of the rating scale is not quite right. E.g. http://www.pradu.us/old/Nov27_2008/Buzz/elotable.html gives a 35 Elo advantage for a 55% score.
I don't know how bayeselo does its calculation of elo, but it seems to be affected by draws. Here is another 55% score with no draws, yielding a much different elo result.

ResultSet>addplayer kingsrow
ResultSet>addplayer damage
ResultSet>addwld 0 1 55 45 0
ResultSet>elo
ResultSet-EloRating>advantage 0
0
ResultSet-EloRating>mm
00:00:00,00
ResultSet-EloRating>exactdist
00:00:00,00
ResultSet-EloRating>ratings
Rank Name Elo + - games score oppo. draws
1 kingsrow 21 33 32 100 55% -21 0%
2 damage -21 32 33 100 45% 21 0%
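
One possible explanation, stated as an assumption and not confirmed anywhere in this thread: bayeselo is usually described as fitting a model with a separate draw parameter (`eloDraw`, about 97.3 by default) instead of converting the score percentage directly. A sketch of such a model follows; the function names are illustrative, and the advantage term is set to 0 as in the sessions above.

```python
def f(delta: float) -> float:
    """Logistic curve used by the assumed model."""
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))


def outcome_probs(elo_diff: float, draw_elo: float = 97.3):
    """Win/draw/loss probabilities for the stronger side.

    elo_diff is the rating gap in favour of that side; draw_elo is the
    assumed draw parameter (0 means draws never happen).
    """
    p_win = f(-elo_diff + draw_elo)
    p_loss = f(elo_diff + draw_elo)
    p_draw = 1.0 - p_win - p_loss
    return p_win, p_draw, p_loss


def expected_score(elo_diff: float, draw_elo: float = 97.3) -> float:
    """Expected score (win = 1, draw = 0.5) under the assumed model."""
    p_win, p_draw, _ = outcome_probs(elo_diff, draw_elo)
    return p_win + 0.5 * p_draw


# The same rating gap maps to different expected scores as draw_elo changes.
for draw_elo in (0.0, 97.3, 300.0):
    print(draw_elo, round(expected_score(22.0, draw_elo), 4))
```

With `draw_elo = 0` this reduces to the plain logistic formula. For any other value the score percentage alone does not determine the fitted rating gap, which would be consistent with the two sessions above giving different ratings for the same 55%.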

jj
Posts: 190
Joined: Sun Sep 13, 2009 23:33
Real name: Jan-Jaap van Horssen
Location: Zeist, Netherlands

Re: Internet engine matches

Post by jj » Thu Dec 08, 2011 23:44

BertTuyt wrote:Match Results, from the perspective of KingsRow.

Win: 16, Loss: 3, Draw: 139
...
Win: 21, Loss: 5, Draw: 132

I assume this is as close as any other program has come so far (but Ed, you most likely know best).
I'm curious whether Jan-Jaap has recent results against Kingsrow; if I remember well, his match results were slightly worse.
Bert, that is a very nice score indeed! Is this with the Damage evaluation function or the Horizon evaluation function?
BertTuyt wrote:A pity that we don't have Damy info.
As I believe that Damy is very good, so I'm not sure if Damage comes second, but third is not that bad.
Yes, it is a pity we don't have Damy (and Dragon) info. Maybe because Gérard is halfway between old and new Damy?
I don't have more recent results of Maximus against Kingsrow. I did however play a test match before the Computer Olympiad:

Maximus "Tilburg" vs. Maximus "Culemborg" 26 wins 3 losses 129 draws 57.3%
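
The percentages quoted in this thread follow the usual convention of counting a win as 1 point and a draw as half a point. A small sketch of that arithmetic (the function name is illustrative):

```python
def match_score(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


# Maximus "Tilburg" vs. Maximus "Culemborg": 26 wins, 3 losses, 129 draws.
print(f"{match_score(26, 3, 129):.1%}")  # 57.3%
```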

The (big) difference is mainly due to reconsidering the KVO pattern, after observing that Maximus lost (too) many KVO games against Flits and Kingsrow. It turned out earlier "improvements" were in fact changes for the worse. I don't plan on playing a new set of matches shortly as I want to make some more improvements first.

Jan-Jaap
www.maximusdraughts.org

BertTuyt
Posts: 1573
Joined: Wed Sep 01, 2004 19:42

Re: Internet engine matches

Post by BertTuyt » Fri Dec 09, 2011 22:16

Jan-Jaap, this was with the Damage evaluation function.
I'm testing some new and old ideas, which did not reveal a significant difference in past testing.
But as I didn't test in this way before, they might show some small improvements.

Keep you posted,

Bert

BertTuyt
Posts: 1573
Joined: Wed Sep 01, 2004 19:42

Re: Internet engine matches

Post by BertTuyt » Sat Dec 10, 2011 13:42

Jan-Jaap,

I assume/hope that Michael (Dragon) also reads this forum.
It would be nice if he could play a 158-game match against Kingsrow (but I'm not sure if he has this program).

Anyway, what is really missing is Damy!
Gerard, as a regular reader and contributor, could you provide some insight into how Damy does against KingsRow?

I still believe that Kingsrow is the best, Damy a good second, followed by Maximus/Damage.
But the proof of the pudding ......

By the way, the Dutch newspaper Telegraaf today has an article by Harm Wiersma about KingsRow (the best program in the world according to Harm), Maximus and Truus...

Speaking about Truus (and Flits), is Maximus now beating both programs in a match? (I thought you said that Flits is Maximus's angstgegner (bogey opponent), or the other way around.)

In the meantime I re-implemented a number of older ideas, which I had switched off in the past.
When I (long ago) manually tested Damage against Truus, the new ideas did not make a major difference.

Also I had the habit of implementing brilliant ( :( ) ideas the day before the yearly tournament.
After many disappointments, crashes, and strange behavior, I switched off many options.

I believe more and more that there isn't a magic button which will create an Elo explosion, so you need to test all these ideas step by step (via zillions of test games).

With Xmas approaching I'm preparing a list of things I want to change/improve/whatever, so I will keep you posted.

Bert

Ed Gilbert
Posts: 854
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA

Re: Internet engine matches

Post by Ed Gilbert » Sun Dec 11, 2011 22:01

jj wrote:The (big) difference is mainly due to reconsidering the KVO pattern, after observing that Maximus lost (too) many KVO games against Flits and Kingsrow.
Hi JJ,

I have forgotten what KVO means. Could you refresh my memory? Thanks.

-- Ed

BertTuyt
Posts: 1573
Joined: Wed Sep 01, 2004 19:42

Re: Internet engine matches

Post by BertTuyt » Sun Dec 11, 2011 22:37

Ed,

KVO = Korte Vleugel Opsluiting.

A direct English translation would be Short Wing Lock (or something like that) :D

Bert

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Internet engine matches

Post by TAILLE » Sun Dec 11, 2011 23:08

Hi Bert,
Of all of us, I am certainly the most impatient to see what Damy will do against Kingsrow!!!

As you all know, building a new version is a long journey, and I still have to finalize some major developments. The major differences from my previous version are the following:
1) A multithreaded engine (the previous version handled only 2 threads)
2) A search based more on the MTD-f-best procedure than on the MTD-f procedure
3) A brand-new pruning technique to avoid answering a bad move with another bad move (a rather difficult item, but it now seems to work properly)
4) A new algorithm to handle loops (when there is at least one king on each side)
5) A completely new approach to the evaluation function. In my previous version the evaluation function was rather complicated because it had to take into account both static considerations (control of the center, active or inactive men, presence of an arrow, overload of a pattern, etc.) and dynamic considerations (weakness of an outpost against a repetitive attack, breakthrough, etc.). In my new version, static and dynamic evaluations are completely decoupled
6) The 8-piece egdb
7) The DamExchange protocol (!)
8) A new GUI


I still have some work to do on points 3 and 5 above. After that I will begin matches against Kingsrow for debugging and tuning.

BTW, I also intend to come back to official tournaments in 2012.

Gérard

jj
Posts: 190
Joined: Sun Sep 13, 2009 23:33
Real name: Jan-Jaap van Horssen
Location: Zeist, Netherlands

Re: Internet engine matches

Post by jj » Mon Dec 12, 2011 23:05

Ed Gilbert wrote:
The (big) difference is mainly due to reconsidering the KVO pattern, after observing that Maximus lost (too) many KVO games against Flits and Kingsrow.
Hi JJ,

I have forgotten what KVO means. Could you refresh my memory? Thanks.

-- Ed
Ed, KVO is called right wing lock in the Course in draughts. JJ
www.maximusdraughts.org

jj
Posts: 190
Joined: Sun Sep 13, 2009 23:33
Real name: Jan-Jaap van Horssen
Location: Zeist, Netherlands

Re: Internet engine matches

Post by jj » Mon Dec 12, 2011 23:27

Gerard, that sounds impressive! It would also be very interesting (for you) to see the result of old Damy vs. new Damy.
Jan-Jaap
www.maximusdraughts.org

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Internet engine matches

Post by TAILLE » Tue Dec 13, 2011 00:06

I do not intend to play games between the old and the new Damy versions, because only the new version will have the DamExchange protocol. That was the major weakness of my old versions: I was never able to test an improvement via a match between versions.
BTW, I built a useful feature into my new version to run Kingsrow in parallel with Damy: each time I change the board position in Damy, I force Kingsrow to analyse the new position. That way I can debug Damy's evaluation function more easily.
Gérard

BertTuyt
Posts: 1573
Joined: Wed Sep 01, 2004 19:42

Re: Internet engine matches

Post by BertTuyt » Thu Dec 15, 2011 22:13

Gerard, looking forward to seeing you back in 2012.
I will (most likely) at least join the Dutch Open 2012.

Regarding your activities, which make a lot of sense to me, here are my questions/remarks:

1) Do you (as I do) use the YBWC algorithm, or did you implement a Taille special :D

2) So far I rely on alpha-beta, with all the enhancements/improvements which you can find in the computer games literature. Do you have a quantitative comparison between alpha-beta and the MTD-f family?
In the computer chess blogs I see conflicting messages, and some people (like Bob Hyatt, Crafty) don't see any difference.

3) So far I have also used the domain-independent techniques. Most recently I re-implemented (and am now examining) a technique based on a specific draughts rule, namely that a capture is forced; with this the signature of bad moves sometimes becomes very transparent. Do you have any idea to what extent domain-specific enhancements outperform the techniques used by the computer games community?

4) I recognize the need, but so far this does not have a high priority.

5) Fully agree. I also experimented with this at the beginning of 1990, when I wanted to beat Truus with machines searching 6-8 ply (or something like that). In the end I relied on a huge eval function which is nowadays impossible to read or understand, and where maintenance is almost impossible. But I agree that the better approach is direct communication between search and eval, where the eval can guide the search more than in the static approach based on a return value and backtracking of scores...
Very much interested in your results.
In the past I experimented with the concept of a "super"-move, which I also abandoned (but I found some amazing results with this !!)..

6) 8P is something which I will do when my budget can deal with a dual-processor machine (8 cores each, Ivy Bridge) and a 500 GB SSD, so most likely 2013 :?

7) Already implemented, and a great help...


With kind regards,

Bert

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Internet engine matches

Post by TAILLE » Fri Dec 16, 2011 15:56

1) Yes, I use a YBWC-based algorithm with no major modifications.
2) I know this “conflict” between the alpha-beta and the MTD-f fans very well. My view is the following: MTD-f is far better than a pure alpha-beta procedure. But if you implement a large part of the MTD-f procedure in your alpha-beta procedure (I mean if you make intensive use of null windows), then I do not see major differences. Without any proof, I prefer the MTD-f procedure for the following two reasons:
a. I can easily take advantage of an MTD-f-best procedure, especially in positions where a move is almost forced
b. During a search at a given depth with a given test value, if I encounter a position already seen, I am pretty sure to find the position already resolved in the hash table for this depth and test value. I am not sure it is a great advantage in a search with few threads, but I guess it could be very interesting with many search threads.
3) As we all know, a lot of “domain-independent” techniques can be found in the literature. Seeing that some are good for chess and probably bad for draughts (e.g. the use of the famous “null move”), I wonder whether such techniques can really be called “domain-independent”. LMR is another well-known and efficient technique for chess. In draughts, however, the tree is narrower than in chess and the LMR technique seems less efficient. In addition, due to the highly tactical nature of draughts, LMR may compare badly with other techniques. For these reasons I do not use LMR and built another technique based on the recognition of (probably) bad moves.
4) The technique I use is draughts-dependent.
5) I understand what you mean when you write “eval can guide the search”, but that is not exactly what I did. I just decoupled static and dynamic evaluation without any impact on the search. Maybe what you mention can be future work!
6) In 2013 I guess I will be able to build the 9p db!
7) No additional comments.
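
For readers unfamiliar with the null-window idea discussed in point 2, here is a minimal sketch of the textbook MTD(f) driver on a toy game tree. It is not Damy's implementation: the tree representation and function names are illustrative assumptions, the MTD-f-best variant is not shown, and there is no hash table, so the transposition benefit described in (b) does not appear here.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]  # a leaf score, or a list of child subtrees
INF = float("inf")


def alphabeta(node: Tree, alpha: float, beta: float, maximizing: bool) -> float:
    """Fail-soft alpha-beta on a toy tree with integer leaf scores."""
    if isinstance(node, int):
        return node
    if maximizing:
        best = -INF
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff
        return best
    best = INF
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:
            break  # alpha cutoff
    return best


def mtdf(root: Tree, first_guess: int) -> float:
    """Converge on the minimax value using only null-window searches."""
    g = first_guess
    lower, upper = -INF, INF
    while lower < upper:
        beta = g + 1 if g == lower else g
        g = alphabeta(root, beta - 1, beta, True)  # null window (beta - 1, beta)
        if g < beta:
            upper = g  # search failed low: g is an upper bound
        else:
            lower = g  # search failed high: g is a lower bound
    return g


example_tree = [[3, 5], [2, 9], [0, 7]]
print(mtdf(example_tree, first_guess=0))  # 3
```

Each pass only answers "is the value at least beta?", which is why a real engine depends on the hash table to avoid repeating work between passes.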
Gérard

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Internet engine matches

Post by TAILLE » Wed Mar 14, 2012 14:17

Hi,

I have just begun to play thematic games against Kingsrow (1.52) using the following initial position in a one-move ballot match (18 blitz games). Both programs are configured to use all their egdb (9 pieces for Kingsrow and 8 pieces for Damy) and to use 2 threads without pondering.

[Image: initial position]

Each match allows me to debug and tune my search algorithm and my eval function.
I promised to give you my very first results. Here they are:
Date 11/03/2012 Damy/Kingsrow 0 wins, 5 losses, 11 draws
Date 12/03/2012 Damy/Kingsrow 0 wins, 4 losses, 12 draws
Date 14/03/2012 Damy/Kingsrow 3 wins, 5 losses, 10 draws
Very encouraging indeed!
Gérard

BertTuyt
Posts: 1573
Joined: Wed Sep 01, 2004 19:42

Re: Internet engine matches

Post by BertTuyt » Wed May 09, 2012 22:57

Gerard,

any news regarding (thematic) Damy matches against KingsRow?
Nowadays I play mini-matches (first 30 ballots), and I aim for (at least) 30 draws.
The last match had 1 lost game, but it gave a clue :) for improvement, and that's the good news !!

Bert
