Internet engine matches

Discussion about the development of draughts in the age of computers and the Internet.
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: Internet engine matches

Post by Rein Halbersma »

Image

Here's another one: it's easy and my program finds it within 1 second (and less than 0.5 million nodes)
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Internet engine matches

Post by TAILLE »

Rein Halbersma wrote:Image

Here's another one: it's easy and my program finds it within 1 second (and less than 0.5 million nodes)
Yes Rein, Damy also needs less than 1 second. The kind of combination here is very similar to the previous one, isn't it?

BTW, I easily recognize this position: I proposed it myself on this forum some time ago.
Gérard
Rein Halbersma

Re: Internet engine matches

Post by Rein Halbersma »

TAILLE wrote:
Rein Halbersma wrote:Image

Here's another one: it's easy and my program finds it within 1 second (and less than 0.5 million nodes)
Yes Rein, Damy also needs less than 1 second. The kind of combination here is very similar to the previous one, isn't it?

BTW, I easily recognize this position: I proposed it myself on this forum some time ago.
I did not remember it, but after you showed the other position by Sijbrands, this one and the previous one popped up in my head again. This one is from GM Scholma IIRC, and it was published almost 20 years ago in Sijbrands's magazine "Dammen".

The last 2 positions are much easier (single/double sacrifice + move that makes a simple double threat) but the Sijbrands diagram is much harder because there it is single sacrifice + move with single threat, with after each reply a completely new and deep combination.
TAILLE

Re: Internet engine matches

Post by TAILLE »

Rein Halbersma wrote:
TAILLE wrote:Rein,
Rein Halbersma wrote:Image

Gerard, another challenge for your new algorithm: how long does it take to find the winning move? 8)
This problem seems far easier. My timer does not show fractions of a second, but Damy solves the problem (34-30, 50-44) in less than 2 seconds. Do you have a similar measurement?
My program solves this position after a 13-ply search with a tree of 7.5 million nodes (= separate calls to the search() function) in 7 seconds. My effective branching factor is around 3.38 for this position. What are some numbers for your search?
Oops, I am not able to really answer your questions, because we obviously have different definitions for the depth of a tree and for a node/leaf. Essentially, this is the consequence of several factors:
1) The extension/reduction/pruning mechanisms make a single value for the depth of the tree completely irrelevant. In the above position the sequence 34-30 35x24 represents 2 plies, but for Damy the depth is only one ply because the second ply is forced. On the other hand Damy, and I guess the same holds for your program (BTW, what is the name of your program?), of course performs various reductions during the search.
2) Even for a leaf of the tree we cannot have the same definition. As an example, the Damy eval function executes some (recursive) micro-searches in order to discover a breakthrough or a weak outpost. (I know perfectly well that some programmers prefer to build eval tables. I also tried this approach but eventually changed my mind.) These micro-searches are very important in my implementation because they allow Damy to discover a winning strategy while keeping the depth of the main tree as low as possible.

Anyway, I do not want to evade your question.
For Damy the minimum number of plies needed to solve this problem is 6:
34-30 (1st ply) 35x24* 50-44 (2nd ply) xx-xx (3rd ply) 44-40 (4th ply) 45x34* 32-27 or 32-28 (5th ply) x 43-39 (6th ply) x x
Due to my reduction mechanism, Damy is unable to discover this sequence with an initial depth of 6. I have to wait until depth = 9 to see Damy discover the winning sequence, in less than 2 seconds. You can then easily conclude that some branches were reduced by at least 3 plies!

Could you also explain what you mean exactly by 13 plies in your implementation?
Gérard
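
As a side note on the numbers quoted above: one common convention (an assumption here, since this very post shows that the two programs count depth and nodes differently) takes the effective branching factor b as the value for which nodes ≈ b^depth. A minimal Python sketch of that relation follows; `effective_branching_factor` is a hypothetical helper and is not taken from either engine.

```python
def effective_branching_factor(nodes: float, depth: int) -> float:
    """Return b such that b ** depth == nodes (simple nodes = b^depth model)."""
    if nodes <= 0 or depth <= 0:
        raise ValueError("nodes and depth must be positive")
    return nodes ** (1.0 / depth)


# Figures quoted in the thread: 7.5 million nodes for a 13-ply search.
ebf = effective_branching_factor(7.5e6, 13)
print(f"effective branching factor: {ebf:.2f}")
```

Under this definition the quoted 7.5 million nodes at 13 ply give about 3.38, which agrees with the figure in the quoted post. It says nothing about how plies are counted when forced captures or reductions are involved.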
BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Internet engine matches

Post by BertTuyt »

As my machine was constantly busy, I was not able to post the Damage results for the positions posted.
So here is the first one.
Note: timings are based on a 2.93 GHz i940, searching with 1 core ...
Hi,

After quite a long time I have just managed to reach a fairly stable new search algorithm for Damy.

Can you tell me how much time your program needs to solve the following Sijbrands composition?

White to move : +1

After 0.02 sec Damage finds the right move (7-ply search).
After 0.94 sec Damage also has the right score (14-ply search).

Bert
BertTuyt

Re: Internet engine matches

Post by BertTuyt »

And here the 1st Rein challenge:
Gerard, another challenge for your new algorithm: how long does it take to find the winning move? 8)
Damage finds the right Move and Score after 0.08 sec (Ply 10).

Bert
BertTuyt

Re: Internet engine matches

Post by BertTuyt »

And the 2nd challenge of Rein
Here's another one: it's easy and my program finds it within 1 second (and less than 0.5 million nodes)
Damage finds the right move after 0.02 sec Ply 8 and around 30K nodes.

Bert
BertTuyt

Re: Internet engine matches

Post by BertTuyt »

I haven't posted for some time because I was busy playing engine matches :)
After 2 won matches, I started to test IID, which was not a success.
Then I restarted with the old source (at least that is what I thought) to get a feeling for the statistics.
But for an unknown reason Damage was no longer able to win a 158-game match :(
The other matches also showed only small differences.
See the 8 engine matches below.

I don't have a good explanation yet.
Based on a code comparison, I have not found any clue so far.
What I also can't imagine is that the first 2 results were statistical fluctuations, or that Kingsrow sometimes has a bad day (or days), or that something weird can happen during Kingsrow's initialization that lowers its strength. Maybe Ed has more experience with this.
Anyway, the difference is not so dramatic, but I'm puzzled.

Date            W    L    D    U    P     P%
16-Sep-2012     7    3   148   0   162   51,3%
28-Sep-2012    14    7   137   0   165   52,2%
1-Oct-2012      4   11   143   0   151   47,8%
10-Oct-2012     2    8   148   0   152   48,1%
13-Oct-2012     5   11   141   1   151   48,1%
16-Oct-2012     4   11   143   0   151   47,8%
19-Oct-2012     3    9   146   0   152   48,1%
23-Oct-2012     5    9   143   1   153   48,7%

Bert
Rein Halbersma

Re: Internet engine matches

Post by Rein Halbersma »

BertTuyt wrote:I haven't posted for some time because I was busy playing engine matches :)
After 2 won matches, I started to test IID, which was not a success.
Then I restarted with the old source (at least that is what I thought) to get a feeling for the statistics.
But for an unknown reason Damage was no longer able to win a 158-game match :(
The other matches also showed only small differences.
See the 8 engine matches below.

I don't have a good explanation yet.
Based on a code comparison, I have not found any clue so far.
What I also can't imagine is that the first 2 results were statistical fluctuations, or that Kingsrow sometimes has a bad day (or days), or that something weird can happen during Kingsrow's initialization that lowers its strength. Maybe Ed has more experience with this.
Anyway, the difference is not so dramatic, but I'm puzzled.

Date            W    L    D    U    P     P%
16-Sep-2012     7    3   148   0   162   51,3%
28-Sep-2012    14    7   137   0   165   52,2%
1-Oct-2012      4   11   143   0   151   47,8%
10-Oct-2012     2    8   148   0   152   48,1%
13-Oct-2012     5   11   141   1   151   48,1%
16-Oct-2012     4   11   143   0   151   47,8%
19-Oct-2012     3    9   146   0   152   48,1%
23-Oct-2012     5    9   143   1   153   48,7%

Bert
Try running the first 2 matches through BayesElo to compute the rating difference, confidence interval and likelihood of superiority. Then repeat with all matches. You'll be surprised how big the rating uncertainty is for short (~300-game) matches with differences in the 10-15 Elo range.

UPDATE: just a quick calculation to confirm that. Based on the first 2 matches alone, Damage scored +12 Elo with an error margin of +/- 6 Elo (a 1.99 sigma result). That means that Damage was 97.7% likely to be the stronger engine. Still, that leaves a 2.3% chance that Kingsrow was stronger, so it's a reasonable but not completely convincing show of superiority from Damage. But in the remaining matches, Kingsrow scored +14 Elo with an error margin of +/- 3.5 Elo (a 4.00 sigma result), and the likelihood that Kingsrow is superior is almost 100% (less than a 1 in 30,000 chance that Damage is stronger).

You can also compute how likely it is that you win a 158-game match, given that Kingsrow in reality is 14 Elo stronger over 910 games. If I'm not mistaken, that probability was about 1 in a thousand (a 3.1 sigma result). So you were indeed very lucky to win the first 2 matches (if they were with identical versions).

Moral: statistically speaking, a 300-game match is better than a 9-game tournament, but luck can still influence the result. In particular, you need to test over a lot more games before you can conclude with, say, 99% confidence that a positive match score shows that your program is superior. This is of course well known in the chess engine community.
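
For anyone who wants to redo this kind of quick check by hand, here is a minimal Python sketch. It is not BayesElo's actual method: it assumes a plain normal approximation of the mean score per game and the usual logistic Elo formula, and `match_stats` is a hypothetical helper name. The win/loss/draw totals are summed from Bert's table, with the games of unknown result left out.

```python
import math


def match_stats(wins: int, losses: int, draws: int):
    """Normal-approximation Elo estimate from a win/loss/draw record.

    Returns (elo_diff, elo_error, z, likelihood_of_superiority).
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n                  # mean score per game
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins + 0.25 * draws) / n - score ** 2
    se = math.sqrt(var / n)                           # standard error of the mean score
    elo = 400.0 * math.log10(score / (1.0 - score))   # logistic Elo model
    # Delta method: d(elo)/d(score) = 400 / (ln 10 * score * (1 - score)).
    elo_se = se * 400.0 / (math.log(10) * score * (1.0 - score))
    z = (score - 0.5) / se                            # distance from 50% in sigmas
    los = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF evaluated at z
    return elo, elo_se, z, los


# First two matches in the table: 7+14 wins, 3+7 losses, 148+137 draws.
first = match_stats(21, 10, 285)
# Remaining six matches: 23 wins, 59 losses, 864 draws (2 unknown games ignored).
rest = match_stats(23, 59, 864)
for label, (elo, err, z, los) in (("first 2", first), ("other 6", rest)):
    print(f"{label}: {elo:+.1f} Elo +/- {err:.1f} (z = {z:+.2f}, LOS = {los:.5f})")
```

With these inputs the sketch lands close to the figures above: roughly a 2 sigma edge for Damage in the first two matches and roughly a 4 sigma edge for Kingsrow in the other six. Small differences from the quoted Elo numbers are expected, since the model here is cruder than BayesElo.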
MichelG
Posts: 244
Joined: Sun Dec 28, 2003 20:24
Contact:

Re: Internet engine matches

Post by MichelG »

Statistics is hard :-)

In fact, if you see any result anywhere that says statement X was proven with 95% (2 sigma) chance, then statement X is probably not true. Sadly this happens in a lot of fields, even in cancer research, because of the misuse of statistics.

Yesterday I called my mom:
me: mom, something remarkable happened!
mom: tell me!
me: all coins in my pocket have 2 heads!
mom: why do you think that?
me: I flipped 3 of the coins, and all 3 came up heads! If the coins had both heads and tails, there would be an 87.5% chance of at least one tail turning up.
mom: so you are only 87.5% sure that all coins in your pocket are heads?
me: remarkable, isn't it?
mom: why don't you flip another coin and be more sure?
me: well, that's a lot of effort. And I am already very confident. 87.5% is a lot!
mom: would you have called me if your 3 flips were all tails?
me: of course, then I would be 87.5% sure that all my coins were tails
mom: so there is a 1 in 4 chance that you would have called me?
me: I guess
mom: then there is a 1 in 4 chance that you would have found something remarkable, isn't there? You can be only 75% sure that you found something.
me: I guess
mom: would you have called when it was 2 tails and 1 head?
me: no
mom: every time you call me about your coins, later it turns out to be just regular coins.
me: I just wanted to hear your voice...

Rein's reasoning that there is only a very small chance of such a fluke is the right answer to the wrong question. Try answering this one:
how big is the chance that a match which you end at a self-chosen point gives a result that looks remarkable?

I won't do the math, but it's going to be fairly big.
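
The question can at least be explored by simulation. The sketch below is only an illustration under stated assumptions: two engines of exactly equal strength, a hypothetical 90% draw rate, a look at the cumulative score after every 158-game match for up to 8 matches, and "remarkable" meaning 2 sigma or more in either direction. The names `z_score` and `simulate` are made up for this example.

```python
import math
import random


def z_score(wins: int, losses: int, draws: int) -> float:
    """Sigmas by which the mean score differs from 50% (normal approximation)."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - score ** 2
    if var <= 0.0:  # degenerate record (for example all draws): treat as no signal
        return 0.0
    return (score - 0.5) / math.sqrt(var / n)


def simulate(trials=1000, batches=8, batch_size=158, draw_rate=0.9,
             threshold=2.0, seed=1):
    """Equal-strength engines; return (single_look_rate, any_look_rate)."""
    rng = random.Random(seed)
    win_p = (1.0 - draw_rate) / 2.0  # each side wins equally often
    single_look = any_look = 0
    for _ in range(trials):
        wins = losses = draws = 0
        flagged = False
        for batch in range(batches):
            for _game in range(batch_size):
                r = rng.random()
                if r < win_p:
                    wins += 1
                elif r < 2.0 * win_p:
                    losses += 1
                else:
                    draws += 1
            remarkable = abs(z_score(wins, losses, draws)) >= threshold
            if batch == 0 and remarkable:
                single_look += 1  # verdict planned in advance: first match only
            flagged = flagged or remarkable
        any_look += flagged  # some stopping point would have looked remarkable
    return single_look / trials, any_look / trials


single, anytime = simulate()
print(f"remarkable after one pre-planned 158-game match: {single:.1%}")
print(f"remarkable at some self-chosen stopping point:   {anytime:.1%}")
```

Because the first look is counted in both figures, the second rate can never be lower than the first, and in practice it should come out clearly higher. The exact values depend on the assumed draw rate, threshold and number of looks.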
Rein Halbersma

Re: Internet engine matches

Post by Rein Halbersma »

MichelG wrote:Statistics is hard :-)

In fact, if you see any result anywhere that says statement X was proven with 95% (2 sigma) chance, then statement X is probably not true. Sadly this happens in a lot of fields, even in cancer research, because of the misuse of statistics.

Rein's reasoning that there is only a very small chance of such a fluke is the right answer to the wrong question. Try answering this one:
how big is the chance that a match which you end at a self-chosen point gives a result that looks remarkable?

I won't do the math, but it's going to be fairly big.
Michel,

What exactly are you trying to say? How would you propose to test for engine improvements? Do you see any role for statistics there?

Rein
MichelG

Re: Internet engine matches

Post by MichelG »

Rein Halbersma wrote:
Michel,

What exactly are you trying to say? How would you propose to test for engine improvements? Do you see any role for statistics there?

Rein
The point is that a result that is statistically significant at the 95% level does not mean there is a 95% chance that something is true. Looking at Bert's table, for instance, I don't think you can conclude that the version of matches 1 & 2 outperformed the version in the later matches.

The thing to learn is that if you want to know whether a change to your code is actually working, make sure to play enough games to get a high statistical confidence level (e.g. 3 or 4 sigma, or 99.9%).

158 games is just not enough to 'prove' anything.
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Internet engine matches

Post by Ed Gilbert »

The number of games needed to be confident of superiority depends on the relative strengths of the programs. 10 or 20 games are enough in the case of a severe mismatch. 158 games seems to be enough for Kingsrow vs Flits or Truus. But when I test a new version of Kingsrow vs a baseline, I use 7904 games (eight 3-move matches) and sometimes even that is not enough and I repeat it a few times.

-- Ed
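
The dependence on relative strength can be sketched with a rough formula. This is an illustration under assumptions, not anyone's actual test procedure: it uses the logistic Elo model for the expected score, a draw rate that you have to supply yourself, and a normal approximation in which the expected edge over 50% must equal a chosen number of standard errors. `games_needed` is a hypothetical helper and the inputs in the loop are made-up examples.

```python
import math


def games_needed(elo_diff: float, draw_rate: float, sigmas: float) -> int:
    """Games after which the expected score edge equals `sigmas` standard errors."""
    if elo_diff <= 0.0:
        raise ValueError("elo_diff must be positive")
    p = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))  # expected score of the stronger engine
    win = p - draw_rate / 2.0                      # win probability implied by p and draw rate
    loss = 1.0 - p - draw_rate / 2.0
    if win < 0.0 or loss < 0.0:
        raise ValueError("draw_rate is inconsistent with this Elo difference")
    var = win + 0.25 * draw_rate - p ** 2          # per-game variance of the score
    # Solve sigmas * sqrt(var / n) = p - 0.5 for n.
    return math.ceil(var * (sigmas / (p - 0.5)) ** 2)


# Hypothetical inputs: a severe mismatch versus two closely matched versions.
for elo, draw_rate in ((200.0, 0.4), (10.0, 0.9)):
    n = games_needed(elo, draw_rate, sigmas=3.0)
    print(f"{elo:>5.0f} Elo, {draw_rate:.0%} draws: about {n} games for 3 sigma")
```

Under these assumptions the mismatch needs far fewer games than the 10 Elo case, which is the same pattern as described above. Real matches with opening books, repeated openings and correlated games will not follow the formula exactly.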
BertTuyt

Re: Internet engine matches

Post by BertTuyt »

Here is the updated match table, with 3 additional matches.
I still have not found the previous magic win button, but some small modifications seem to work.

Date            W    L    D    U    P     P%

16-Sep-2012     7    3   148   0   162   51,3%
28-Sep-2012    14    7   137   0   165   52,2%

1-Oct-2012      4   11   143   0   151   47,8%
10-Oct-2012     2    8   148   0   152   48,1%
13-Oct-2012     5   11   141   1   151   48,1%
16-Oct-2012     4   11   143   0   151   47,8%
19-Oct-2012     3    9   146   0   152   48,1%
23-Oct-2012     5    9   143   1   153   48,7%

25-Oct-2012     2    5   150   1   154   49,0%
28-Oct-2012     5    8   145   0   155   49,1%
30-Oct-2012     3    5   150   0   156   49,4%

The last 3 matches seem to show an Elo difference below 10 (if my calculation is valid, it was around 7).

Bert
Attachments
dxpgames Oct-2012 v9.pdn
dxpgames Oct-2012 v8.pdn
dxpgames Oct-2012 v7.pdn
Rein Halbersma

Re: Internet engine matches

Post by Rein Halbersma »

BertTuyt wrote: After 2 won matches, I started to test IID, which was not a success.
Then I restarted with the old source (at least that is what I thought) to get a feeling for the statistics.
Bert,

Do you have your program under version control? It should not be hard to get back the exact version that won the 2 matches.

Rein