Internet engine matches

Rein Halbersma · Post by **Rein Halbersma** » Fri Oct 12, 2012 16:28

Here's another one: it's easy and my program finds it within 1 second (and less than 0.5 million nodes)

TAILLE · Post by **TAILLE** » Fri Oct 12, 2012 16:58

Rein Halbersma wrote:

Here's another one: it's easy and my program finds it within 1 second (and less than 0.5 million nodes)

Yes Rein, Damy also needs less than 1 second. The kind of combination here is very similar to the previous one, isn't it?

BTW, I easily recognize this position: I proposed it myself on this forum some time ago.

Rein Halbersma · Post by **Rein Halbersma** » Fri Oct 12, 2012 17:19

TAILLE wrote:
Rein Halbersma wrote:

Here's another one: it's easy and my program finds it within 1 second (and less than 0.5 million nodes)
Yes Rein, Damy also needs less than 1 second. The kind of combination here is very similar to the previous one, isn't it?

BTW, I easily recognize this position: I proposed it myself on this forum some time ago.

I did not remember it, but after you showed the other position by Sijbrands, this one and the previous one popped up in my head again. This one is from GM Scholma IIRC, and it was published almost 20 years ago in Sijbrands's magazine "Dammen".

The last 2 positions are much easier (single/double sacrifice + a move that makes a simple double threat), but the Sijbrands diagram is much harder because there it is a single sacrifice + a move with a single threat, and each reply leads to a completely new and deep combination.

TAILLE · Post by **TAILLE** » Fri Oct 12, 2012 18:02

Rein Halbersma wrote:
TAILLE wrote:
Rein Halbersma wrote:
Gerard, another challenge for your new algorithm: how long does it take to find the winning move?

Rein, this problem seems far easier. My time measurement does not show fractions of a second, but Damy solves the problem (34-30, 50-44) in less than 2 seconds. Do you have a similar measurement?

My program solves this position after a 13-ply search, a tree of 7.5 million nodes (= separate calls to the search() function) and 7 seconds. My effective branching factor is around 3.38 for this position. What are some numbers for your search?

Oops, I am not really able to answer your questions, because we obviously have different definitions of the depth of a tree and of a node/leaf. Essentially, this is the consequence of several factors:
1) The extension/reduction/pruning mechanisms make a single value for the depth of the tree completely irrelevant. In the above position the sequence 34-30 35x24 represents 2 plies, but for Damy the depth is only one ply because the second ply is forced. On the other hand Damy, and I guess the same holds for your program (BTW, what is the name of your program?), of course applies various reductions during the search.
2) Even for a leaf of the tree we cannot have the same definition. As an example, the Damy eval function executes some (recursive) micro-searches in order to discover a breakthrough or a weak outpost (I know perfectly well that some programmers prefer to build eval tables. I also tried this approach, but eventually I changed my mind). These micro-searches are very important in my implementation because they make it possible to discover a winning strategy while keeping the depth of the main tree as low as possible.

Anyway, I do not want to evade your question.
For Damy, the minimum number of plies needed to solve this problem is 6:
34-30 (1st ply) 35x24* 50-44 (2nd ply) xx-xx (3rd ply) 44-40 (4th ply) 45x34* 32-27 or 32-28 (5th ply) x 43-39 (6th ply) x x
Due to my reduction mechanism, Damy is unable to discover such a sequence with an initial depth equal to 6. I have to wait for depth = 9 to see Damy discover this winning sequence, in less than 2 seconds. You can then easily conclude that some branches were reduced by at least 3 plies!

Could you also explain what you mean exactly by 13 plies in your implementation?
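
For readers comparing the numbers in this exchange: the quoted 3.38 is consistent with the simplest definition of effective branching factor, namely the value b for which b^depth equals the node count. The sketch below assumes that definition (the program in question may compute it differently) and the function name is made up for illustration.

```python
def effective_branching_factor(nodes: float, depth: int) -> float:
    """Return b such that b ** depth == nodes (simplest EBF definition, assumed here)."""
    return nodes ** (1.0 / depth)

# Figures quoted in the thread: 7.5 million nodes for a 13-ply search.
ebf = effective_branching_factor(7.5e6, 13)
print(f"effective branching factor: {ebf:.2f}")
```

As the reply above points out, the result only means something when both programs count plies and nodes the same way. With forced moves not counted as a ply, or with reductions, the same tree yields a different "depth" and hence a different factor.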

BertTuyt · Post by **BertTuyt** » Sat Oct 13, 2012 18:18

As my machine was constantly busy, I was not able to post the Damage results for the positions posted earlier.
So here is the first one.
Note: timings are on a 2.93 GHz i940 with a single-core search ...

TAILLE wrote:
Hi,

After a very long time I have just managed to reach a fairly stable new search algorithm for Damy.

Can you tell me how much time your program needs to solve the following Sijbrands composition?

White to move: +1

After 0.02 sec Damage finds the right move (7-ply search).
After 0.94 sec Damage also has the right score (14-ply search).

Bert

BertTuyt · Post by **BertTuyt** » Sat Oct 13, 2012 18:25

And here is Rein's first challenge:

Rein Halbersma wrote:
Gerard, another challenge for your new algorithm: how long does it take to find the winning move?

Damage finds the right move and score after 0.08 sec (ply 10).

Bert

BertTuyt · Post by **BertTuyt** » Sat Oct 13, 2012 18:29

And Rein's second challenge:

Rein Halbersma wrote:
Here's another one: it's easy and my program finds it within 1 second (and less than 0.5 million nodes)

Damage finds the right move after 0.02 sec, at ply 8 and around 30K nodes.

Bert

BertTuyt · Post by **BertTuyt** » Tue Oct 23, 2012 21:35

I didn't post for some time because I was playing engine matches.

After 2 won matches, I started to test IID, which was not a success.
Then I restarted with the old source (at least that is what I thought) to get a feeling for the statistics.
But for an unknown reason Damage was no longer able to win a 158-game match.

The other matches also revealed only small differences.
See the 8 engine matches below.

I don't have a good explanation yet.
Based on a code comparison, I could not find any clue so far.
What I also can't imagine is that the first 2 results were statistical fluctuations, or that Kingsrow sometimes has a bad day (or days), or that something weird can happen during Kingsrow's initialization that lowers its strength. Maybe Ed has more experience with this.
Anyway, the difference is not so dramatic, but I'm puzzled.

Date           W    L    D    U    P     P%
16-Sep-2012    7    3  148    0  162  51,3%
28-Sep-2012   14    7  137    0  165  52,2%
 1-Oct-2012    4   11  143    0  151  47,8%
10-Oct-2012    2    8  148    0  152  48,1%
13-Oct-2012    5   11  141    1  151  48,1%
16-Oct-2012    4   11  143    0  151  47,8%
19-Oct-2012    3    9  146    0  152  48,1%
23-Oct-2012    5    9  143    1  153  48,7%

Bert
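
A note on how the table columns appear to fit together: the P column matches 2 points per win plus 1 per draw, and P% matches P divided by twice the number of finished games, so a game counted under U seems to be left out of the denominator. The sketch below rests on that reading, which is an assumption inferred from the numbers, and converts a score fraction to an Elo difference with the usual logistic formula. The function names are illustrative only.

```python
import math

def score_fraction(wins, losses, draws):
    """Fraction of available points scored; games counted under U are left out."""
    points = 2 * wins + draws  # column P: 2 per win, 1 per draw
    return points / (2 * (wins + losses + draws))

def elo_difference(score):
    """Elo gap implied by a score fraction under the logistic Elo model."""
    return 400.0 * math.log10(score / (1.0 - score))

# (W, L, D) for the first two matches in the table above.
first_two = [(7, 3, 148), (14, 7, 137)]
wins = sum(m[0] for m in first_two)
losses = sum(m[1] for m in first_two)
draws = sum(m[2] for m in first_two)

pooled = score_fraction(wins, losses, draws)
print(f"pooled score {pooled:.1%}, Elo difference {elo_difference(pooled):+.1f}")
```

Pooling the first two matches this way gives a score a little under 52%, which corresponds to roughly +12 Elo for Damage.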

Rein Halbersma · Post by **Rein Halbersma** » Tue Oct 23, 2012 23:04

BertTuyt wrote:
I didn't post for some time because I was playing engine matches.
After 2 won matches, I started to test IID, which was not a success.
Then I restarted with the old source (at least that is what I thought) to get a feeling for the statistics.
But for an unknown reason Damage was no longer able to win a 158-game match.
The other matches also revealed only small differences.
See the 8 engine matches below.

I don't have a good explanation yet.
Based on a code comparison, I could not find any clue so far.
What I also can't imagine is that the first 2 results were statistical fluctuations, or that Kingsrow sometimes has a bad day (or days), or that something weird can happen during Kingsrow's initialization that lowers its strength. Maybe Ed has more experience with this.
Anyway, the difference is not so dramatic, but I'm puzzled.

Date           W    L    D    U    P     P%
16-Sep-2012    7    3  148    0  162  51,3%
28-Sep-2012   14    7  137    0  165  52,2%
 1-Oct-2012    4   11  143    0  151  47,8%
10-Oct-2012    2    8  148    0  152  48,1%
13-Oct-2012    5   11  141    1  151  48,1%
16-Oct-2012    4   11  143    0  151  47,8%
19-Oct-2012    3    9  146    0  152  48,1%
23-Oct-2012    5    9  143    1  153  48,7%

Bert

Try running the first 2 matches through BayesElo and compute the rating difference, the confidence interval and the likelihood of superiority. Then repeat with all matches. You'll be surprised how big the rating uncertainty is from short (~300 game) matches with differences in the 10-15 Elo range.

UPDATE: just a quick calculation to confirm that. Based on the first 2 matches alone, Damage scored +12 Elo with an error margin of +/- 6 Elo (a 1.99 sigma result). That meant that Damage was 97.7% likely to be the stronger engine. Still, that leaves a 2.3% chance that Kingsrow was stronger, so it's a reasonable but not completely convincing show of superiority by Damage. But in the remaining matches, Kingsrow scored +14 Elo with an error margin of +/- 3.5 Elo (a 4.00 sigma result), and the likelihood that Kingsrow is superior is almost 100% (less than a 1 in 30,000 chance that Damage is stronger).

You can also compute how likely it is that you win a 158 game match, given that Kingsrow in reality is 14 Elo stronger over 910 games. If I'm not mistaken, that probability was about 1 in a thousand (a 3.1 sigma result). So you were indeed very lucky to win the first 2 matches (if they were with identical versions).

Moral: statistically speaking, a 300 game match is better than a 9 game tournament, but luck can still influence the result. In particular, you need to test over a lot more games before you can conclude with, say, 99% confidence that a positive match score shows that your program is superior. This is of course well known in the chess engine community.
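
The sigma and likelihood-of-superiority figures above can be approximated without BayesElo. The sketch below is not BayesElo's model: it assumes a plain normal approximation in which each game scores 1, 0.5 or 0, and it uses win/loss/draw totals summed from Bert's table (21/10/285 for the first two matches, 23/59/864 for the next six, leaving out the two games listed under U). The function name is illustrative.

```python
import math

def match_statistics(wins, losses, draws):
    """Return (elo, elo_error, z, los) using a normal approximation."""
    games = wins + losses + draws
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the score (1 for a win, 0.5 for a draw, 0 for a loss).
    variance = (wins + 0.25 * draws) / games - mean ** 2
    std_error = math.sqrt(variance / games)
    z = (mean - 0.5) / std_error
    elo = 400.0 * math.log10(mean / (1.0 - mean))
    # Delta method: d(elo)/d(mean) = 400 / (ln 10 * mean * (1 - mean)).
    elo_error = std_error * 400.0 / (math.log(10) * mean * (1.0 - mean))
    # Likelihood of superiority: normal CDF evaluated at z.
    los = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return elo, elo_error, z, los

first_two = match_statistics(21, 10, 285)
later_six = match_statistics(23, 59, 864)
for label, (elo, err, z, los) in (("first two", first_two), ("later six", later_six)):
    print(f"{label}: {elo:+.1f} Elo +/- {err:.1f}, {z:+.2f} sigma, LOS {los:.4f}")
```

Under these assumptions the first two matches come out at roughly +12 Elo with a one-sigma error of about 6 (close to 2.0 sigma), and the later six at roughly -13 Elo with an error a little above 3 (close to 4.0 sigma). Small differences from the quoted figures are to be expected, since BayesElo fits a different model.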

MichelG · Post by **MichelG** » Wed Oct 24, 2012 08:57

Statistics is hard

In fact, if you see any result anywhere that says statement X was proven with 95% (2 sigma) chance, then statement X is probably not true. Sadly this happens in a lot of fields, even in cancer research, because of the misuse of statistics.

Yesterday I called my mom:
me: mom, something remarkable happened!
mom: tell me!
me: all coins in my pocket have 2 heads!
mom: why do you think that?
me: I flipped 3 of the coins, and all 3 coins were heads! If the coins had both heads and tails, there would be an 87.5% chance of turning up at least one tail.
mom: so you are only 87.5% sure that all coins in your pocket are heads?
me: remarkable, isn't it?
mom: why don't you flip another coin and be more sure?
me: well, that's a lot of effort. And I already am very confident. 87.5% is a lot!
mom: would you have called me if your 3 flips were all tails?
me: of course, then I would be 87.5% sure that all my coins were tails
mom: so there is a 1 in 4 chance that you would have called me?
me: I guess
mom: then there is a 1 in 4 chance that you would have found something remarkable, isn't there? You can be only 75% sure that you found something.
me: I guess
mom: would you have called when it was 2 tails and 1 head?
me: no
mom: every time you call me about your coins, later it turns out to be just regular coins.
me: I just wanted to hear your voice...

Rein's reasoning that there is only a very small chance of such a fluke is the right answer to the wrong question. Try answering this one:
How big is the chance that a series of games, which you end at a self-chosen point, gives a result that looks remarkable?

I won't do the math, but it's going to be fairly big.
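
The question about stopping at a self-chosen point can be explored with a small Monte Carlo experiment. The sketch below makes several assumptions that are not in the thread: two engines of exactly equal strength, about 10% decisive games, 11 matches of 158 games, a look at the running total after every match, and a 2-sigma threshold on (W - L) / sqrt(W + L). All names and parameters are illustrative.

```python
import random

def z_score(wins, losses):
    """Sigma level of the win/loss imbalance; draws carry no information here."""
    decisive = wins + losses
    if decisive == 0:
        return 0.0
    return (wins - losses) / decisive ** 0.5

def simulate(trials=1000, matches=11, games_per_match=158,
             decisive_rate=0.10, threshold=2.0, seed=1):
    """Return (rate of 'remarkable' results when peeking, rate when only checking at the end)."""
    rng = random.Random(seed)
    peek_hits = 0
    final_hits = 0
    for _trial in range(trials):
        wins = losses = 0
        crossed = False
        for _match in range(matches):
            for _game in range(games_per_match):
                r = rng.random()
                if r < decisive_rate / 2:
                    wins += 1
                elif r < decisive_rate:
                    losses += 1
            # Peek at the running total after each match.
            if abs(z_score(wins, losses)) >= threshold:
                crossed = True
        peek_hits += crossed
        final_hits += abs(z_score(wins, losses)) >= threshold
    return peek_hits / trials, final_hits / trials

peek_rate, final_rate = simulate()
print(f"remarkable at some checkpoint: {peek_rate:.1%}, at the end only: {final_rate:.1%}")
```

Because the final check is also one of the checkpoints, the peeking rate can never be lower than the end-only rate. How much higher it is depends on the number of looks and on the threshold chosen.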

Rein Halbersma · Post by **Rein Halbersma** » Thu Oct 25, 2012 11:42

MichelG wrote:Statistics is hard

In fact, if you see any result anywhere that says statement X was proven with 95% (2 sigma) chance, then statement X is probably not true. Sadly this happens in a lot of fields, even in cancer research, because of the misuse of statistics.

Rein's reasoning that there is only a very small chance of such a fluke is the right answer to the wrong question. Try answering this one:
How big is the chance that a series of games, which you end at a self-chosen point, gives a result that looks remarkable?

I won't do the math, but it's going to be fairly big.

Michel,

What exactly are you trying to say? How would you propose to test for engine improvements? Do you see any role for statistics there?

Rein

MichelG · Post by **MichelG** » Thu Oct 25, 2012 15:23

Rein Halbersma wrote:
Michel,

What exactly are you trying to say? How would you propose to test for engine improvements? Do you see any role for statistics there?

Rein

The point is that a statistically significant result at the 95% level does not mean that there is a 95% chance that something is true. Looking at Bert's table, for instance, I don't think you can conclude that the version of matches 1 & 2 outperformed the version in the later matches.

The thing to learn is that if you want to know whether a change to your code is actually working, you should play enough games to get a high statistical confidence level (e.g. 3 or 4 sigma, or 99.9%).

158 games is just not enough to 'prove' anything.

Ed Gilbert · Post by **Ed Gilbert** » Fri Oct 26, 2012 14:23

The number of games needed to be confident of superiority depends on the relative strengths of the programs. 10 or 20 games is enough in the case of a severe mismatch. 158 games seems to be enough for Kingsrow vs Flits or Truus. But when I test a new version of Kingsrow against a baseline, I use 7904 games (eight 3-move matches), and sometimes even that is not enough and I repeat it a few times.

-- Ed
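
To get a feeling for why 158 games resolves little between near-equal engines while 7904 games resolves much more, here is a rough sample-size sketch. It assumes a draw rate of about 90% (roughly what Bert's table shows), a true score close to 50%, and the normal approximation. The function names and the constant are hypothetical helpers, not part of any engine or testing tool.

```python
import math

# Slope of the logistic Elo curve at a 50% score: Elo points per unit of score fraction.
ELO_PER_SCORE = 400.0 / (math.log(10) * 0.25)

def per_game_sd(draw_rate):
    """Standard deviation of a single game score when wins and losses are equally likely."""
    return math.sqrt((1.0 - draw_rate) / 4.0)

def elo_standard_error(games, draw_rate):
    """One-sigma uncertainty of the measured Elo difference after `games` games."""
    return ELO_PER_SCORE * per_game_sd(draw_rate) / math.sqrt(games)

def games_needed(elo_diff, sigmas, draw_rate):
    """Games required for `elo_diff` to stand out at the given sigma level."""
    return math.ceil((sigmas * ELO_PER_SCORE * per_game_sd(draw_rate) / elo_diff) ** 2)

for n in (158, 316, 7904):
    print(f"{n:5d} games: +/- {elo_standard_error(n, 0.90):.1f} Elo (1 sigma)")
print("games for 10 Elo at 3 sigma:", games_needed(10, 3, 0.90))
```

With these assumptions a single 158-game match has a one-sigma uncertainty of almost 9 Elo, while 7904 games brings it down to a little over 1 Elo, which fits the remark that small improvements need thousands of games to confirm.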

BertTuyt · Post by **BertTuyt** » Tue Oct 30, 2012 20:40

Herewith the updated match table, with 3 additional matches added.
I still have not found the previous magic win button, but some minor modifications seem to work.

Date           W    L    D    U    P     P%

16-Sep-2012    7    3  148    0  162  51,3%
28-Sep-2012   14    7  137    0  165  52,2%

 1-Oct-2012    4   11  143    0  151  47,8%
10-Oct-2012    2    8  148    0  152  48,1%
13-Oct-2012    5   11  141    1  151  48,1%
16-Oct-2012    4   11  143    0  151  47,8%
19-Oct-2012    3    9  146    0  152  48,1%
23-Oct-2012    5    9  143    1  153  48,7%

25-Oct-2012    2    5  150    1  154  49,0%
28-Oct-2012    5    8  145    0  155  49,1%
30-Oct-2012    3    5  150    0  156  49,4%

The last 3 matches seem to show an Elo difference below 10 (if my calculation is valid, it was around 7).

Bert

Rein Halbersma · Post by **Rein Halbersma** » Tue Oct 30, 2012 21:26

BertTuyt wrote: After 2 won matches, i started to test IID, which was not a success.
Then I restarted with the old source (at least that was what I thought) to get a feeling for statistics.

Bert,

Do you have your program under version control? It should not be hard to get back the exact version that won the 2 matches.

Rein

World Draughts Forum
