Computer Draughts 2011

Discussion about development of draughts in the time of computer and Internet.

Re: Computer Draughts 2011

Post by 64_bit_checkers_engine »

I have two systems now, each running at over 5.0 GHz. You can see them on my YouTube channel:

http://www.youtube.com/user/LiquidNitro ... nSG29RIrbg

And here is my 12-core box at 4.5 GHz (Dual Xeon X5680 Westmere)

Image

Re: Computer Draughts 2011

Post by BertTuyt »

Eric van Dusseldorp has some interesting papers about computers on his website.
I want to share one of them with you, as it relates to the pattern detection discussion, and I'm curious how other programs deal with the position.

The name of the paper is "Hans Vermin uberlistete de Computer", 26-June-2010.
"Uberlistete" is a German word (so not Dutch) which means something like "outsmarted".
The story is related to correspondence draughts, where computer support makes it very difficult to win a game.
In a game played by Hans Vermin (who most likely also uses a computer for assistance), he was aware that a specific situation was completely misunderstood by Truus and Flits, which in a lost position thought they had a (huge) advantage.
So let's start with the base position.

Image

20…, 11-17?; 21. 29-24!, 6-11??; 22. 33-29!, 19-23; 23. 42-38, 23-28; 24. 38-32, 13-19; 25. 24x13, 8x19; 26. 32x23, 19x28; 27. 39-33!!, 28x39; 28. 34x43,…

The moves 11-17 and 6-11 by black are not good, although both Flits and Truus think they have an advantage here (it would be interesting to check how other programs value this position).

28…, 14-20; 29. 30-25, 4-9; 30. 25x14, 9x20; 31. 35-30, 20-25; 32. 30-24, 3-9; 33. 49-44, 2-8;

And now the next position is reached.

Image

Now most programs see that black has a problem :)

34. 44-39, 8-13; 35. 39-34, 9-14; 36. 24-20, 13-19; 37. 20x9, 19-24; 38. 29x20, 25x3; 39. 34-30, 3-9; 40. 43-39, 9-14; 41. 30-25,

And now the loss of a man is a fact. You can recognize the block of 8 men which is immobile due to the 6 white men, so on the other part of the board it is basically a 1 against 3 situation.

Image

I think it would be nice if we were able to detect these types of patterns.

Bert

Re: Computer Draughts 2011

Post by Ed Gilbert »

Hi Bert,

It takes a few moves for kingsrow to realize that black has difficulties. I let it run an auto analysis of the pdn, with the result pasted below. It does not agree with some of the early moves, so perhaps it would have avoided the problem. For example at move 2 it likes 3-9, and the scores dropped from 4 to -6 when 6-11 was played in the game.

Code:

[FEN "B:W27,29,30,31,33,34,35,36,37,39,41,42,46,48,49:B2,3,4,6,7,8,11,12,13,14,16,18,19,22,26"]
1. ...
   11-17 {value=4,  depth 23/27.6/47,  109.8s,  15765 kN/s,  pv 11-17 29-24 3-9 42-38 7-11 33-29 17-21 48-43 21x32 37x17 12x21}
2. 29-24 {value=4,  depth 24/28.6/48,  143.2s,  13477 kN/s,  pv 29-24 3-9 42-38 7-11 33-29 17-21 48-43 21x32 37x17 12x21 30-25}
   6-11 {value=4,  depth 24/28.9/48,  120.0s,  13745 kN/s,  pv 3-9 42-38 7-11 33-29 17-21 48-43 21x32 37x17 12x21 30-25 26x37}
3. 33-29 {value=-6,  depth 24/28.9/48,  208.9s,  9813 kN/s,  pv 33-29 4-10 42-38 19-23 48-43 10-15 38-33 23-28 30-25 13-19 24x13}
   19-23 {value=-6,  depth 25/29.6/50,  355.1s,  7275 kN/s,  pv 19-23 42-38 4-10 48-43 10-15 30-25 14-20 25x14 13-19 24x13 8x10}
4. 42-38 {value=-4,  depth 24/28.8/49,  152.6s,  11758 kN/s,  pv 42-38 4-10 48-43 10-15 30-25 14-20 25x14 13-19 24x13 8x10 35-30}
   23-28 {value=0,  depth 24/28.8/49,  226.1s,  9059 kN/s,  pv 4-10 38-33 23-28 30-25 13-19 24x13 18x9 27x18 12x23 33x22 17x28}
5. 38-32 {value=-10,  depth 23/28.6/49,  85.4s,  9268 kN/s,  pv 38-32 3-9 32x23 13-19 24x13 8x28 49-44 4-10 30-24 9-13 37-32}
   13-19 {value=-10,  depth 25/30.8/50,  462.7s,  2434 kN/s,  pv 3-9 32x23 13-19 24x13 8x28 49-44 4-10 30-24 9-13 37-32 28x37}
6. 24x13 {value=-14,  depth 24/29.4/47,  149.9s,  3689 kN/s,  pv 24x13 8x19 32x23 19x28 39-33 28x39 34x43 2-8 49-44 4-9 30-24}
   8x19 {value=-20,  depth 25/30.1/49,  157.7s,  4732 kN/s,  pv 8x19 32x23 19x28 39-33 28x39 34x43 2-8 49-44 3-9 29-24 4-10}
7. 32x23 {value=-20,  depth 5/11.1/21,  0.0s,  2607 kN/s,  pv 32x23 19x28 39-33 28x39 34x43 2-8 30-25}
   19x28 {value=3,  depth 5/9.2/20,  0.0s,  3706 kN/s,  pv 19x28 30-25 3-8 34-30 18-23 29x18}
8. 39-33 {value=-46,  depth 25/28.5/48,  308.7s,  6105 kN/s,  pv 39-33 28x39 34x43 2-8 49-44 14-20 44-39 20-25 37-32 26x28 39-33}
   28x39 {value=14,  depth 5/7.3/15,  0.0s,  743 kN/s,  pv 28x39 34x43 2-8 43-38 3-9 29-24}
9. 34x43 {value=20,  depth 5/8.3/17,  0.0s,  1328 kN/s,  pv 34x43 2-8 30-25 14-19 43-39}
   14-20 {value=-32,  depth 24/25.7/46,  100.5s,  10667 kN/s,  pv 2-8 49-44 4-9 30-24 8-13 43-39 14-19 35-30 9-14 30-25 19x30}
10. 30-25 {value=-58,  depth 24/27.2/46,  120.0s,  7405 kN/s,  pv 30-25 4-9 25x14 9x20 35-30 20-25 30-24 3-9 43-38 2-8 38-33}
    4-9 {value<-64,  depth 25/29.0/46,  461.9s,  5221 kN/s,  pv 4-9 25x14 9x20 35-30 20-25 30-24 2-8 49-44 3-9 44-40 8-13}
11. 25x14 {value=-66,  depth 5/7.2/13,  0.0s,  438 kN/s,  pv 25x14 9x20 49-44 2-8 44-40}
    9x20 {value=16,  depth 5/7.5/13,  0.0s,  1040 kN/s,  pv 9x20 35-30 20-25 30-24 2-8}
12. 35-30 {value=-84,  depth 25/27.7/45,  155.4s,  7420 kN/s,  pv 49-44 2-8 35-30 20-25 30-24 3-9 44-39 9-14 39-34 8-13 24-20}
    20-25 {value=-86,  depth 25/27.7/44,  135.7s,  7293 kN/s,  pv 20-25 30-24 2-8 49-44 3-9 44-39 9-14 39-34 8-13 24-20 13-19}
13. 30-24 {value=-86,  depth 27/30.4/46,  363.3s,  3057 kN/s,  pv 30-24 2-8 49-44 3-9 44-39 8-13 39-34 9-14 24-20 13-19 20x9}
    3-9 {value=-90,  depth 26/29.2/46,  132.8s,  7151 kN/s,  pv 3-9 49-44 2-8 44-39 9-14 39-34 8-13 24-20 13-19 20x9 19-24}
14. 49-44 {value=-92,  depth 26/29.0/46,  143.7s,  4591 kN/s,  pv 49-44 2-8 44-39 8-13 39-34 9-14 24-20 13-19 20x9 19-24 29x20}
    2-8 {value=-92,  depth 25/28.3/45,  93.4s,  4692 kN/s,  pv 2-8 44-39 8-13 39-34 9-14 24-20 13-19 20x9 19-24 29x20 25x3}
15. 44-39 {value=-92,  depth 25/28.2/44,  90.4s,  3065 kN/s,  pv 44-39 8-13 39-34 9-14 24-20 13-19 20x9 19-24 29x20 25x3 34-30}
    8-13 {value=-84,  depth 26/29.0/44,  186.8s,  2616 kN/s,  pv 8-13 39-34 9-14 24-20 13-19 20x9 19-24 29x20 25x3 43-39 3-9}
16. 39-34 {value=-84,  depth 26/29.1/45,  249.0s,  1618 kN/s,  pv 39-34 9-14 24-20 13-19 20x9 19-24 29x20 25x3 43-39 3-9 34-30}
    9-14 {value=-94,  depth 27/30.2/46,  433.1s,  1330 kN/s,  pv 9-14 24-20 13-19 20x9 19-24 29x20 25x3 34-30 16-21 27x16 3-9}
17. 24-20 {value=-98,  depth 27/30.5/46,  373.8s,  1430 kN/s,  pv 24-20 13-19 20x9 19-24 29x20 25x3 43-39 3-9 39-33 9-14 33-29}
    13-19 {value=-112,  depth 27/30.7/46,  475.4s,  2016 kN/s,  pv 13-19 20x9 19-24 29x20 25x3 43-39 16-21 27x16 18-23 48-42 23-29}
18. 20x9 {value=32,  depth 5/6.6/12,  0.0s,  129 kN/s,  pv 20x9 19-24 29x20 25x3 43-39 3-9}
    19-24 {value=-94,  depth 27/30.1/47,  120.3s,  2711 kN/s,  pv 19-24 29x20 25x3 34-30 3-9 43-39 9-14 30-25 16-21 27x16 18-23}
19. 29x20 {value=-70,  depth 5/7.0/11,  0.0s,  263 kN/s,  pv 29x20 25x3 34-30 16-21 27x16 18-23}
    25x3 {value=30,  depth 5/6.3/10,  0.0s,  316 kN/s,  pv 25x3 34-30 3-9 30-25 9-14}
20. 34-30 {value=-72,  depth 27/28.8/46,  124.9s,  2911 kN/s,  pv 43-39 3-9 34-30 9-14 30-25 16-21 27x16 18-23 48-42 12-18 31-27}
    3-9 {value=-76,  depth 28/29.5/47,  172.4s,  2857 kN/s,  pv 3-9 43-39 9-14 30-25 16-21 27x16 18-23 48-42 12-18 39-34 7-12}
21. 43-39 {value=-132,  depth 27/28.7/45,  95.8s,  3367 kN/s,  pv 43-39 9-14 30-25 16-21 27x16 18-23 48-42 12-18 31-27 22x31 36x27}
    9-14 {value=-76,  depth 29/30.8/49,  431.4s,  1319 kN/s,  pv 9-14 30-25 16-21 27x16 18-23 48-42 12-18 39-34 7-12 16x7 12x1}
22. 30-25 {value>-66,  depth 28/29.4/48,  426.4s,  1833 kN/s,  pv 30-25 16-21 27x16 18-23 48-42 12-18 31-27 22x31 36x27 17-22 41-36}
          {value=-58,  depth 28/31.5/48,  163.7s,  1416 kN/s,  pv 16-21 27x16 18-23 48-42 12-18 31-27 22x31 36x27 17-22 41-36 22x31}
Since the scores are dropping near the end, I continued playing out the game, and in just a few moves kingsrow sees a database draw. These entries are taken from the search log file, so the formatting is a little different than the auto pdn analysis.

Code:

best 16-21, value -58, depth 28/31.5/48, nodes 231767196, time 163.74, 1416 kN/s, db 41542956, pv 16-21 27x16 18-23 48-42 12-18 31-27 22x31 36x27 17-22 41-36 22x31
best 27x16, value -88, depth 5/7.1/13, nodes 307, time 0.00, 307 kN/s, db 0, pv 27x16 18-23 48-42 22-27 31x22 17x28
best 18-23, value -62, depth 27/29.1/46, nodes 111615222, time 79.36, 1406 kN/s, db 20161265, pv 18-23 48-42 12-18 31-27 22x31 36x27 17-22 41-36 22x31 36x27 11-17
best 48-42, value -38, depth 27/29.3/46, nodes 166824521, time 131.25, 1271 kN/s, db 34915316, pv 48-42 12-18 31-27 22x31 36x27 17-22 41-36 22x31 36x27 14-19 37-32
best 12-18, value -36, depth 27/29.7/45, nodes 254082671, time 253.99, 1000 kN/s, db 62206988, pv 12-18 39-34 7-12 16x7 12x1 31-27 22x31 36x27 17-22 41-36 22x31
best 39-34, value -5, depth 27/29.7/46, nodes 359064540, time 352.06, 1020 kN/s, db 88234894, pv 39-34 7-12 16x7 12x1 31-27 22x31 36x27 17-22 41-36 22x31 36x27
best 7-12, value -5, depth 27/30.4/47, nodes 194439432, time 227.52, 855 kN/s, db 52712668, pv 7-12 16x7 12x1 31-27 22x31 36x27 17-22 41-36 22x31 36x27 23-28
This is the position where the draw was first seen.

Image
Black to move.
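
For anyone who wants to compare such logs with the output of another program, here is a minimal sketch in Python of how search-log lines like the ones above could be parsed. The field layout is inferred only from the lines pasted in this post, not from any Kingsrow documentation. `LOG_RE` and `parse_log_line` are hypothetical names, and the three depth figures are kept in the order given without interpreting them.

```python
import re

# Layout inferred from the search-log lines pasted above (an assumption,
# not an official Kingsrow format).
LOG_RE = re.compile(
    r"best (?P<move>[^,\s]+), value (?P<value>-?\d+), depth (?P<depth>[\d./]+), "
    r"nodes (?P<nodes>\d+), time (?P<time>[\d.]+), (?P<knps>\d+) kN/s, "
    r"db (?P<db>\d+), pv (?P<pv>.+)"
)


def parse_log_line(line):
    """Return the fields of one search-log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return {
        "move": m.group("move"),
        "value": int(m.group("value")),
        # Three depth figures, kept in the order they appear in the log.
        "depth": tuple(float(d) for d in m.group("depth").split("/")),
        "nodes": int(m.group("nodes")),
        "time_s": float(m.group("time")),
        "knps": int(m.group("knps")),
        "db": int(m.group("db")),
        "pv": m.group("pv").split(),
    }


line = ("best 39-34, value -5, depth 27/29.7/46, nodes 359064540, time 352.06, "
        "1020 kN/s, db 88234894, pv 39-34 7-12 16x7 12x1 31-27 22x31 36x27 "
        "17-22 41-36 22x31 36x27")
info = parse_log_line(line)
print(info["move"], info["value"], info["pv"][:3])
```

Lines from the auto pdn analysis use a different layout (braces, `value<-64` and so on), so they would need their own pattern.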

What does damage think about the game?

-- Ed

Re: Computer Draughts 2011

Post by TAILLE »

Hi Bert,

Image

This pattern is very interesting and the last stable version of Damy cannot evaluate this pattern correctly.
BTW this pattern can be compared with the following well-known one:

Image

I am working on a new Damy version able to detect such patterns but I am facing a big problem. I am able to recognize in the above pattern that only 6 white men block 8 black men, but does that mean a real advantage for white? On average white probably has a small advantage, but it largely depends on the configuration on the other wing and it is almost impossible to evaluate the intrinsic value of such a pattern.

Now remove one black man (on square 6, 7 or 11) in order to reach a pattern with 6 white men blocking 7 black men. This time, though white has still invested fewer men, the advantage is probably for black!

I continue to work hard on this difficult subject and I remain optimistic. I am pretty sure it is possible to greatly improve the eval function through automatic pattern detection. At least that is my project for 2011.
Gérard

Re: Computer Draughts 2011

Post by Rein Halbersma »

TAILLE wrote: Hi Bert,

Image

This pattern is very interesting and the last stable version of Damy cannot evaluate this pattern correctly.

This pattern also featured in the 2002 match between GM Johan Krajenbrink and the computer program Flits. In the fifth game of the match, Krajenbrink had his biggest chance against Flits. Wieger Wesselink's analysis captures most of your points: http://10x10.dse.nl/analyse/Flits_-_Kra ... und_5.html

Re: Computer Draughts 2011

Post by BertTuyt »

Ed, here is a short reply (as I have just returned from a course, and tomorrow (= Saturday) I will fly to Shanghai).

Damage plays the same 2 moves as Flits and Truus (11-17 and 6-11), when I set my normal 10 seconds/move time control :(
The +score is around 0.2 (1.0 is a man).
During the next moves the score drops.

When white plays 39-33 the score according to Damage is equal for both sides.
The 14-20 move from black is evaluated as -0.2.
Damage sees for the first time that bad weather is approaching after white's 30-25; here the score drops to -1/2 man (disadvantage for black).

I have to replay the game near the end with my 7P DB, but I remember also seeing that the score dropped to below 1 man advantage; I never encountered a draw score, though.

Bert

Re: Computer Draughts 2011

Post by BertTuyt »

First of all greetings from Shanghai :)

A question to all of you; I guess I already asked something similar before.
Does it make sense to standardize some elements in draughts programs?

As far as I know, we use 2 standards so far:
- PDN: I think most programs are able to read PDN files.
- DamExchange Protocol (DXP): as far as I know, several programs are also able to communicate via DXP. (By the way: Gerard, as you now also seem to have DXP and Kingsrow, can you already share match results?)

Maybe we could agree to standardize more, and create more open formats, like:
- Opening book.
- Endgame DBs. Now with 8P DBs (and not everyone has installed HDs with multiple terabytes), it will become difficult if one wants to store separate DBs for every program (like KingsRow, Damy, Dragon, and at a later stage Damage and others). Sharing and distributing 8P DBs is already an issue, and sharing multiple ones is even worse.
- Engine protocol. We briefly touched on this in this forum, but so far we did not come to a final conclusion/proposal (Michel and I use GUIDE, but there are also reasons to choose a WinBoard/XBoard implementation).
- Clipboard standardization. Several programs send info such as game and position to the clipboard, but I assume it is not possible so far to read this clipboard info from another program, as the formats are not standardized.

And I guess there are more options.

I assume that for the real commercial programs de-standardization is the way to go, but for the others it would help to improve interoperability.

Bert

Re: Computer Draughts 2011

Post by MichelG »

Hello there in Shanghai :-)

I think some of the things you mention are an intrinsic part of the program and would be very hard to standardize.

For the endgame databases, for example, each program has its own considerations: you might optimize it for small disk size instead of high-speed access, or vice versa. Or you might decide it should or should not contain positions that are in capture. Or you want lots of small files, or just the opposite. I think these considerations make it very difficult to make use of the databases of another program.
BertTuyt wrote: - Clipboard standardization. Several programs send info such as game and position to the clipboard, but I assume it is not possible so far to read this clipboard info from another program, as the formats are not standardized.
I think this should be easy to standardize; we can use PDN for that. Just put a PDN or FEN string into the clipboard to copy positions or games.
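
As a rough illustration of how little is needed on the receiving side, here is a minimal sketch in Python that reads and writes a FEN string of the kind posted earlier in this thread. It only handles comma-separated square lists; the `K` prefix for kings is an assumption about the PDN FEN convention (the thread's example contains men only), and the names `parse_fen` and `format_fen` are hypothetical. Getting the string on and off the clipboard is left out, since that depends on the GUI toolkit.

```python
def parse_fen(fen):
    """Parse a draughts FEN such as 'B:W27,29:B2,3'.

    Returns (side_to_move, white, black), where white and black are lists
    of (square, is_king) tuples in the order they appear in the string.
    """
    side, *groups = fen.strip().strip('"').split(":")
    pieces = {"W": [], "B": []}
    for group in groups:
        colour, squares = group[0], group[1:]
        for item in squares.split(","):
            if not item:
                continue
            # Assumption: a leading 'K' marks a king.
            is_king = item.startswith("K")
            pieces[colour].append((int(item.lstrip("K")), is_king))
    return side, pieces["W"], pieces["B"]


def format_fen(side, white, black):
    """Inverse of parse_fen: build the FEN string back from the piece lists."""
    def fmt(pieces):
        return ",".join(("K" if king else "") + str(sq) for sq, king in pieces)
    return f"{side}:W{fmt(white)}:B{fmt(black)}"


fen = "B:W27,29,30,31,33,34,35,36,37,39,41,42,46,48,49:B2,3,4,6,7,8,11,12,13,14,16,18,19,22,26"
side, white, black = parse_fen(fen)
print(side, len(white), len(black))
```

A program that copies `format_fen(...)` to the clipboard and another that runs `parse_fen(...)` on whatever it pastes would already be able to exchange positions.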
BertTuyt wrote: - OpeningBook
I don't know about other programs, but mine is just a big PDN file with games that gets evaluated and converted into a binary format. Some option in the program 'convert PDN into opening book' would be sufficient to share the books.

Michel

Re: Computer Draughts 2011

Post by BertTuyt »

Michel,
MichelG wrote: For the endgame databases, for example, each program has its own considerations: you might optimize it for small disk size instead of high-speed access, or vice versa. Or you might decide it should or should not contain positions that are in capture. Or you want lots of small files, or just the opposite. I think these considerations make it very difficult to make use of the databases of another program.
Basically you are right, but that does not mean that one cannot agree upon specific choices and/or an implementation.

If I remember well, many programs used the 6P DB based on your work.
I'm not sure if I capture the history correctly, but as far as I know the DB files Harm Jetten distributed for Dam 2.x were based on your previous work (as was also mentioned by Harm, so nothing secret).

Also Damage (in the earlier days) used this DB format.
So one can state that at that point in time the DDD consortium (Dam, Damage, Dragon) had defined (more or less) a de facto standard :o
Also the current version of Horizon uses this DB format (including the driver routines).

So despite all the options, it is not impossible to agree upon a standard.
Next to that, one can even develop a hybrid approach, so that the program is able to process multiple DB standards.

At least I'm willing from my side to share and distribute the files and drivers (based on my own DB-cache handler, so bypassing the Windows cache mechanism, as Ed also does with KingsRow).

We could also agree to use yours or whatever...

Bert

Re: Computer Draughts 2011

Post by TAILLE »

Hi Bert,
BertTuyt wrote: - DamExchange Protocol (DXP): as far as I know, several programs are also able to communicate via DXP. (By the way: Gerard, as you now also seem to have DXP and Kingsrow, can you already share match results?)
Bert
As I told you, I implemented the DXP protocol only in my new/future Damy version. This version is the only one able to handle the 8-piece db, but it is also the version in which I am rebuilding my whole evaluation function from scratch in order to base it on automatic pattern detection. The only match I ran against Kingsrow was one in which the evaluation function of Damy was based only on material and, as expected, Kingsrow won all its games.
BTW Damy is now able to automatically recognize a lot of blocked positions like the following:
Image Image
but a lot of work remains: it is not enough to recognize blocked positions, we need to evaluate them correctly!
BertTuyt wrote: As far as I know, we use 2 standards so far:
Maybe we could agree to standardize more, and create more open formats, like:
- Engine protocol. We briefly touched on this in this forum, but so far we did not come to a final conclusion/proposal (Michel and I use GUIDE, but there are also reasons to choose a WinBoard/XBoard implementation).
Bert
I understand what you mean, but the Damy interface is very rich for at least two reasons:
1) Damy does not handle a game (a sequence of moves) but a tree with as many variations as you want
2) Damy includes a lot of functionality to help the user use the egdb => special marking of the squares in addition to the five basic states (wm, wk, bm, bk, empty).
The richer an interface is, the more difficult it is to standardize.

Anyway it is interesting to exchange our ideas.
Gérard

What is the meaning of an evaluation?

Post by TAILLE »

Hi,

In many places in this forum we can see analysis results in the form of values representing the evaluation of a position.

As an example, take a recent post by Ed:
Ed Gilbert wrote:

Code:

[FEN "B:W27,29,30,31,33,34,35,36,37,39,41,42,46,48,49:B2,3,4,6,7,8,11,12,13,14,16,18,19,22,26"]
1. ...
   11-17 {value=4,  depth 23/27.6/47,  109.8s,  15765 kN/s,  pv 11-17 29-24 3-9 42-38 7-11 33-29 17-21 48-43 21x32 37x17 12x21}
2. 29-24 {value=4,  depth 24/28.6/48,  143.2s,  13477 kN/s,  pv 29-24 3-9 42-38 7-11 33-29 17-21 48-43 21x32 37x17 12x21 30-25}
   6-11 {value=4,  depth 24/28.9/48,  120.0s,  13745 kN/s,  pv 3-9 42-38 7-11 33-29 17-21 48-43 21x32 37x17 12x21 30-25 26x37}
3. 33-29 {value=-6,  depth 24/28.9/48,  208.9s,  9813 kN/s,  pv 33-29 4-10 42-38 19-23 48-43 10-15 38-33 23-28 30-25 13-19 24x13}
   19-23 {value=-6,  depth 25/29.6/50,  355.1s,  7275 kN/s,  pv 19-23 42-38 4-10 48-43 10-15 30-25 14-20 25x14 13-19 24x13 8x10}
4. 42-38 {value=-4,  depth 24/28.8/49,  152.6s,  11758 kN/s,  pv 42-38 4-10 48-43 10-15 30-25 14-20 25x14 13-19 24x13 8x10 35-30}
   23-28 {value=0,  depth 24/28.8/49,  226.1s,  9059 kN/s,  pv 4-10 38-33 23-28 30-25 13-19 24x13 18x9 27x18 12x23 33x22 17x28}
5. 38-32 {value=-10,  depth 23/28.6/49,  85.4s,  9268 kN/s,  pv 38-32 3-9 32x23 13-19 24x13 8x28 49-44 4-10 30-24 9-13 37-32}
   13-19 {value=-10,  depth 25/30.8/50,  462.7s,  2434 kN/s,  pv 3-9 32x23 13-19 24x13 8x28 49-44 4-10 30-24 9-13 37-32 28x37}
...
-- Ed
In order to compare the results of different programs one can imagine a kind of standardization of evaluation results, but my first question is the following: what does the value returned by the evaluation function mean to you?

Let's suppose that the result given by the evaluation function for an advantage of one man at the beginning of a game is +100
Image
value = +100 (from black's point of view)
Let's now suppose that, from the starting position above, a match of 1000 games gives 900 wins, 100 draws and 0 losses (from black's point of view) => score = 1.90
Does that mean that a value of +100 means we expect a 1.90 score?
In other words, do you consider that, for each program, there should exist a bijection which translates the value given by the evaluation function into an "expected score"?

If yes, then the standardization is there, because an "expected score" is a well-understood notion. Each programmer has only to invent the corresponding bijection and we reach a common language.
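
To make the arithmetic behind the 1.90 figure explicit, here is a minimal sketch in Python. It assumes the usual draughts scoring of 2 points for a win, 1 for a draw and 0 for a loss, which is consistent with the example above; `match_score` is a hypothetical name.

```python
def match_score(wins, draws, losses):
    """Average points per game, with 2 for a win, 1 for a draw, 0 for a loss."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (2 * wins + draws) / games


# The example match: 900 wins, 100 draws, 0 losses from black's point of view.
print(match_score(900, 100, 0))
```

On this scale 1.0 corresponds to an equal match and 2.0 to winning every game.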
Gérard

Re: What is the meaning of an evaluation?

Post by MichelG »

TAILLE wrote: Hi,
value = +100 (from black's point of view)
Let's now suppose that, from the starting position above, a match of 1000 games gives 900 wins, 100 draws and 0 losses (from black's point of view) => score = 1.90
Does that mean that a value of +100 means we expect a 1.90 score?
In other words, do you consider that, for each program, there should exist a bijection which translates the value given by the evaluation function into an "expected score"?

If yes, then the standardization is there, because an "expected score" is a well-understood notion. Each programmer has only to invent the corresponding bijection and we reach a common language.
You can establish the bijection by playing lots of random games and performing some statistics on them. I haven't done this for Dragon yet, but I will later. I have a suspicion that +100 at the start is a lot better than having +100 in the endgame, and that should not be the case.

However, I see a huge problem. The expected score is highly dependent on the search depth; if you play a match with a 1-ply search, it's going to randomize the game and the match will end up 500-100-400 or so.

Search 20 plies and black will win the given position all the time.

I think it would be difficult to compare evaluations this way. In the given position you would give it an expected score of 1.9, when played out by program X at level Y.

So you get

expected_score=bijection(evaluation at N ply, N, match play at level Y, program)

Not very pretty.
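
The "play lots of games and do some statistics" step could look roughly like the sketch below (Python). It groups samples by evaluation value and averages the points scored from those positions. The sample data is made up, the bin width is arbitrary, and `empirical_score_table` is a hypothetical name. In line with the caveat above, one such table would only be valid for one program at one search depth and one match level.

```python
from collections import defaultdict


def empirical_score_table(samples, bin_width=50):
    """Average game result per evaluation bin.

    samples: iterable of (evaluation, points), where points is 2, 1 or 0
    (win, draw, loss) for the side the evaluation refers to.
    Returns {lower edge of bin: average points}, sorted by bin.
    """
    totals = defaultdict(lambda: [0, 0])  # edge -> [sum of points, count]
    for evaluation, points in samples:
        edge = (evaluation // bin_width) * bin_width
        totals[edge][0] += points
        totals[edge][1] += 1
    return {edge: s / n for edge, (s, n) in sorted(totals.items())}


# Made-up samples: (evaluation at some position, final result of that game).
samples = [(100, 2), (110, 2), (120, 1), (0, 1), (-10, 0)]
table = empirical_score_table(samples)
for edge, score in table.items():
    print(edge, round(score, 2))
```

Comparing the tables built from opening positions and from endgame positions would show directly whether +100 means the same thing in both phases.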

Michel

Re: What is the meaning of an evaluation?

Post by TAILLE »

I think we must forget about the search depth problem. With an engine able to search to a depth of 200 plies we will always reach the exact result, won't we?
My question was really about the "static" evaluation, I mean the evaluation used for a stable position at a leaf of the tree. From the operator's point of view, one has to know what the values +100, +200 or +300 mean.
If you think that there exists a bijection between an evaluation and an expected score, then the standardization is quite simple.

Proposal for a standardization:
+100: corresponds to the expected score of a game with a "standard" (no compensation for the lost man) 20x19 start position
+200: corresponds to the expected score of a game with a "standard" 20x18 start position
etc.

You may decide in your program that a game beginning with a "standard" 15x14 start position has a value like +97, but that's your problem.
As far as standardization is concerned, why not try to use the proposal above?

Warning: as defined above, a value of +200 is surely a win. If you agree with such a standardization, that means that, whatever the number of men remaining on the board, a +200 value must always be a sure win. If not, that means that you do not have a bijection between your evaluation and an expected score, and you then have to explain what your evaluation means exactly.
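
One possible reading of this proposal, as a sketch in Python: a program notes the raw value it gives to each anchor position (equal start, 20x19, 20x18, ...) and rescales its output between those anchors. The piecewise-linear interpolation is an assumption of this sketch, since the proposal does not say how values between anchors should be treated. The anchor numbers are made up (an engine in which one man is worth 1.0), and `to_standard_scale` is a hypothetical name.

```python
def to_standard_scale(raw, anchors):
    """Map an engine's raw evaluation onto the proposed standard scale.

    anchors: list of (raw_value, standard_value) pairs sorted by raw_value,
    e.g. the raw values the engine gives to the equal, 20x19 and 20x18
    start positions. Values outside the anchors are extrapolated linearly
    from the nearest segment.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two anchors")
    for (r0, s0), (r1, s1) in zip(anchors, anchors[1:]):
        if raw <= r1:
            break
    return s0 + (raw - r0) * (s1 - s0) / (r1 - r0)


# Made-up anchors for an engine where one man = 1.0.
anchors = [(0.0, 0), (1.0, 100), (2.0, 200)]
print(to_standard_scale(0.5, anchors))
```

With such a table per program, the values printed in analysis logs could be reported on the common scale while each engine keeps its own internal units.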
Gérard