Standard set of test positions

Discussion about development of draughts in the time of computer and Internet.
gwiesenekker
Posts: 21
Joined: Sun Feb 20, 2011 21:04
Real name: Gijsbert Wiesenekker

Standard set of test positions

Post by gwiesenekker » Sun Feb 20, 2011 21:11

Hi,

Is there a standard set of test positions for computer draughts like the Bratko-Kopec set for computer chess?

Regards,
Gijsbert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA

Re: Standard set of test positions

Post by Ed Gilbert » Sun Feb 20, 2011 21:19

Hi Gijsbert,

Nice to see that you have joined the forum.

I am not familiar with those chess test positions. We have a few test positions that we have been using with perft, to verify the move generator correctness, but I am not aware of any standard set of positions for testing complete engine performance.
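For readers who have not used perft: it counts the leaf nodes of the move tree to a fixed depth, and the count is compared against a trusted reference value to check the move generator. The following is only a minimal sketch. The `Position` and `Successors` template parameters and the `toy_successors` function are placeholders invented for illustration and do not come from any engine mentioned in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// perft: count the leaf nodes of the move tree at a fixed depth.
// Position and Successors are hypothetical placeholders, not an engine API.
template <class Position, class Successors>
std::uint64_t perft(const Position& pos, int depth, Successors successors)
{
    if (depth == 0) return 1;  // a leaf counts as one node
    std::uint64_t nodes = 0;
    for (const Position& child : successors(pos))
        nodes += perft(child, depth - 1, successors);
    return nodes;
}

// Toy stand-in for a real draughts move generator (assumption):
// every "position" has exactly two successors.
std::vector<int> toy_successors(int pos)
{
    return { 2 * pos + 1, 2 * pos + 2 };
}
```

With the toy generator the tree is binary, so `perft(0, 3, toy_successors)` counts 8 leaves. A real test would plug in the engine's own position type and move generator and compare the counts with published values.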

When last I heard from you, you were generating the 7-piece dtw database for draughts. How is that going?

-- Ed

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Standard set of test positions

Post by BertTuyt » Sun Feb 20, 2011 21:35

There was a test set long ago :)
Guess it was around 1990 or something like that.
I'm not sure but it might be that Leo Nagels still has the set.
There was also a test report with all programs around that time, which was even published (if my memory is still correct).

Bert

gwiesenekker
Posts: 21
Joined: Sun Feb 20, 2011 21:04
Real name: Gijsbert Wiesenekker

Re: Standard set of test positions

Post by gwiesenekker » Sun Feb 20, 2011 22:15

Hi Ed,

The 4x3 have been calculated and the results are exactly the same as those from Michel Grimminck (http://www.xs4all.nl/~mdgsoft/draughts/stats/index.html) which is good as the algorithm is entirely different. The 5x2 are currently being calculated.

I have been using a set of 289 test positions I got from Klaas Bor (the author of Turbo Dambase) a couple of years ago. GWD declares a position as 'solved' if the score returned from the search is more than half a man score above the static evaluation score of the starting position. I have never checked if all 289 positions meet this criterion for the 'true' solution, but with this criterion GWD solves 276 out of 289 with a time limit of 10 seconds, 2 out of the remaining 13 with a time limit of 30 seconds, 1 out of the remaining 11 with a time limit of 90 seconds, and 1 out of the remaining 10 with a time limit of 270 seconds.
These are the 13 positions that remain after the 10 second search:

[FEN "W:W28,31,32,35,36,37,38,39,40,42,43,45,47:B3,7,8,11,12,13,15,19,20,21,23,26,29."]
[FEN "W:W27,28,32,35,38,40,43,44,45,47,48:B3,8,9,10,11,14,17,19,23,25,29."]
[FEN "W:W16,21,22,23,26,27,28,31,32,38,43,44,47,50:B2,4,6,7,8,9,10,12,13,24,25,30,35."]
[FEN "W:W21,24,29,31,34,36,37,42,48,49:B1,7,9,10,12,13,17,22,26,28."]
[FEN "W:W15,22,27,31,36,40,44,50:B1,2,4,7,19,23,35."]
[FEN "W:W15,20,21,24,27,34,35,40,48,49:B4,8,9,10,11,12,13,18,33,38."]
[FEN "W:W11,16,23,28,29,32,33,34,37,38,40,50:B2,3,4,6,7,9,12,13,15,18,36,45."]
[FEN "W:W12,19,23,24,26,32,40,41,42,44,47,48,50:B3,8,11,13,17,21,28,30,33,35,36,39,43."]
[FEN "W:W26,27,28,32,34,37,38,40,42,44,49,50:B1,7,9,10,11,13,16,19,20,23,24,25."]
[FEN "W:W26,28,30,34,37,39,41,42,43,44,45:B7,8,12,14,17,19,21,24,25,35,36."]
[FEN "W:W18,23,27,33,34,35,36,39,46,47,50:B1,6,7,8,10,17,20,24,25,37,45."]
[FEN "W:W26,32,33,35,38,41,43,49,50:B4,8,14,17,18,24,25,37."]
[FEN "W:W21,27,34,42,46,49:B8,11,16,18,23,24."]
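The 'solved' criterion and the tally above can be sketched roughly as follows. This is not GWD's actual code: the function names are hypothetical, and the scale of 100 points per man is an assumption made only for the example. Both scores are assumed to be from the same side's point of view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: one man is worth 100 points; GWD's real scale is not stated.
const int MAN_VALUE = 100;

// 'Solved' as described above: the search score exceeds the static
// evaluation of the starting position by more than half a man.
bool is_solved(int search_score, int static_eval, int man_value = MAN_VALUE)
{
    return search_score - static_eval > man_value / 2;
}

// Running total of solved positions after each successive time limit.
std::vector<int> cumulative(const std::vector<int>& newly_solved)
{
    std::vector<int> total;
    int sum = 0;
    for (std::size_t i = 0; i < newly_solved.size(); ++i) {
        sum += newly_solved[i];
        total.push_back(sum);
    }
    return total;
}
```

Feeding in the reported figures (276, 2, 1, 1 for the 10, 30, 90 and 270 second limits) gives running totals of 276, 278, 279 and 280, so 9 of the 289 positions stay unsolved at 270 seconds.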

I would be interested to know what your programs think of these positions.
Gijsbert

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: Standard set of test positions

Post by Rein Halbersma » Mon Feb 21, 2011 08:19

gwiesenekker wrote: I have been using a set of 289 test positions I got from Klaas Bor (the author of Turbo Dambase) a couple of years ago. GWD declares a position as 'solved' if the score returned from the search is more than half a man score above the static evaluation score of the starting position. I have never checked if all 289 positions meet this criterion for the 'true' solution, but with this criterion GWD solves 276 out of 289 with a time limit of 10 seconds, 2 out of the remaining 13 with a time limit of 30 seconds, 1 out of the remaining 11 with a time limit of 90 seconds, and 1 out of the remaining 10 with a time limit of 270 seconds.
Hi Gijsbert,

A good idea to have a test for search efficiency. I also have a correctness test in my program. From Michel Grimminck's website, I took all the longest winning positions for the endgame database for up to 4 pieces. Below is some sample code for the 2 vs 2 endgame. What it does is search a known database position to a depth equal to the database win length. My search distinguishes scores of win-in-N from longer or shorter wins, so I can do an assert() on the returned search score. My search passes all these correctness tests. With repetition checking turned on, there are some endgames in Killer draughts that give incorrect results, but I haven't experienced that for regular draughts.

Rein

Code:

       
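        // Needs <cassert>, <cstddef>, <string> and <utility>.
        // Each entry pairs a FEN position with its database win length.
        typedef std::pair<std::string, size_t> DB_unittest;

        // Trailing comments give the piece counts:
        // white men, white kings, black men, black kings.
        DB_unittest DB_win22[] = {
                DB_unittest("W:W33,46:B4,5."     , 39), // 2020
                DB_unittest("W:W8,K50:B3,32."    , 27), // 1120
                DB_unittest("W:WK1,K23:B4,38."   , 25), // 0220
                DB_unittest("W:W17,35:B3,K21."   , 23), // 2011
                DB_unittest("W:WK1,12:B16,K50."  , 19), // 1111
                DB_unittest("W:WK1,K16:BK17,26." , 19), // 0211
                DB_unittest("W:W6,12:BK7,K45."   ,  7), // 2002
                DB_unittest("W:W6,K22:BK17,K50." ,  9), // 1102
                DB_unittest("W:WK6,K22:BK17,K50.",  9)  // 0202
        };
        // Derive the loop bound from the array instead of hard-coding 9.
        const size_t num_tests = sizeof(DB_win22) / sizeof(DB_win22[0]);

        for (size_t i = 0; i < num_tests; ++i) {
                // Search to a depth equal to the known win length and
                // require exactly that win-in-N score.
                const int value = Root::analyze<Variant::International>(read_position_string<FEN_tag>()(DB_win22[i].first), DB_win22[i].second);
                assert(value == Value::win(DB_win22[i].second));
        }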
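The four-digit comments in the table above read as counts of white men, white kings, black men and black kings. As a cross-check, such a tag can be derived from the FEN string itself. The sketch below is a standalone illustration and not part of the program quoted above: `piece_count_tag` is a hypothetical helper, and it handles only the simple FEN form used in this thread (side to move, then `W...` and `B...` sections with a `K` prefix for kings).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a tag such as "1120" (white men, white kings,
// black men, black kings) from a FEN string like "W:W8,K50:B3,32.".
// The digits run together, so the tag is only unambiguous below 10 per type.
std::string piece_count_tag(const std::string& fen)
{
    int count[2][2] = { {0, 0}, {0, 0} };      // [side][0 = men, 1 = kings]
    const std::size_t colon = fen.find(':');   // end of side-to-move field
    if (colon == std::string::npos) return "";

    int side = 0;                // 0 = white, 1 = black
    bool section_start = true;   // next character names the colour
    bool piece_start = false;    // next character starts a new piece
    for (std::size_t i = colon + 1; i < fen.size(); ++i) {
        const char c = fen[i];
        if (c == '.') break;
        if (c == ':') { section_start = true; continue; }
        if (section_start) {
            side = (c == 'W') ? 0 : 1;
            section_start = false;
            piece_start = true;
            continue;
        }
        if (c == ',') { piece_start = true; continue; }
        if (piece_start) {
            if (c == 'K') {
                ++count[side][1];
                piece_start = false;
            } else if (std::isdigit(static_cast<unsigned char>(c))) {
                ++count[side][0];
                piece_start = false;
            }
        }
    }
    return std::to_string(count[0][0]) + std::to_string(count[0][1]) +
           std::to_string(count[1][0]) + std::to_string(count[1][1]);
}
```

For example, `piece_count_tag("W:W8,K50:B3,32.")` yields `"1120"`, matching the comment on that entry.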

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA

Re: Standard set of test positions

Post by Ed Gilbert » Tue Feb 22, 2011 03:18

gwiesenekker wrote: I would be interested to know what your programs think of these positions.
Here are some results with kingsrow using 4 search threads. Search scores are absolute, not relative to side to move. +100 means 1 black man advantage.
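To make the sign convention concrete: a negamax-style search usually returns scores relative to the side to move, while the scores in this post are absolute, with positive values favouring Black. A minimal sketch of the conversion, using the 100 points per man stated above. The names `Side`, `to_absolute` and `white_advantage_in_men` are hypothetical and say nothing about how kingsrow stores scores internally.

```cpp
#include <cassert>

enum Side { WHITE, BLACK };

// Convert a score relative to the side to move into the absolute
// convention used in this post: positive means Black is better.
int to_absolute(int relative_score, Side to_move)
{
    return to_move == BLACK ? relative_score : -relative_score;
}

// White's advantage in men for an absolute score, at 100 points per man.
double white_advantage_in_men(int absolute_score)
{
    return -absolute_score / 100.0;
}
```

Under this reading, an absolute score of -100 with White to move means White is one man ahead, and a positive score means the advantage lies with Black.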

[FEN "W:W28,31,32,35,36,37,38,39,40,42,43,45,47:B3,7,8,11,12,13,15,19,20,21,23,26,29."]
{No advantage found.}

[FEN "W:W27,28,32,35,38,40,43,44,45,47,48:B3,8,9,10,11,14,17,19,23,25,29."]
{-50, 44-39, 4 seconds}

[FEN "W:W16,21,22,23,26,27,28,31,32,38,43,44,47,50:B2,4,6,7,8,9,10,12,13,24,25,30,35."]
{-248, 21-17, 2 seconds}

[FEN "W:W21,24,29,31,34,36,37,42,48,49:B1,7,9,10,12,13,17,22,26,28."]
{-74, 42-38, 13 seconds}

[FEN "W:W15,22,27,31,36,40,44,50:B1,2,4,7,19,23,35."]
{-120, 40-34, 1 second}

[FEN "W:W15,20,21,24,27,34,35,40,48,49:B4,8,9,10,11,12,13,18,33,38."]
{White db win, 49-43, 1 second}

[FEN "W:W11,16,23,28,29,32,33,34,37,38,40,50:B2,3,4,6,7,9,12,13,15,18,36,45."]
{-288, 50-44, 2 seconds}

[FEN "W:W12,19,23,24,26,32,40,41,42,44,47,48,50:B3,8,11,13,17,21,28,30,33,35,36,39,43."]
{White db win, 23-18, > 10 minutes}

[FEN "W:W26,27,28,32,34,37,38,40,42,44,49,50:B1,7,9,10,11,13,16,19,20,23,24,25."]
{-56, 38-33, 502 seconds}

[FEN "W:W26,28,30,34,37,39,41,42,43,44,45:B7,8,12,14,17,19,21,24,25,35,36."]
{170, 45-40, 1 second}

[FEN "W:W18,23,27,33,34,35,36,39,46,47,50:B1,6,7,8,10,17,20,24,25,37,45."]
{White wins in 39 plies, 6 seconds.}

[FEN "W:W26,32,33,35,38,41,43,49,50:B4,8,14,17,18,24,25,37."]
{db draw, 43-39, < 1 second}

[FEN "W:W21,27,34,42,46,49:B8,11,16,18,23,24."]
{db draw, 27-22, < 1 second}

-- Ed
