perft for 8x8 checkers (depth 25)

Discussion about the development of draughts in the age of computers and the Internet.
AartBik
Posts: 103
Joined: Wed Mar 11, 2009 01:30
Location: Mountain View
Contact:

perft for 8x8 checkers (depth 25)

Post by AartBik »

Continuing my quest for deeper and deeper perft numbers for 8x8 checkers, I now computed depth 25 with the same distributed implementation I used earlier for depths up to 24. Below you see the perft breakdown per move (called "divide") from the initial position for various depths up to 25 (my depth 23 and 24 numbers were recently kindly confirmed by Murray Cash). The numbers were computed on a cluster of machines, further optimized with a "hard collision"-free transposition table as well as bulk counting. Please note that the move generator does not eliminate duplicate captures (viz. the situation where a king can capture the same pieces in different directions; a situation that starts to occur at depth 12 and up).

Code: Select all

move      divide(18)     divide(19)     divide(20)      divide(21)       divide(22)       divide(23)        divide(24)          divide(25)
------------------------------------------------------------------------------------------------------------------------------------------
12-16:  550829166472  2517202147314 11531470109861  52945190026737  243598269855110 1123463594881857  5192042148594780   24019313789608561
11-16:  566149929068  2564849953998 11736729175821  53527954221225  246743868125768 1131373985922218  5248615918291379   24153215782987793
11-15:  435063007630  2041959240377  9515983205474  44775005468548  209016678583301  984253557821317  4602138522979438   21659601983574539
10-15:  472279451484  2180656975018 10055597639275  46574865098865  215412869777867 1000606302770349  4643700995955222   21609957136212495
10-14:  402570639569  1859042884028  8600202424158  39822944739732  184865466345796  856779998157523  3988937724259353   18496978526984076
 9-14:  441590753001  2068865301476  9698986164172  45530585259776  213736468971938 1003310451936358  4712325943133747   22101040287502927
 9-13:  625398758917  2881467090588 13406062152792  61923979665936  288999100078322 1337748969176591  6263620622082081   29027372375205409
------------------------------------------------------------------------------------------------------------------------------------------
       3493881706141 16114043592799 74545030871553 345100524480819 1602372721738102 7437536860666213 34651381875296000  161067479882075800
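
For reference, "bulk counting" simply means that at the last ply the size of the generated move list is returned directly instead of making and undoing each move. A minimal sketch of perft with bulk counting (Position and Move are placeholders here, not the actual distributed implementation):

Code: Select all

// Minimal sketch of perft with bulk counting; Position, Move and their
// member functions are placeholders, not the actual distributed code.
#include <cstdint>
#include <vector>

struct Move {};                             // placeholder
struct Position {                           // placeholder interface
    std::vector<Move> generate_moves() const;
    void make(const Move&);
    void undo(const Move&);
};

uint64_t perft(Position& pos, int depth)
{
    std::vector<Move> moves = pos.generate_moves();
    if (depth <= 1)
        return moves.size();                // bulk counting: no make/undo at the frontier
    uint64_t leafs = 0;
    for (const Move& m : moves) {
        pos.make(m);
        leafs += perft(pos, depth - 1);
        pos.undo(m);
    }
    return leafs;
}

// "divide" is perft split per root move:
// for each root move m: make(m), report perft(pos, depth - 1), undo(m).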
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by Rein Halbersma »

AartBik wrote:Continuing my quest for deeper and deeper perft numbers for 8x8 checkers, I now computed depth 25 with the same distributed implementation I used earlier for depths up to 24. Below you see the perft breakdown per move (called "divide") from the initial position for various depths up to 25 (my depth 23 and 24 numbers were recently kindly confirmed by Murray Cash). The numbers were computed on a cluster of machines, further optimized with a "hard collision"-free transposition table as well as bulk counting. Please note that the move generator does not eliminate duplicate captures (viz. the situation where a king can capture the same pieces in different directions; a situation that starts to occur at depth 12 and up).

Code: Select all

move      divide(18)     divide(19)     divide(20)      divide(21)       divide(22)       divide(23)        divide(24)          divide(25)
------------------------------------------------------------------------------------------------------------------------------------------
12-16:  550829166472  2517202147314 11531470109861  52945190026737  243598269855110 1123463594881857  5192042148594780   24019313789608561
11-16:  566149929068  2564849953998 11736729175821  53527954221225  246743868125768 1131373985922218  5248615918291379   24153215782987793
11-15:  435063007630  2041959240377  9515983205474  44775005468548  209016678583301  984253557821317  4602138522979438   21659601983574539
10-15:  472279451484  2180656975018 10055597639275  46574865098865  215412869777867 1000606302770349  4643700995955222   21609957136212495
10-14:  402570639569  1859042884028  8600202424158  39822944739732  184865466345796  856779998157523  3988937724259353   18496978526984076
 9-14:  441590753001  2068865301476  9698986164172  45530585259776  213736468971938 1003310451936358  4712325943133747   22101040287502927
 9-13:  625398758917  2881467090588 13406062152792  61923979665936  288999100078322 1337748969176591  6263620622082081   29027372375205409
------------------------------------------------------------------------------------------------------------------------------------------
       3493881706141 16114043592799 74545030871553 345100524480819 1602372721738102 7437536860666213 34651381875296000  161067479882075800
Hi Aart,

Congratulations on the perft(25) result! You're taking a page out of Sergey Bubka's book by breaking the record by the smallest possible increments ;-)

BTW, with my current implementation (32-byte hash entry: 24-byte position, 59-bit count, 5-bit depth) I cannot improve on this result, because the counts would overflow the 59-bit counter. Having a mini-table per depth (so a full 64 bits for counting) would allow for computations up to perft(28). Alternatively, one could have distributed hash tables (one per thread) so that the count per node remains within 59 bits. How far does your current implementation scale?
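
For concreteness, the entry layout described above can be pictured as the 24-byte position plus one packed 64-bit word; a sketch with illustrative names, not the actual code:

Code: Select all

#include <cstdint>

// 32-byte entry: 24-byte position plus a 59-bit leaf count and 5-bit depth
// packed into one 64-bit word (names and layout illustrative).
struct PerftEntry {
    uint64_t position[3];                 // 24-byte position key
    uint64_t packed;                      // low 59 bits: leaf count, high 5 bits: depth

    static constexpr int      count_bits = 59;
    static constexpr uint64_t count_mask = (uint64_t(1) << count_bits) - 1;

    uint64_t leafs() const { return packed & count_mask; }
    int      depth() const { return int(packed >> count_bits); }   // 0..31

    void set(uint64_t n, int d)
    {
        // 2^59 is about 5.8e17; the perft(25) total above is already ~1.6e17,
        // so only a ply or so deeper the packed count would overflow.
        packed = (n & count_mask) | (uint64_t(d) << count_bits);
    }
};

static_assert(sizeof(PerftEntry) == 32, "32-byte hash entry");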

Rein
AartBik
Posts: 103
Joined: Wed Mar 11, 2009 01:30
Location: Mountain View
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by AartBik »

Rein Halbersma wrote:BTW, with my current implementation (32-byte hash entry: 24-byte position, 59-bit count, 5-bit depth) I cannot improve on this result, because the counts would overflow the 59-bit counter. Having a mini-table per depth (so a full 64 bits for counting) would allow for computations up to perft(28). Alternatively, one could have distributed hash tables (one per thread) so that the count per node remains within 59 bits. How far does your current implementation scale?
Thanks Rein. I store full 64-bit counts, so I still have some room for scaling :-)
Just curious, why do you use 24 bytes for a board position? It can be done in half that size!
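
For example: the 32 playable squares of the 8x8 board fit in 32-bit words, so black pieces, white pieces and kings take 3 x 4 = 12 bytes. A sketch of one such layout, not necessarily the one used here:

Code: Select all

#include <cstdint>

// One way to reach 12 bytes: three 32-bit bitboards over the 32 playable
// squares.  A sketch of one such layout, not necessarily the one used here.
struct Position12 {
    uint32_t black;   // black men and kings
    uint32_t white;   // white men and kings
    uint32_t kings;   // subset of (black | white) that are crowned
};

static_assert(sizeof(Position12) == 12, "half of a 24-byte position");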
murraycash
Posts: 10
Joined: Mon Sep 10, 2012 22:48
Real name: Murray Cash

Re: perft for 8x8 checkers (depth 25)

Post by murraycash »

AartBik wrote:Continuing my quest for deeper and deeper perft numbers for 8x8 checkers, I now computed depth 25 with the same distributed implementation I used earlier for depths up to 24. Below you see the perft breakdown per move (called "divide") from the initial position for various depths up to 25 (my depth 23 and 24 numbers were recently kindly confirmed by Murray Cash). The numbers were computed on a cluster of machines, further optimized with a "hard collision"-free transposition table as well as bulk counting. Please note that the move generator does not eliminate duplicate captures (viz. the situation where a king can capture the same pieces in different directions; a situation that starts to occur at depth 12 and up).

Code: Select all

move      divide(18)     divide(19)     divide(20)      divide(21)       divide(22)       divide(23)        divide(24)          divide(25)
------------------------------------------------------------------------------------------------------------------------------------------
12-16:  550829166472  2517202147314 11531470109861  52945190026737  243598269855110 1123463594881857  5192042148594780   24019313789608561
11-16:  566149929068  2564849953998 11736729175821  53527954221225  246743868125768 1131373985922218  5248615918291379   24153215782987793
11-15:  435063007630  2041959240377  9515983205474  44775005468548  209016678583301  984253557821317  4602138522979438   21659601983574539
10-15:  472279451484  2180656975018 10055597639275  46574865098865  215412869777867 1000606302770349  4643700995955222   21609957136212495
10-14:  402570639569  1859042884028  8600202424158  39822944739732  184865466345796  856779998157523  3988937724259353   18496978526984076
 9-14:  441590753001  2068865301476  9698986164172  45530585259776  213736468971938 1003310451936358  4712325943133747   22101040287502927
 9-13:  625398758917  2881467090588 13406062152792  61923979665936  288999100078322 1337748969176591  6263620622082081   29027372375205409
------------------------------------------------------------------------------------------------------------------------------------------
       3493881706141 16114043592799 74545030871553 345100524480819 1602372721738102 7437536860666213 34651381875296000  161067479882075800
I tip my hat to you, Aart. You've set me another mission impossible: do the same in (my own target of) 4 days of computer time!
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by Rein Halbersma »

AartBik wrote:
Rein Halbersma wrote:BTW, with my current implementation (32-byte hash entry: 24-byte position, 59-bit count, 5-bit depth) I cannot improve on this result, because the counts would overflow the 59-bit counter. Having a mini-table per depth (so a full 64 bits for counting) would allow for computations up to perft(28). Alternatively, one could have distributed hash tables (one per thread) so that the count per node remains within 59 bits. How far does your current implementation scale?
Thanks Rein. I store full 64-bit counts, so I still have some room for scaling :-)
Just curious, why do you use 24 bytes for a board position? It can be done in half that size!
My template program uses "ghost" squares (much like Samuel's old program) for all variants, to avoid even/odd row duplication of code. The downside is that it needs three 64-bit bitboards (hence 24 bytes) for 8x8 checkers.
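
Roughly, the trick is to insert one unused "ghost" bit after every eight playable squares, so that the forward diagonal steps become +4 and +5 on every row instead of depending on row parity; the 35-bit padded board then no longer fits in a 32-bit word, hence the 64-bit bitboards. A sketch of one such padding (the actual layout may differ):

Code: Select all

#include <cstdint>

// One possible padding (the actual layout may differ): map playable square
// s (0..31) to bit s + s/8, i.e. insert a ghost bit after every eight
// playable squares.  Forward diagonal steps are then +4 and +5 on every
// row, and off-board moves land on ghost or unused bits, so no row-parity
// masks are needed.
constexpr int      padded(int s) { return s + s / 8; }          // 0..31 -> 0..34
constexpr uint64_t square(int s) { return uint64_t(1) << padded(s); }

// forward (downward) non-capture targets for all pieces at once;
// 'empty' must contain only the empty playable squares (ghost bits cleared)
constexpr uint64_t forward_targets(uint64_t pieces, uint64_t empty)
{
    return ((pieces << 4) | (pieces << 5)) & empty;
}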
AartBik
Posts: 103
Joined: Wed Mar 11, 2009 01:30
Location: Mountain View
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by AartBik »

For comparison purposes, here are some single-threaded performance numbers of my generator on a 2.67GHz Intel Xeon:
  • plain version
  • optimized with bulk counting
  • further optimized with a hard-collision-free hash table
The nodes-per-second ratings are relative to the perft leaf counts, so they get quite high for the larger problems.

I tried several hash strategies, but simply overwriting entries with new entries always performed best for me.
Do people have other experiences with this?

Code: Select all

                               plain                  bulk                  bulk+hash
----------------------------------------------------------------------------------------
perft(1) =           7        0ms.                    0ms.                  0ms.
perft(2) =          49        0ms.                    0ms.                  0ms.
perft(3) =         302        0ms.                    0ms.                  0ms.
perft(4) =        1469        0ms.                    0ms.                  0ms.
perft(5) =        7361        1ms.  7,361KN/s         0ms.                  0ms.
perft(6) =       36768        2ms. 18,384KN/s         1ms. 36,768KN/s       3ms.     12,256KN/s
perft(7) =      179740        7ms. 25,677KN/s         6ms. 29,956KN/s      11ms.     16,340KN/s
perft(8) =      845931       31ms. 27,288KN/s        20ms. 42,296KN/s      20ms.     42,296KN/s
perft(9) =     3963680      135ms. 29,360KN/s        66ms. 60,055KN/s      59ms.     67,181KN/s
perft(10)=    18391564      618ms. 29,759KN/s       319ms. 57,653KN/s     146ms.    125,969KN/s
perft(11)=    85242128    2,878ms. 29,618KN/s     1,455ms. 58,585KN/s     276ms.    308,848KN/s
perft(12)=   388623673   13,179ms. 29,488KN/s     6,819ms. 56,991KN/s     482ms.    806,273KN/s
perft(13)=  1766623630   60,634ms. 29,135KN/s    31,164ms. 56,688KN/s   1,292ms.  1,367,355KN/s
perft(14)=  7978439499  273,657ms. 29,154KN/s   143,178ms. 55,723KN/s   3,937ms.  2,026,527KN/s
perft(15)= 36263167175                                                 12,124ms.  2,991,023KN/s
perft(16)=165629569428                                                 45,123ms.  3,670,624KN/s
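
For what it's worth, "simply overwriting" means the store is literally unconditional; a sketch with an illustrative entry layout (a hard-collision-free table would verify the full position rather than just a 64-bit key):

Code: Select all

#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the always-overwrite policy: a flat array indexed by the low
// bits of the key, where a store unconditionally replaces the slot.  The
// entry layout is illustrative.
struct Slot {
    uint64_t key   = 0;    // verification key (or the full position)
    uint64_t leafs = 0;    // perft count
    int      depth = -1;   // remaining depth the count belongs to
};

class AlwaysReplaceTable {
public:
    explicit AlwaysReplaceTable(std::size_t n_slots) : slots_(n_slots) {}

    void store(uint64_t key, int depth, uint64_t leafs)
    {
        slots_[key % slots_.size()] = Slot{key, leafs, depth};   // no replacement test
    }

    bool probe(uint64_t key, int depth, uint64_t& leafs) const
    {
        const Slot& s = slots_[key % slots_.size()];
        if (s.key != key || s.depth != depth) return false;
        leafs = s.leafs;
        return true;
    }

private:
    std::vector<Slot> slots_;
};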
murraycash
Posts: 10
Joined: Mon Sep 10, 2012 22:48
Real name: Murray Cash

Re: perft for 8x8 checkers (depth 25)

Post by murraycash »

Here are my figures:

i7-M620 @ 2.67Ghz

EDIT: These first 2 tables had cumulative times originally by mistake. They now show independent times for each depth.

15/9/2012: Single thread, perft plain (no bulk counting)

1 ply, e = 7, 0.00s, 0 K Leaf Nodes/sec
2 ply, e = 49, 0.00s, 4 K Leaf Nodes/sec
3 ply, e = 302, 0.00s, 30 K Leaf Nodes/sec
4 ply, e = 1469, 0.00s, 146 K Leaf Nodes/sec
5 ply, e = 7361, 0.00s, 736 K Leaf Nodes/sec
6 ply, e = 36768, 0.00s, 3676 K Leaf Nodes/sec
7 ply, e = 179740, 0.01s, 8987 K Leaf Nodes/sec
8 ply, e = 845931, 0.02s, 28197 K Leaf Nodes/sec
9 ply, e = 3963680, 0.05s, 66061 K Leaf Nodes/sec
10 ply, e = 18391564, 0.18s, 96797 K Leaf Nodes/sec
11 ply, e = 85242128, 0.85s, 99118 K Leaf Nodes/sec
12 ply, e = 388623673, 3.99s, 97155 K Leaf Nodes/sec
13 ply, e = 1766623630, 18.81s, 93869 K Leaf Nodes/sec
14 ply, e = 7978439499, 84.08s, 94879 K Leaf Nodes/sec
15 ply, e = 36263167175, 387.43s, 93596 K Leaf Nodes/sec

15/9/2012: Single thread, plain + bulk

1 ply, e = 7, 0.00s, 0 K Leaf Nodes/sec
2 ply, e = 49, 0.00s, 4 K Leaf Nodes/sec
3 ply, e = 302, 0.00s, 30 K Leaf Nodes/sec
4 ply, e = 1469, 0.00s, 146 K Leaf Nodes/sec
5 ply, e = 7361, 0.00s, 736 K Leaf Nodes/sec
6 ply, e = 36768, 0.00s, 3676 K Leaf Nodes/sec
7 ply, e = 179740, 0.00s, 17974 K Leaf Nodes/sec
8 ply, e = 845931, 0.01s, 42296 K Leaf Nodes/sec
9 ply, e = 3963680, 0.04s, 79273 K Leaf Nodes/sec
10 ply, e = 18391564, 0.10s, 167196 K Leaf Nodes/sec
11 ply, e = 85242128, 0.43s, 193732 K Leaf Nodes/sec
12 ply, e = 388623673, 1.96s, 197270 K Leaf Nodes/sec
13 ply, e = 1766623630, 9.26s, 190574 K Leaf Nodes/sec
14 ply, e = 7978439499, 43.01s, 185458 K Leaf Nodes/sec
15 ply, e = 36263167175, 202.97s, 178653 K Leaf Nodes/sec

15/9/2012: Single thread + 4GB Hash table

5 ply, 7361, 00:00:00, 0.000 Bi Leaf Nodes/s
6 ply, 36768, 00:00:00, 0.001 Bi Leaf Nodes/s
7 ply, 179740, 00:00:00, 0.003 Bi Leaf Nodes/s
8 ply, 845931, 00:00:00, 0.012 Bi Leaf Nodes/s
9 ply, 3963680, 00:00:00, 0.040 Bi Leaf Nodes/s
10 ply, 18391564, 00:00:00, 0.108 Bi Leaf Nodes/s
11 ply, 85242128, 00:00:00, 0.213 Bi Leaf Nodes/s
12 ply, 388623673, 00:00:01, 0.363 Bi Leaf Nodes/s
13 ply, 1766623630, 00:00:03, 0.577 Bi Leaf Nodes/s
14 ply, 7978439499, 00:00:08, 0.887 Bi Leaf Nodes/s
15 ply, 36263167175, 00:00:25, 1.396 Bi Leaf Nodes/s
16 ply, 165629569428, 00:01:16, 2.169 Bi Leaf Nodes/s
17 ply, 758818810990, 00:03:38, 3.468 Bi Leaf Nodes/s

Aart, are your reported times independent per depth, or cumulative (including previous depths)? My times are independent per depth.
AartBik wrote:I tried several hash strategies, but simply overwriting entries with new entries always performed best for me.
Do people have other experiences with this?
Looking at the figures, there is something very good about your hash table implementation, or something bad about mine!
I also tried "always overwrite" without improvement. In fact, I tried every strategy on the chess wiki plus a few of my own. I ended up with the two-bucket strategy by Thompson/Condon, where one of the buckets is an "always replace". Memory latency plays a huge part here: in the hash table version, 70% of all time is spent resolving cache misses.
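
For comparison, the store policy of that two-slot scheme can be sketched as follows (the depth-based preference test shown is only illustrative):

Code: Select all

#include <cstdint>
#include <vector>

// Sketch of the two-slot idea: per index one "preferred" slot that is only
// replaced by an entry judged more valuable, plus an always-replace slot
// for everything else.  The preference test is illustrative.
struct Slot   { uint64_t key = 0; uint64_t leafs = 0; int depth = -1; };
struct Bucket { Slot preferred, always; };

void store(std::vector<Bucket>& table, uint64_t key, int depth, uint64_t leafs)
{
    Bucket& b = table[key % table.size()];
    if (depth >= b.preferred.depth)
        b.preferred = Slot{key, leafs, depth};   // keep the more valuable entry
    else
        b.always = Slot{key, leafs, depth};      // everything else always overwrites
}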
Last edited by murraycash on Sun Sep 16, 2012 00:01, edited 1 time in total.
AartBik
Posts: 103
Joined: Wed Mar 11, 2009 01:30
Location: Mountain View
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by AartBik »

murraycash wrote:Aart, are your reported times independent per depth, or cumulative (including previous depths)? My times are independent per depth.
Hi Murray,
Thanks for posting your numbers as well for comparison. Looks like you have a nice move generator! My reported timings are independent per depth (but I don't clear the cache between two different depths, although that does not seem to matter much).
Aart
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by Rein Halbersma »

AartBik wrote: I tried several hash strategies, but simply overwriting entries with new entries always performed best for me.
Do people have other experiences with this?
Hi Aart,

My code uses the undercut hash replacement scheme invented by H.G. Muller

Code: Select all

#include <algorithm>   // std::find_first_of, std::min_element
#include <iterator>    // std::begin, std::end
#include <utility>     // std::pair
#include <vector>      // std::vector

// note: the primary insert_entry template and the EmptyOldUnderCutSmallestOfN
// tag are declared elsewhere in the program

// partial specialization for the following 3-stage replacement scheme
template<typename Key, typename Value, int N>
struct insert_entry<Key, Value, N, EmptyOldUnderCutSmallestOfN>
{
        // typedefs
        typedef std::pair<Key, Value> Entry;
        typedef typename std::vector<Entry>::iterator Iterator;

        void operator()(Iterator bucket_begin, Entry const& entry) const
        {
                auto const bucket_end = bucket_begin + N;

                // replace any empty or old entry
                Key slots[] = { Key(0), entry.first };
                auto it = std::find_first_of(
                        bucket_begin, bucket_end,
                        std::begin(slots), std::end(slots),
                        [](Entry const& e, Key const& k)
                        { return e.first == k; }
                );
                if (it != bucket_end) {
                        *it = entry;
                        return;
                }

                // replace the first entry if the new depth undercuts the old depth by one
                if (entry.second.depth() == bucket_begin->second.depth() - 1) {
                        *bucket_begin = entry;
                        return;
                }

                // replace the smallest entry
                it = std::min_element(
                        bucket_begin, bucket_end,
                        [](Entry const& lhs, Entry const& rhs)
                        { return lhs.second.leafs() < rhs.second.leafs(); }
                );
                *it = entry;
        }
};
The above code is a specialization of a general function object template with an arbitrary replacement scheme. The function object gets an iterator to the start of a bucket (of 2 entries of 32 bytes each, in the case of perft hashing) and first looks for an empty slot or a slot already containing the entry to be stored. If it finds one, it overwrites that entry. The second step is to look at the first entry and overwrite it when its depth is exactly one less than that of the entry to be stored. This is the "undercut" condition; it guides the hash table towards an approximately uniform distribution of depths. The final step is to simply replace the entry with the lowest leaf count. This function object is passed to the insert() member function of my hash table.

I haven't rigorously tested whether this really improves on your always-replace scheme, or on the depth-preferred and leaf-count-preferred schemes with dual-sized buckets. Some small test runs didn't show much difference, and I liked the undercut idea when I first wrote this code 2 years ago, so I stuck with it. I also didn't test whether the actual distribution of depths is roughly uniform. So much for rigor, huh?

Rein
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by Rein Halbersma »

AartBik wrote:For comparison purposes, here are some single-threaded performance numbers of my generator on a 2.67GHz Intel Xeon:
  • plain version
  • optimized with bulk counting
  • further optimized with a hard-collision-free hash table
The nodes-per-second ratings are relative to the perft leaf counts, so they get quite high for the larger problems.
Hi Aart,

The high NPS ratings are an artifact of the way you count. It's actually more convenient to do exactly as in a regular search and update a counter immediately after entering the recursive routine. I did some measurements of this in this thread 2 years ago, when Ed Trice asked about it. As Murray pointed out, the numbers you printed are "leafs per second", not "nodes per second". I count a node as a single call to the recursive routine. Hashing will decrease the NPS somewhat (because each call becomes more expensive), but the overall time-to-depth will drop enormously.

In any case, my old timing on a P4 3.2GHz with a 1 GB hash table was perft(16) = 149.60 seconds (timings independent per depth). That is more than 3 times slower than your code (45 seconds) and almost 2 times slower than Murray's code (76 seconds).

It's fun to speculate about how fast your code would run on my machine. I take the passmark benchmark as a reference. My single-core P4 3.2GHz scores 522, Murray's dual-core i7 620M scores 2817. Perhaps Aart can specify the exact Xeon model, but let's take the very popular quad-core E5640 2.67GHz with a benchmark score of 4947, or the hexa-core X5650 2.67GHz with a benchmark score of 7947. Translated to my single core (scaling the single-thread times by per-core score, e.g. 76 s × (2817/2)/522 ≈ 205 s), that would roughly give Murray's code 205 seconds (+35%) and Aart's code roughly 106-114 seconds (-25/30%). So I think our three move generators are all in the same ballpark :)

Rein
AartBik
Posts: 103
Joined: Wed Mar 11, 2009 01:30
Location: Mountain View
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by AartBik »

Rein Halbersma wrote:The high NPS ratings are an artifact of the way you count.
Yes, of course. My comment was meant to draw attention to that artifact, but reading it back I was indeed very unclear, so I am glad you added more explanation. Only the plain version has the expected (more or less) constant rating; the others go artificially high. I still like that "leafs per second" rating, since it directly reflects the result of each optimization. But in the end, timings are probably the best measure for comparisons :-)
AartBik
Posts: 103
Joined: Wed Mar 11, 2009 01:30
Location: Mountain View
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by AartBik »

Rein Halbersma wrote:I haven't rigorously tested whether this really improves on your always-replace scheme, or on the depth-preferred and leaf-count-preferred schemes with dual-sized buckets. Some small test runs didn't show much difference, and I liked the undercut idea when I first wrote this code 2 years ago, so I stuck with it. I also didn't test whether the actual distribution of depths is roughly uniform. So much for rigor, huh?
In any case thanks for this background. I may play around with this and report back.
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by Rein Halbersma »

AartBik wrote:
Rein Halbersma wrote:I haven't rigorously tested whether this really improves on your always-replace scheme, or on the depth-preferred and leaf-count-preferred schemes with dual-sized buckets. Some small test runs didn't show much difference, and I liked the undercut idea when I first wrote this code 2 years ago, so I stuck with it. I also didn't test whether the actual distribution of depths is roughly uniform. So much for rigor, huh?
In any case thanks for this background. I may play around with this and report back.
Can you specify which Xeon you were running? Just curious to compare the timings...
AartBik
Posts: 103
Joined: Wed Mar 11, 2009 01:30
Location: Mountain View
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by AartBik »

Rein Halbersma wrote:Can you specify which Xeon you were running? Just curious to compare the timings...
This was an Intel Xeon X5650 2.67GHz. On my checkers website you can also find some older performance numbers with just bulk counting on an Intel 2.2 GHz Core 2 Duo.
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: perft for 8x8 checkers (depth 25)

Post by Rein Halbersma »

Below are perft() timings for the initial position in 8x8 checkers. My specs are a single thread on a Xeon E5-1650 @3.2Ghz, compiled to x64 with Visual C++ Nov 2012 CTP.

Plain with no hash updating of the position

Code: Select all

2>  info depth  1 leafs            7 nodes            8 time      0 nps       1 hashfull    0
2>  info depth  2 leafs           49 nodes           57 time      0 nps       1 hashfull    0
2>  info depth  3 leafs          302 nodes          359 time      0 nps       1 hashfull    0
2>  info depth  4 leafs         1469 nodes         1828 time      0 nps       1 hashfull    0
2>  info depth  5 leafs         7361 nodes         9189 time      0 nps       1 hashfull    0
2>  info depth  6 leafs        36768 nodes        45957 time      1 nps 45957000 hashfull    0
2>  info depth  7 leafs       179740 nodes       225697 time      6 nps 37616167 hashfull    0
2>  info depth  8 leafs       845931 nodes      1071628 time     30 nps 35720933 hashfull    0
2>  info depth  9 leafs      3963680 nodes      5035308 time    120 nps 41960900 hashfull    0
2>  info depth 10 leafs     18391564 nodes     23426872 time    560 nps 41833700 hashfull    0
2>  info depth 11 leafs     85242128 nodes    108669000 time   2628 nps 41350457 hashfull    0
2>  info depth 12 leafs    388623673 nodes    497292673 time  11398 nps 43629819 hashfull    0
2>  info depth 13 leafs   1766623630 nodes   2263916303 time  50248 nps 45054854 hashfull    0
Bulk counting with no hash updating of the position

Code: Select all

2>  info depth  1 leafs            7 nodes            1 time      0 nps       1 hashfull    0
2>  info depth  2 leafs           49 nodes            8 time      0 nps       1 hashfull    0
2>  info depth  3 leafs          302 nodes           57 time      0 nps       1 hashfull    0
2>  info depth  4 leafs         1469 nodes          359 time      0 nps       1 hashfull    0
2>  info depth  5 leafs         7361 nodes         1828 time      0 nps       1 hashfull    0
2>  info depth  6 leafs        36768 nodes         9189 time      0 nps       1 hashfull    0
2>  info depth  7 leafs       179740 nodes        45957 time      6 nps 7659500 hashfull    0
2>  info depth  8 leafs       845931 nodes       225697 time     16 nps 14106063 hashfull    0
2>  info depth  9 leafs      3963680 nodes      1071628 time     69 nps 15530841 hashfull    0
2>  info depth 10 leafs     18391564 nodes      5035308 time    313 nps 16087246 hashfull    0
2>  info depth 11 leafs     85242128 nodes     23426872 time   1455 nps 16100943 hashfull    0
2>  info depth 12 leafs    388623673 nodes    108669000 time   6303 nps 17240838 hashfull    0
2>  info depth 13 leafs   1766623630 nodes    497292673 time  29768 nps 16705613 hashfull    0
Bulk counting + hashing (1 Gb table, with 64-byte buckets of two 32-byte entries each)

Code: Select all

2>  info depth  1 leafs            7 nodes            1 time      0 nps       1 hashfull    0
2>  info depth  2 leafs           49 nodes            8 time      0 nps       1 hashfull    0
2>  info depth  3 leafs          302 nodes           57 time      0 nps       1 hashfull    0
2>  info depth  4 leafs         1469 nodes          359 time      0 nps       1 hashfull    0
2>  info depth  5 leafs         7361 nodes         1384 time      1 nps 1384000 hashfull    0
2>  info depth  6 leafs        36768 nodes         5175 time      2 nps 2587500 hashfull    0
2>  info depth  7 leafs       179740 nodes        17602 time      3 nps 5867333 hashfull    0
2>  info depth  8 leafs       845931 nodes        57799 time     10 nps 5779900 hashfull    1
2>  info depth  9 leafs      3963680 nodes       176781 time     32 nps 5524406 hashfull    4
2>  info depth 10 leafs     18391564 nodes       546545 time    100 nps 5465450 hashfull   11
2>  info depth 11 leafs     85242128 nodes      1654964 time    297 nps 5572269 hashfull   34
2>  info depth 12 leafs    388623673 nodes      5058595 time    960 nps 5269370 hashfull  101
2>  info depth 13 leafs   1766623630 nodes     15073122 time   2572 nps 5860467 hashfull  277
2>  info depth 14 leafs   7978439499 nodes     44935126 time   7616 nps 5900095 hashfull  611
2>  info depth 15 leafs  36263167175 nodes    133714610 time  22544 nps 5931273 hashfull  906
2>  info depth 16 leafs 165629569428 nodes    428903998 time  75462 nps 5683708 hashfull  998
Without hashing, the numbers are in the same ballpark as Aart's, but apparently my hash table, like Murray's, takes a big performance hit (>2x :( ) compared to Aart's.

@Aart: can you tell us a bit about your hash table specs? How much memory, how many entries, which kind of data structure and replacement scheme?

@Murray, Aart: (assuming you have an incremental Zobrist hash) did you also turn off hash updating for the plain / bulk-counting perft() runs?

I'm using a plain std::vector of std::array<Entry, 2>, with one table for each side to move (could that be the big performance hit?), where an entry stores the node count/depth (8 bytes) and the position (24 bytes). I also tried storing only the hash index (8 bytes) instead of the full position, but that didn't give a big gain.
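
In code, that layout is roughly the following (field names illustrative):

Code: Select all

#include <array>
#include <cstdint>
#include <vector>

// Roughly the layout described above; field names are illustrative.
struct Entry {
    std::array<uint64_t, 3> position;   // 24-byte position (ghost-square bitboards)
    uint64_t count_and_depth;           // packed node count + depth, 8 bytes
};
static_assert(sizeof(Entry) == 32, "two entries per 64-byte bucket");

using Bucket = std::array<Entry, 2>;    // one 64-byte bucket per index

std::vector<Bucket> table[2];           // separate table per side to move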