Perft

Ed Gilbert · Post by **Ed Gilbert** » Wed Jul 16, 2008 15:18

I am using MS Visual Studio 2005 Professional Edition. For the project properties I am using whatever optimizations the standard 'release' build has. I think there are several additional optimizations that can be turned on to favor speed over size, omit frame pointers, enable intrinsics, etc., but I didn't try those.

I assume everyone else is using ghost squares in the bitboard representation? I'm using just a straightforward mapping of bits 0..49 for squares 1..50. I always thought the ghost squares should be faster, but I have too much code that I ported from 8x8 checkers so to make that change would be a lot of work for me.

-- Ed

BertTuyt · Post by **BertTuyt** » Wed Jul 16, 2008 19:48

I am using Visual Studio 2008 Professional Edition.
I never studied all optimization possibilities, only use optimize for speed nowadays.

I update quite some parameters during a Move, so this could affect speed.
In the case of Damage the timing of remove is virtually zero in time, as it only is a decrement of a stack pointer. For every new depth in the tree-search i update a stack structure (so maybe that is time consuming).

What I can do (and / or the others) to do a perft with double calls to the movegenerator (i think this is possible in the case of Damage), so we really could time only the impact of this routine.

Bert

TAILLE · Post by **TAILLE** » Wed Jul 16, 2008 21:23

Hi,
Just a very small example of optimisation issue.
Let's take the following code :

INT64 var;
...
var <<= 3;
var++;

the code assembly generated is the following :

mov rax,qword ptr [var]
shl rax,3
mov qword ptr [var],rax
mov rax,qword ptr [var]
add rax,1
mov rax,qword ptr [var]

obviously it would have been far better to have

mov rax,qword ptr [var]
shl rax,3
add rax,1
mov rax,qword ptr [var]

by the way the code

test = (test << 3) + 1;

generate :

mov rax,qword ptr [var]
lea rax,[rax*8]+1
mov rax,qword ptr [var]

which seems the best solution

What do you obtain with your development environment and speed optimisation ?

Gérard

TAILLE · Post by **TAILLE** » Wed Jul 16, 2008 21:41

TAILLE wrote:Hi,
Just a very small example of optimisation issue.
Let's take the following code :

INT64 var;
...
var <<= 3;
var++;

the code assembly generated is the following :

mov rax,qword ptr [var]
shl rax,3
mov qword ptr [var],rax
mov rax,qword ptr [var]
add rax,1
mov rax,qword ptr [var]

obviously it would have been far better to have

mov rax,qword ptr [var]
shl rax,3
add rax,1
mov rax,qword ptr [var]

by the way the code

test = (test << 3) + 1;

generate :

mov rax,qword ptr [var]
lea rax,[rax*8]+1
mov rax,qword ptr [var]

which seems the best solution

What do you obtain with your development environment and speed optimisation ?

Gérard

Correction of some copy/paste errors :

INT64 var;
...
var <<= 3;
var++;

the code assembly generated is the following :

mov rax,qword ptr [var]
shl rax,3
mov qword ptr [var],rax
mov rax,qword ptr [var]
add rax,1
mov qword ptr [var],rax

obviously it would have been far better to have

mov rax,qword ptr [var]
shl rax,3
add rax,1
mov qword ptr [var],rax

by the way the code

test = (test << 3) + 1;

generate :

mov rax,qword ptr [var]
lea rax,[rax*8]+1
mov qword ptr [var],rax

which seems the best solution

What do you obtain with your development environment and speed optimisation ?

Gérard

BertTuyt · Post by **BertTuyt** » Thu Jul 17, 2008 00:17

Did a Perft with multiple calls to the Movegenerator.
In average (for the initial position) the MoveGenerator consumes around 16.715 sec of the total 69.45 sec, so 24%.
Guess with Perft we mainly test the efficiency (and the additional book-keeping) of the Move and Unmove.

Will do a similar test with the 2nd position (where compared with Kingsrow Damage is realatively slow).

Bert

Ed Gilbert · Post by **Ed Gilbert** » Thu Jul 17, 2008 01:45

BertTuyt wrote:Did a Perft with multiple calls to the Movegenerator.
In average (for the initial position) the MoveGenerator consumes around 16.715 sec of the total 69.45 sec, so 24%.
Guess with Perft we mainly test the efficiency (and the additional book-keeping) of the Move and Unmove.

That makes no sense at all. The move/unmove cannot be 75% of the total perft time. Did you check that the node counts are correct on your second call to the move generator? I suspect somehow you did not measure what you think you did.

-- Ed

FeikeBoomstra · Post by **FeikeBoomstra** » Thu Jul 17, 2008 11:25

I tried the same:

from 130 sec -> 183 sec, so move generation only 53 sec, do_move : 77 sec

BertTuyt · Post by **BertTuyt** » Thu Jul 17, 2008 11:54

Ed, here my explanation. If I did not make a huge mistake in the Perft implementation , the number of DoMove/UndoMove is much larger then the number of MoveGenerator calls (as in the terminal-nodes no MoveGenerator is called, but every move to a terminal node is based on a Move and UndoMove).

I added some counters to verify this, and it seems (at least in my implementation) that the ratio between DoMove-UndoMove and MoveGenerators calls is a factor of 7 - 10.

Based on all this, I calculated average times for the 3 positions (keep in mind that I do some additional book-keeping for the DoMove such as tempo, material and hash update).

Note: UndoMove in my case is zero seconds, as i do a tree-information stack decrement, nothing else.

Position 1:
Average, MoveGen 44.96 nano-sec, DoMove 23.52 nano-sec.

Position 2:
Average, MoveGen 158.76 nano-sec, DoMove 26.47 nano-sec.

Position 3:
Average, MoveGen 63.20 nano-sec, DoMove 26.39 nano-sec.

Bert

Ed Gilbert · Post by **Ed Gilbert** » Thu Jul 17, 2008 12:36

Bert,

I looked again at the perft code, and now I see that you are right about the benchmarks. Perft seems useful for testing the correctness of the move generator, but not for measuring its speed. I think this confirms that the speed of the move generator has very little effect on the speed of a the search in a full draughts program.

-- Ed

BertTuyt · Post by **BertTuyt** » Thu Jul 17, 2008 13:14

Ed,

thanks for your reply. Anyway I'm still interested in the average timing of your MoveGenerator and DoMove and UndoMove.
I removed all non necessary book-keeping in the DoMove for the Perft, and I get now a 55.16 seconds for the initial position.

Bert

Ed Gilbert · Post by **Ed Gilbert** » Thu Jul 17, 2008 18:13

Bert,

I think if you want to use Perft for move generator benchmarks, then we have to make the modification which they call bulk counting at the Perft web page. The description of bulk counting also points this out, that it gives a better indication of move generator speed. I made the bulk counting modification, and now I get these results for your comparison.

{game start position}
perft -d11
perft(1) 9 nodes, 0.00 sec, 9 knodes/sec
perft(2) 81 nodes, 0.00 sec, 81 knodes/sec
perft(3) 658 nodes, 0.00 sec, 658 knodes/sec
perft(4) 4265 nodes, 0.00 sec, 4265 knodes/sec
perft(5) 27117 nodes, 0.00 sec, 27117 knodes/sec
perft(6) 167140 nodes, 0.00 sec, 167140 knodes/sec
perft(7) 1049442 nodes, 0.02 sec, 61732 knodes/sec
perft(8) 6483961 nodes, 0.14 sec, 45986 knodes/sec
perft(9) 41022423 nodes, 0.86 sec, 47645 knodes/sec
perft(10) 258895763 nodes, 5.30 sec, 48867 knodes/sec
perft(11) 1665861398 nodes, 32.80 sec, 50792 knodes/sec

{test position with 530 redundant moves at the start}
perft -d9 -f B:W6,9,10,11,20,21,22,23,30,K31,33,37,41,42,43,44,46:BK17,K24.
perft(1) 14 nodes, 0.00 sec, 14 knodes/sec
perft(2) 55 nodes, 0.00 sec, 55 knodes/sec
perft(3) 1168 nodes, 0.00 sec, 1168 knodes/sec
perft(4) 5432 nodes, 0.00 sec, 5432 knodes/sec
perft(5) 87195 nodes, 0.00 sec, 87195 knodes/sec
perft(6) 629010 nodes, 0.00 sec, 629010 knodes/sec
perft(7) 9041010 nodes, 0.14 sec, 64121 knodes/sec
perft(8) 86724219 nodes, 1.27 sec, 68448 knodes/sec
perft(9) 1216917193 nodes, 18.22 sec, 66790 knodes/sec

{Position from a Woldouby game}
perft -d15 -f W:W25,27,28,30,32,33,34,35,37,38:B12,13,14,16,18,19,21,23,24,26.
perft(1) 6 nodes, 0.00 sec, 6 knodes/sec
perft(2) 12 nodes, 0.00 sec, 12 knodes/sec
perft(3) 30 nodes, 0.00 sec, 30 knodes/sec
perft(4) 73 nodes, 0.00 sec, 73 knodes/sec
perft(5) 215 nodes, 0.00 sec, 215 knodes/sec
perft(6) 590 nodes, 0.00 sec, 590 knodes/sec
perft(7) 1944 nodes, 0.00 sec, 1944 knodes/sec
perft(8) 6269 nodes, 0.00 sec, 6269 knodes/sec
perft(9) 22369 nodes, 0.00 sec, 22369 knodes/sec
perft(10) 88050 nodes, 0.00 sec, 88050 knodes/sec
perft(11) 377436 nodes, 0.02 sec, 23590 knodes/sec
perft(12) 1910989 nodes, 0.06 sec, 29859 knodes/sec
perft(13) 9872645 nodes, 0.28 sec, 35009 knodes/sec
perft(14) 58360286 nodes, 1.60 sec, 36590 knodes/sec
perft(15) 346184885 nodes, 9.38 sec, 36922 knodes/sec

-- Ed

BertTuyt · Post by **BertTuyt** » Thu Jul 17, 2008 18:34

Ed,

implemented bulk counting.
For the initial position i got 26.91 sec (when i removed all book-keeping).
Will do the other 2 tests later, im working on endgame databases now.
Will post the time needed for 2 - 2 later (generation of 2m - 2m and all required other databases).

Bert

Ed Gilbert · Post by **Ed Gilbert** » Thu Jul 17, 2008 22:37

BertTuyt wrote:Ed,

implemented bulk counting.
For the initial position i got 26.91 sec (when i removed all book-keeping).

That is about the improvement I would expect from a move generator that uses ghost squares.

-- Ed

Rein Halbersma · Post by **Rein Halbersma** » Sat Aug 23, 2008 01:22

Ed Gilbert wrote: perft -d15 -f W:W25,27,28,30,32,33,34,35,37,38:B12,13,14,16,18,19,21,23,24,26.
perft(13) 9872645 nodes, 0.42 sec, 23340 knodes/sec
perft(14) 58360286 nodes, 2.33 sec, 25058 knodes/sec
perft(15) 346184885 nodes, 17.19 sec, 20140 knodes/sec

FeikeBoomstra wrote: perft(13) 9872744 nodes in 1.08 sec,
perft(14) 58361333 nodes in 5.94 sec,
perft(15) 346200814 nodes in 34.75 sec,

I don't think anyone mentioned this before, but it seems like Ed and Feike differ on the perft counts for the Woldouby position.

Over the last couple of weeks I have written my own move generator (bitboards with ghost squares, in C++). The numbers that my program generates agree perfectly with Ed's on all the positions mentioned in this thread, including the above Woldouby position. It were my efforts that Ed was referring to in the first post of this thread, and I'd like to thank Ed for making his perft program available to me so that I could test the correctness of my program. Just as a reference to solve the mismatch between Ed's & Feike's numbers for this position, here is the breakdown in the number of captures (yes, the speed is slow, but that's a 3-year old single-processor P4 3.2GHz with 32-bits WinXP):

Code: Select all

    
     .   .   .   .   .
   .   .   .   .   .
     .   b   b   b   .
   b   .   b   b   .
     b   .   b   b   w
   b   w   w   .   w
     .   w   w   w   w
   .   w   w   .   .
     .   .   .   .   .
   .   .   .   .   .

perft(13) 9872645 nodes, 1.223 sec, 8.07248 Mnodes/sec
non-capture moves: 8645051 (87.5657%)
1-fold captures: 578500 (5.85963%)
2-fold captures: 467405 (4.73434%)
3-fold captures: 134563 (1.36299%)
4-fold captures: 37357 (0.378389%)
5-fold captures: 7344 (0.0743874%)
6-fold captures: 2166 (0.0219394%)
7-fold captures: 259 (0.00262341%)
perft(14) 58360286 nodes, 6.438 sec, 9.06497 Mnodes/sec
non-capture moves: 51311436 (87.9218%)
1-fold captures: 3595079 (6.16015%)
2-fold captures: 2512863 (4.30578%)
3-fold captures: 682417 (1.16932%)
4-fold captures: 231141 (0.396059%)
5-fold captures: 25609 (0.0438809%)
6-fold captures: 1737 (0.00297634%)
7-fold captures: 4 (6.85398e-006%)
perft(15) 346184885 nodes, 41.135 sec, 8.41582 Mnodes/sec
non-capture moves: 307281704 (88.7623%)
1-fold captures: 17529551 (5.06364%)
2-fold captures: 14939712 (4.31553%)
3-fold captures: 4478556 (1.29369%)
4-fold captures: 1587644 (0.458612%)
5-fold captures: 309496 (0.0894019%)
6-fold captures: 52834 (0.0152618%)
7-fold captures: 4968 (0.00143507%)
8-fold captures: 420 (0.000121322%)

Rein Halbersma · Post by **Rein Halbersma** » Sat Aug 23, 2008 02:00

Ed Gilbert wrote:Gerard, yes that is good to hear that you have finished confirming all the WLD counts. That probably is a good verification of our move generators also, although it only tests for positions with a maximum of 5 pieces on one side. It is probably possible to have a bug that does not show up until more than 5 pieces are present on a side.
-- Ed

I can concur this observation. During the debugging of my move generator I found sometimes agreement with the posted numbers until depth=11 before there appeared any difference.

E.g., one mistake I made for a while was that I didn't clear a capturing king's from-square before I started to search for king captures. This gives correct move counts until you get a capture path that passes the initial from-square. This happens only if one can capture more than 4 pieces (so even after one has the duplicates removal working correctly!), but since 5-fold captures are very rare (and self-intersecting capture paths even rarer!), this bug doesn't show up in every position. Another silly bug that I had for a while was that I had the men moves/captures working correctly and also the king moves/captures, and even the promotions but I had forgotten to take care of the majority rule for men and kings combined. So if I found that a man could capture 2, but a king could capture 3, both moves would be kept. This bug showed up on depth=12 and the difference was 1 position on a total of several billion.

Something that I haven't experienced yet, but which is no doubt possible is that the total perft counts agree but the various n-fold captures disagree. Most probably this disagreement will show up in the next search depth, but what if you stopped before that level...

Of course, one can eliminate all such bugs by careful thinking or a lot of scrutiny afterwards, but rare bugs might not show up in test positions. Even the initial position is not a good test position because it takes until depth=11 before the first kings appear, so multiple king captures hardly get tested. And since really big multiple captures will rarely appear on the PV of a real search, alpha-beta will probably hide most of these bugs. My program e.g. does not crash from illegal moves (it just is doing bitwise operations, there are no array violations possible). So I would not be surprised if there were programs out there that don't have a 100% correct move generator but that still play very strongly in practice (and never make an illegal move in gameplay, or -if the bug occurs rarely enough- even miss a legal move during search).

World Draughts Forum

Perft

Example of optimisation problem

Re: Example of optimisation problem