is it possible to capture two different sets of pieces that have the same size with a draught (avec un pion) from the same initial square to the same final square ?
Rein proved (with previous diagram) that it is possible with a king
Based on all discussions, I am now convinced that basically the fastest method for Perft is the use of a BitBoard based MoveGenerator which generates new Positions.
I guess so far only Ed used this method.
A huge advantage of this method is that Moves are simple pointer increments and decrements.
To test this, I rewrote last week my BitBoard MoveGenerator.
A Position now consists of 5 BitBoards ( WhiteMan, WhiteKing, BlackMan, and BlackKing and Empty).
However during the MoveGeneration I only store the first 4 Bitboards, Empty is derived at the start of the MoveGeneration Process.
I could also base a Move on 3 Bitboards (WhitePiece, BlackPiece, and Kings), maybe (but the jury is not yet out) this is even faster (and used by Ed and Rein I think).
The proof of the pudding is in the eating, and here are for the standard 3 positions which we used so far, the Perft timings. Note: All without Bulk-count
1. 35.94 sec
2. 19.67 sec
3. 9.25 sec
The first 2 Perft times are roughly 7% slower compared with Ed (who uses similar hardware as I have. The third position is "more or lese" equal.
Maybe I will spent some time to optimize the routines a little, as I want to capture the speed-crown.
Another topic, but that's something for the X-mas holiday, is this implementation also the best for Alfa-beta search.
BertTuyt wrote:Based on all discussions, I am now convinced that basically the fastest method for Perft is the use of a BitBoard based MoveGenerator which generates new Positions.
I guess so far only Ed used this method.
A huge advantage of this method is that Moves are simple pointer increments and decrements.
To test this, I rewrote last week my BitBoard MoveGenerator.
A Position now consists of 5 BitBoards ( WhiteMan, WhiteKing, BlackMan, and BlackKing and Empty).
However during the MoveGeneration I only store the first 4 Bitboards, Empty is derived at the start of the MoveGeneration Process.
I could also base a Move on 3 Bitboards (WhitePiece, BlackPiece, and Kings), maybe (but the jury is not yet out) this is even faster (and used by Ed and Rein I think).
The proof of the pudding is in the eating, and here are for the standard 3 positions which we used so far, the Perft timings. Note: All without Bulk-count
1. 35.94 sec
2. 19.67 sec
3. 9.25 sec
The first 2 Perft times are roughly 7% slower compared with Ed (who uses similar hardware as I have. The third position is "more or lese" equal.
Maybe I will spent some time to optimize the routines a little, as I want to capture the speed-crown.
Bert
With Ed's method and without bulk-counting I get (run on Ed's quad)
1. 33.45 sec
2. 18.44 sec
3. 9.24 sec
Also note that Ed's timing of 31.something was *with* bulk-counting (the lack of ghost squares is slowing perft a bit down) . So you had the speed crown with this method, but now it's mine again
Interesting results. Average speedup was 1.46. The ratio of clock frequencies is 1.22, so about 20% improvement was just from architectural changes. I will be interested to see how much the intrinsic popcount instruction speeds up an engine search.
I also think that when search depths increase, incremental update indeed might slow down the process.
Especially when there is a (relatively) cheap way like the new __popcnt64.
I will take a look at the code, and check if there are some parameters, which are now updated every new depth, are only needed in the end-nodes (so for evaluation purposes).