Perft

TAILLE · Post by **TAILLE** » Tue Dec 09, 2008 22:23

Hi Pascal,

pascaljunq wrote:gerard,

is it possible to capture two different sets of pieces that have the same size with a draught (avec un pion) from the same initial square to the same final square ?

Rein proved (with previous diagram) that it is possible with a king

Pascal

I understand now. Here is an example

Gérard

TAILLE · Post by **TAILLE** » Tue Dec 09, 2008 22:53

Hi Pascal,

In addition, if you look for an example with the initial square identical to the final square you can take the following diagram :

Gérard

pascaljunq · Post by **pascaljunq** » Fri Dec 12, 2008 12:13

Very Clear

Thx

BertTuyt · Post by **BertTuyt** » Sun Dec 14, 2008 18:33

Based on all discussions, I am now convinced that basically the fastest method for Perft is the use of a BitBoard based MoveGenerator which generates new Positions.
I guess so far only Ed used this method.

A huge advantage of this method is that Moves are simple pointer increments and decrements.
To test this, I rewrote last week my BitBoard MoveGenerator.

A Position now consists of 5 BitBoards ( WhiteMan, WhiteKing, BlackMan, and BlackKing and Empty).
However during the MoveGeneration I only store the first 4 Bitboards, Empty is derived at the start of the MoveGeneration Process.
I could also base a Move on 3 Bitboards (WhitePiece, BlackPiece, and Kings), maybe (but the jury is not yet out) this is even faster (and used by Ed and Rein I think).

The proof of the pudding is in the eating, and here are for the standard 3 positions which we used so far, the Perft timings. Note: All without Bulk-count

1. 35.94 sec
2. 19.67 sec
3. 9.25 sec

The first 2 Perft times are roughly 7% slower compared with Ed (who uses similar hardware as I have. The third position is "more or lese" equal.

Maybe I will spent some time to optimize the routines a little, as I want to capture the speed-crown.

Another topic, but that's something for the X-mas holiday, is this implementation also the best for Alfa-beta search.

Keep you posted !

Bert

Rein Halbersma · Post by **Rein Halbersma** » Sun Dec 14, 2008 19:53

BertTuyt wrote:Based on all discussions, I am now convinced that basically the fastest method for Perft is the use of a BitBoard based MoveGenerator which generates new Positions.
I guess so far only Ed used this method.

A huge advantage of this method is that Moves are simple pointer increments and decrements.
To test this, I rewrote last week my BitBoard MoveGenerator.

A Position now consists of 5 BitBoards ( WhiteMan, WhiteKing, BlackMan, and BlackKing and Empty).
However during the MoveGeneration I only store the first 4 Bitboards, Empty is derived at the start of the MoveGeneration Process.
I could also base a Move on 3 Bitboards (WhitePiece, BlackPiece, and Kings), maybe (but the jury is not yet out) this is even faster (and used by Ed and Rein I think).

The proof of the pudding is in the eating, and here are for the standard 3 positions which we used so far, the Perft timings. Note: All without Bulk-count

1. 35.94 sec
2. 19.67 sec
3. 9.25 sec

The first 2 Perft times are roughly 7% slower compared with Ed (who uses similar hardware as I have. The third position is "more or lese" equal.

Maybe I will spent some time to optimize the routines a little, as I want to capture the speed-crown.

Bert

With Ed's method and without bulk-counting I get (run on Ed's quad)

1. 33.45 sec
2. 18.44 sec
3. 9.24 sec

Also note that Ed's timing of 31.something was *with* bulk-counting (the lack of ghost squares is slowing perft a bit down) . So you had the speed crown with this method, but now it's mine again

Rein

BertTuyt · Post by **BertTuyt** » Sun Dec 14, 2008 20:06

Rein, i am working on optimizations now.

I did not improve the position 1. timing, but for the other 2 positions the results so far are:

2. 19.53 sec
3. 9.14 sec

So at least for Position 3. I have the speed-crown [img]images/smilies/icon_smile.gif[/img]

Bert

BertTuyt · Post by **BertTuyt** » Mon Dec 15, 2008 20:26

Status Monday 15-December.
Perft timings(with bulk-counting)

1. 34.10 sec.
2. 18.42 sec.
3. 8.77 sec.

So 2. and 3. [img]images/smilies/icon_smile.gif[/img] 1. to go.

Keep you all posted.

Bert

BertTuyt · Post by **BertTuyt** » Tue Dec 16, 2008 21:25

Status Tuesday 16-December.
Perft timings(with bulk-counting)

1. 33.52 sec.
2. 18.20 sec.
3. 8.80 sec.

So 2. and 3. [img]images/smilies/icon_smile.gif[/img] and 1. getting closer

Keep you all posted.

Bert

BertTuyt · Post by **BertTuyt** » Wed Dec 17, 2008 20:18

Status Wednesday 17-December.
Perft timings(with bulk-counting)

1. 33.03 sec.
2. 18.21 sec.
3. 8.66 sec.

So 1. [img]images/smilies/icon_smile.gif[/img] 2. [img]images/smilies/icon_smile.gif[/img] 3. [img]images/smilies/icon_smile.gif[/img]

Bert

BertTuyt · Post by **BertTuyt** » Thu Apr 02, 2009 23:42

Status 2-April-2009, with my new computer ( Intel i7-940 2.93 GhZ)

The 3 standard test positions

1. 22.79 sec
2. 12.40 sec
3. 5.96 sec

Bert

Rein Halbersma · Post by **Rein Halbersma** » Fri Apr 03, 2009 12:01

BertTuyt wrote:Status 2-April-2009, with my new computer ( Intel i7-940 2.93 GhZ)

The 3 standard test positions

1. 22.79 sec
2. 12.40 sec
3. 5.96 sec

Bert

nice machine!!

Ed Gilbert · Post by **Ed Gilbert** » Fri Apr 03, 2009 14:51

Interesting results. Average speedup was 1.46. The ratio of clock frequencies is 1.22, so about 20% improvement was just from architectural changes. I will be interested to see how much the intrinsic popcount instruction speeds up an engine search.

-- Ed

BertTuyt · Post by **BertTuyt** » Mon Apr 13, 2009 20:28

Ed, i changed all iCount64() calls in my program into a __popcnt64() intrinsic.

This yielded a 2% speed-increase, not much, but at least, its for free [img]images/smilies/icon_smile.gif[/img]

Bert

Ed Gilbert · Post by **Ed Gilbert** » Tue Apr 14, 2009 02:55

BertTuyt wrote:Ed, i changed all iCount64() calls in my program into a __popcnt64() intrinsic.

This yielded a 2% speed-increase, not much, but at least, its for free

Bert

Thanks! I expected a bit more, but it's probably because you update some eval terms incrementally.

BertTuyt · Post by **BertTuyt** » Tue Apr 14, 2009 16:38

Ed, good remark.

I also think that when search depths increase, incremental update indeed might slow down the process.
Especially when there is a (relatively) cheap way like the new __popcnt64.

I will take a look at the code, and check if there are some parameters, which are now updated every new depth, are only needed in the end-nodes (so for evaluation purposes).

Keep you posted,

Bert