Perft

BertTuyt · Post by **BertTuyt** » Fri Jul 22, 2016 14:07

I also ran Perft 11 for the initial position on my faster computer.
Here are the results.


Perft(1)	N = 9	   0.00 sec.	KN/sec = 0
Perft(2)	N = 81	   0.00 sec.	KN/sec = 0
Perft(3)	N = 658	   0.00 sec.	KN/sec = 0
Perft(4)	N = 4265	   0.00 sec.	KN/sec = 0
Perft(5)	N = 27117	   0.00 sec.	KN/sec = 0
Perft(6)	N = 167140	   0.00 sec.	KN/sec = 83570
Perft(7)	N = 1049442	   0.01 sec.	KN/sec = 131180
Perft(8)	N = 6483961	   0.05 sec.	KN/sec = 144088
Perft(9)	N = 41022423	   0.25 sec.	KN/sec = 165413
Perft(10)  N = 258895763	   1.55 sec.	KN/sec = 166814
Perft(11)  N = 1665861398	   9.74 sec.	KN/sec = 170997

Bert
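
For readers comparing this table with the nps figures later in the thread: the KN/sec column appears to be nodes divided by elapsed seconds, expressed in thousands. A minimal sketch, assuming that definition (the function name is illustrative and not taken from Bert's program):

```python
def kn_per_sec(nodes: int, seconds: float) -> float:
    """Thousands of nodes per second (assumed meaning of the KN/sec column)."""
    return nodes / seconds / 1000.0


# Perft(11) from the table above: N = 1665861398 in 9.74 sec.
rate = kn_per_sec(1_665_861_398, 9.74)

# The result lands close to the reported 170997; it cannot match exactly
# because the printed time is rounded to 0.01 sec.
print(round(rate))
```

Multiplying a KN/sec value by 1000 gives a number that can be compared directly with an nps column.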

BertTuyt · Post by **BertTuyt** » Fri Jul 22, 2016 17:26

And the position from the Woldouby game.

Perft(1)	 N = 6	   0.00 sec.	KN/sec = 0
Perft(2)	 N = 12	   0.00 sec.	KN/sec = 0
Perft(3)	 N = 30	   0.00 sec.	KN/sec = 0
Perft(4)	 N = 73	   0.00 sec.	KN/sec = 0
Perft(5)	 N = 215	   0.00 sec.	KN/sec = 0
Perft(6)	 N = 590	   0.00 sec.	KN/sec = 0
Perft(7)	 N = 1944	   0.00 sec.	KN/sec = 0
Perft(8)	 N = 6269	   0.00 sec.	KN/sec = 0
Perft(9)	 N = 22369	   0.00 sec.	KN/sec = 22369
Perft(10)	N = 88050	   0.00 sec.	KN/sec = 88050
Perft(11)	N = 377436	   0.01 sec.	KN/sec = 53919
Perft(12)	N = 1910989	   0.03 sec.	KN/sec = 70777
Perft(13)	N = 9872645	   0.13 sec.	KN/sec = 75363
Perft(14)	N = 58360286	   0.55 sec.	KN/sec = 106496
Perft(15)	N = 346184885	   3.22 sec.	KN/sec = 107611

Bert

BertTuyt · Post by **BertTuyt** » Fri Jul 22, 2016 17:33

And the 2nd Perft position used for benchmarking.
Here my result is slower than Joost's.

Perft(1)	N = 14	   0.00 sec.	KN/sec = 0
Perft(2)	N = 55	   0.00 sec.	KN/sec = 0
Perft(3)	N = 1168	   0.00 sec.	KN/sec = 0
Perft(4)	N = 5432	   0.00 sec.	KN/sec = 0
Perft(5)	N = 87195	   0.00 sec.	KN/sec = 87195
Perft(6)	N = 629010	   0.00 sec.	KN/sec = 125802
Perft(7)	N = 9041010	   0.07 sec.	KN/sec = 132956
Perft(8)	N = 86724219	   0.48 sec.	KN/sec = 179182
Perft(9)	N = 1216917193	   6.52 sec.	KN/sec = 186615

Bert

Joost Buijs · Post by **Joost Buijs** » Sat Jul 23, 2016 07:40

Bert

Your perft() runs very fast. On what kind of computer (CPU, clock frequency) did you measure this? Which compiler did you use?

It was not my goal to make the fastest perft() per se, but to make a move generator that performs a few times better than the one in my old mailbox program.
The perft() in my old program runs at ~33.5 Mnps at the starting position (on the same hardware), which is about 4 times slower; with fewer pieces on the board the difference seems to get smaller, though.

The evaluation function and probing the hash table are far more time-consuming; a speed difference of a few percent in move generation and move making won't be noticeable in the complete program.

Joost

BertTuyt · Post by **BertTuyt** » Sat Jul 23, 2016 11:27

I have an 8-core Intel i7-5960X, but for Perft I only use 1 core.
The processor is water-cooled and overclocked, and runs at 4 GHz.
I use Microsoft Visual Studio 2015, which nowadays is available as a free download.

Bert

Joost Buijs · Post by **Joost Buijs** » Sat Jul 23, 2016 12:41

BertTuyt wrote: I have an 8-core Intel i7-5960X, but for Perft I only use 1 core.
The processor is water-cooled and overclocked, and runs at 4 GHz.
I use Microsoft Visual Studio 2015, which nowadays is available as a free download.
Bert

This is exactly the same setup as I have here: an i7-5960X. Normally I run it at 3.6 GHz; for tournaments I overclock it to 4.0 or 4.2 GHz.
I use Visual Studio 2015 (Update 3). For final builds I use the Intel C++ (v16) compiler, which gives a small boost compared to MSVC.
Running my computer at 4.0 GHz will probably add about 11% to my nps figures. When I have some time later today I will check my latest perft() at 4.0 GHz to see what it does.

Joost
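
The 11% estimate follows from the ratio of the two clock speeds. A small sketch of that arithmetic, assuming throughput scales linearly with core clock (which is only an approximation for code that also waits on memory); the function name is illustrative:

```python
def clock_gain(base_ghz: float, target_ghz: float) -> float:
    """Relative speed-up if nps scaled linearly with core clock (an assumption)."""
    return target_ghz / base_ghz - 1.0


gain = clock_gain(3.6, 4.0)
print(f"{gain:.1%}")  # 11.1%
```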

Joost Buijs · Post by **Joost Buijs** » Sat Jul 23, 2016 15:40

Bert,

I used the latest version of my move generator, ran it at 4 GHz and optimized it with PGO.
These are the results I get:

    m  m  m  m  m
  m  m  m  m  m
    m  m  m  m  m
  m  m  m  m  m
    -  -  -  -  -
  -  -  -  -  -
    M  M  M  M  M
  M  M  M  M  M
    M  M  M  M  M
  M  M  M  M  M

perft( 1)  nodes           9  time  0.0000000  nps         0
perft( 2)  nodes          81  time  0.0000003  nps 237717342
perft( 3)  nodes         658  time  0.0000034  nps 193108656
perft( 4)  nodes        4265  time  0.0000228  nps 186818586
perft( 5)  nodes       27117  time  0.0001523  nps 178036876
perft( 6)  nodes      167140  time  0.0009875  nps 169261375
perft( 7)  nodes     1049442  time  0.0062778  nps 167166929
perft( 8)  nodes     6483961  time  0.0390295  nps 166129855
perft( 9)  nodes    41022423  time  0.2463491  nps 166521483
perft(10)  nodes   258895763  time  1.5084528  nps 171630011
perft(11)  nodes  1665861398  time  9.8068545  nps 169867046

    -  -  -  -  -
  M  -  -  M  M
    M  -  -  -  -
  -  k  -  -  M
    M  M  M  k  -
  -  -  -  -  M
    K  -  M  -  -
  -  M  -  -  -
    M  M  M  M  -
  M  -  -  -  -

perft( 1)  nodes          14  time  0.0000221  nps    632107
perft( 2)  nodes          55  time  0.0000133  nps   4138795
perft( 3)  nodes        1168  time  0.0000187  nps  62324098
perft( 4)  nodes        5432  time  0.0000515  nps 105574409
perft( 5)  nodes       87195  time  0.0003997  nps 218157133
perft( 6)  nodes      629010  time  0.0032401  nps 194132635
perft( 7)  nodes     9041010  time  0.0384673  nps 235031343
perft( 8)  nodes    86724219  time  0.3928527  nps 220755060
perft( 9)  nodes  1216917193  time  5.1363549  nps 236922333

    -  -  -  -  -
  -  -  -  -  -
    -  m  m  m  -
  m  -  m  m  -
    m  -  m  m  M
  m  M  M  -  M
    -  M  M  M  M
  -  M  M  -  -
    -  -  -  -  -
  -  -  -  -  -

perft( 1)  nodes           6  time  0.0000003  nps  17608692
perft( 2)  nodes          12  time  0.0000010  nps  11739128
perft( 3)  nodes          30  time  0.0000010  nps  29347820
perft( 4)  nodes          73  time  0.0000024  nps  30605584
perft( 5)  nodes         215  time  0.0000061  nps  35054341
perft( 6)  nodes         590  time  0.0000157  nps  37641769
perft( 7)  nodes        1944  time  0.0000412  nps  47150547
perft( 8)  nodes        6269  time  0.0001216  nps  51535430
perft( 9)  nodes       22369  time  0.0003830  nps  58405817
perft(10)  nodes       88050  time  0.0012628  nps  69726809
perft(11)  nodes      377436  time  0.0047445  nps  79552742
perft(12)  nodes     1910989  time  0.0198608  nps  96219331
perft(13)  nodes     9872645  time  0.0974842  nps 101274265
perft(14)  nodes    58360286  time  0.5050651  nps 115550024
perft(15)  nodes   346184885  time  2.9987416  nps 115443385

Maybe I can get a few more percent out of it by tweaking, but I don't think this is relevant.

Perft(2) seems to run faster than Perft(1). I could not find a bug in my code, so for now I assume this is due to cache effects.

Positions with kings seem to do particularly well, probably because I have to scan less thanks to the magics I use for generating king moves.

Joost
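
One more thing worth checking for the shallowest depths is the resolution of the timer. The sketch below recovers elapsed time from a few rows of the tables above, assuming the nps column is simply nodes divided by the unrounded elapsed time, and expresses each time as a multiple of the smallest one:

```python
# (nodes, nps) pairs copied from the tables above
rows = [
    (81, 237717342),  # position 1, perft(2)
    (6, 17608692),    # position 3, perft(1)
    (12, 11739128),   # position 3, perft(2)
    (30, 29347820),   # position 3, perft(3)
    (73, 30605584),   # position 3, perft(4)
]

# Assumption: nps = nodes / elapsed, so elapsed = nodes / nps.
elapsed = [nodes / nps for nodes, nps in rows]
tick = min(elapsed)                     # smallest interval seen in these rows
ticks = [t / tick for t in elapsed]     # each time as a multiple of that interval

for (nodes, _), t, k in zip(rows, elapsed, ticks):
    print(f"nodes={nodes:3d}  elapsed={t * 1e6:.3f} us  ~{k:.2f} ticks")
```

The recovered times come out as whole multiples of roughly 0.34 microseconds, which suggests that these measurements sit at the granularity of the timer. Apart from any cache effects, the nps values for the first few plies should therefore probably not be over-interpreted.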

Joost Buijs · Post by **Joost Buijs** » Sat Jul 23, 2016 17:11

Since the run times for optimizing with PGO seemed a little short to me, I added some extra depth: 1 ply for the first two positions and 2 plies for the last position.

Now I get this:

    m  m  m  m  m
  m  m  m  m  m
    m  m  m  m  m
  m  m  m  m  m
    -  -  -  -  -
  -  -  -  -  -
    M  M  M  M  M
  M  M  M  M  M
    M  M  M  M  M
  M  M  M  M  M

perft( 1)  nodes           9  time  0.0000003  nps  26413002
perft( 2)  nodes          81  time  0.0000003  nps 237717018
perft( 3)  nodes         658  time  0.0000027  nps 241385491
perft( 4)  nodes        4265  time  0.0000211  nps 201884325
perft( 5)  nodes       27117  time  0.0001448  nps 187252647
perft( 6)  nodes      167140  time  0.0009098  nps 183714904
perft( 7)  nodes     1049442  time  0.0058877  nps 178244070
perft( 8)  nodes     6483961  time  0.0363438  nps 178406222
perft( 9)  nodes    41022423  time  0.2282231  nps 179747060
perft(10)  nodes   258895763  time  1.4253068  nps 181642132
perft(11)  nodes  1665861398  time  9.1216300  nps 182627601
perft(12)  nodes 10749771911  time 57.2861119  nps 187650576

    -  -  -  -  -
  M  -  -  M  M
    M  -  -  -  -
  -  k  -  -  M
    M  M  M  k  -
  -  -  -  -  M
    K  -  M  -  -
  -  M  -  -  -
    M  M  M  M  -
  M  -  -  -  -

perft( 1)  nodes          14  time  0.0000215  nps    652173
perft( 2)  nodes          55  time  0.0000181  nps   3045524
perft( 3)  nodes        1168  time  0.0000215  nps  54409852
perft( 4)  nodes        5432  time  0.0000497  nps 109189823
perft( 5)  nodes       87195  time  0.0003997  nps 218156835
perft( 6)  nodes      629010  time  0.0031069  nps 202457196
perft( 7)  nodes     9041010  time  0.0370157  nps 244247671
perft( 8)  nodes    86724219  time  0.3636623  nps 238474619
perft( 9)  nodes  1216917193  time  4.9510586  nps 245789291
perft(10)  nodes 13106503411  time 52.2715701  nps 250738659

    -  -  -  -  -
  -  -  -  -  -
    -  m  m  m  -
  m  -  m  m  -
    m  -  m  m  M
  m  M  M  -  M
    -  M  M  M  M
  -  M  M  -  -
    -  -  -  -  -
  -  -  -  -  -

perft( 1)  nodes           6  time  0.0000003  nps  17608668
perft( 2)  nodes          12  time  0.0000007  nps  17608668
perft( 3)  nodes          30  time  0.0000010  nps  29347780
perft( 4)  nodes          73  time  0.0000027  nps  26779849
perft( 5)  nodes         215  time  0.0000061  nps  35054293
perft( 6)  nodes         590  time  0.0000164  nps  36073313
perft( 7)  nodes        1944  time  0.0000412  nps  47150483
perft( 8)  nodes        6269  time  0.0001135  nps  55249619
perft( 9)  nodes       22369  time  0.0003568  nps  62701097
perft(10)  nodes       88050  time  0.0011824  nps  74468935
perft(11)  nodes      377436  time  0.0043697  nps  86376393
perft(12)  nodes     1910989  time  0.0181785  nps 105123308
perft(13)  nodes     9872645  time  0.0888282  nps 111143159
perft(14)  nodes    58360286  time  0.4710588  nps 123891722
perft(15)  nodes   346184885  time  3.0538893  nps 113358690
perft(16)  nodes  2272406115  time 17.3304478  nps 131122181
perft(17)  nodes 14962263728  time 113.5997428  nps 131710366

Jelle Wiersma · Post by **Jelle Wiersma** » Sun Jul 24, 2016 11:24

Seems quite fast. Are you guys running with or without bulk counting?

BertTuyt · Post by **BertTuyt** » Sun Jul 24, 2016 11:59

Jelle, with bulk counting.

Bert
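
For readers unfamiliar with the term: with bulk counting, perft stops one ply early and adds the number of generated moves instead of making each move and counting the resulting positions one by one. A minimal sketch on a toy game, not on draughts; `toy_moves` and `toy_make` are hypothetical stand-ins for a real move generator and make-move routine, and neither program in this thread is claimed to look like this:

```python
def perft(state, depth, gen_moves, make_move):
    """Plain perft: make every move and count the positions reached at depth 0."""
    if depth == 0:
        return 1
    return sum(
        perft(make_move(state, m), depth - 1, gen_moves, make_move)
        for m in gen_moves(state)
    )


def perft_bulk(state, depth, gen_moves, make_move):
    """Perft with bulk counting: at depth 1 return the size of the move list
    instead of making each move and recursing to depth 0."""
    if depth == 0:
        return 1
    moves = gen_moves(state)
    if depth == 1:
        return len(moves)
    return sum(
        perft_bulk(make_move(state, m), depth - 1, gen_moves, make_move)
        for m in moves
    )


# Toy stand-in for a real game (hypothetical): a state is a pile of n tokens
# and a move removes 1, 2 or 3 of them.
def toy_moves(n):
    return [k for k in (1, 2, 3) if k <= n]


def toy_make(n, k):
    return n - k


for d in range(1, 6):
    plain = perft(10, d, toy_moves, toy_make)
    bulk = perft_bulk(10, d, toy_moves, toy_make)
    print(d, plain, bulk)
```

Both functions return the same node counts, but the bulk version skips the make-move work on the last ply, so nps figures obtained with and without bulk counting are not directly comparable.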

BertTuyt · Post by **BertTuyt** » Sun Jul 24, 2016 12:04

Joost, which PGO option did you use (I have no experience with this so far): Instrument, Optimize or Update?

Bert

Joost Buijs · Post by **Joost Buijs** » Sun Jul 24, 2016 12:28

Bert,

I used the PGO from the Intel compiler; I guess the PGO from MSVC is about the same.
First you instrument your program for optimization, then you let the program run for some time so it can collect profile data (branch statistics etc.), and after this you run the optimization pass.

With MSVC it is:

step 1: Instrument
step 2: Run Instrumented/Optimized Application
step 3: Optimize

I have no idea whether the PGO of MSVC is as good as the one from Intel; I never tried it.
It also depends on the program you are optimizing: sometimes PGO does almost nothing, and in other cases it makes a difference of 10 to 15%.

Joost

Joost Buijs · Post by **Joost Buijs** » Sun Jul 24, 2016 14:09

Out of curiosity I also tried MSVC PGO; it is not as efficient as the one from Intel, but it still does something.
I compared against the standard optimization settings: Maximize Speed (/O2), Intrinsics Yes (/Oi), Favor Fast Code (/Ot) and Omit Frame Pointers (/Oy).

With MSVC PGO:

position 1 nps +6.5%
position 2 nps +0.0% (approx. equal)
position 3 nps +1.7%

This is only one run; for more accurate statistics this should be repeated several times.
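
Repeating a measurement and summarising the spread could look like the sketch below. It is a generic timing harness, not the one used for the numbers in this thread; `benchmark` and the placeholder workload are illustrative names:

```python
import statistics
import time


def benchmark(fn, repeats=5):
    """Run fn several times and summarise the wall-clock times in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Placeholder workload; a real test would call perft() on a fixed position.
result = benchmark(lambda: sum(range(100_000)), repeats=5)
print(result)
```

Comparing medians (or minima) of several runs makes differences of a few percent, such as the PGO gains above, easier to separate from run-to-run noise.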

BertTuyt · Post by **BertTuyt** » Sun Jul 24, 2016 21:40

After some small modifications, but no optimization, here are my most recent results.
Only the most relevant perft depths.

Position 1

Perft(11)	N = 1665861398	   10.45 sec.	KN/sec = 159366
Perft(12)	N = 10749771911	  66.93 sec.	KN/sec = 160604

Position 2

Perft(9)	 N = 1216917193	   5.95 sec.	 KN/sec = 204455
Perft(10)	N = 13106503411	  61.13 sec.	KN/sec = 214417

Position 3

Perft(16)	N = 2272406115	  20.21 sec.	 KN/sec = 112439
Perft(17)	N = 14962263728	 134.13 sec.	KN/sec = 111551

So I'm around 20% slower, which means there is some work to do.
But as I'm now going on holiday, there will be no posts for some time.

Bert
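
The size of the gap can be worked out from the deepest runs, which have identical node counts in both sets of results. The sketch below uses the times posted in this thread; the helper name `compare` is illustrative:

```python
# (label, Bert's seconds, Joost's seconds) for identical node counts
runs = [
    ("position 1, perft(12)", 66.93, 57.2861119),
    ("position 2, perft(10)", 61.13, 52.2715701),
    ("position 3, perft(17)", 134.13, 113.5997428),
]


def compare(slow_s, fast_s):
    """Return (extra run time, throughput deficit) of the slower run."""
    extra_time = slow_s / fast_s - 1.0   # how much longer the slower run takes
    lower_rate = 1.0 - fast_s / slow_s   # how much lower its nodes/sec is
    return extra_time, lower_rate


results = {label: compare(bert, joost) for label, bert, joost in runs}
for label, (extra, lower) in results.items():
    print(f"{label}: {extra:.1%} more time, {lower:.1%} lower throughput")
```

On these figures the gap is roughly 17 to 18% in run time, or about 14 to 15% in nodes per second. The two sets of results were also built with different compilers and settings, so part of the difference may not come from the move generators themselves.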

Joost Buijs · Post by **Joost Buijs** » Mon Jul 25, 2016 07:45

Your results seem to be quite comparable to mine.
Don't forget that I'm using the Intel compiler, which optimizes better than MSVC.
Anyway, the differences are so small that they are not relevant for game play at all.

I'm off now to retrograde analysis, which is full of pitfalls when you have never done it before, but it is a nice exercise.

Have a nice holiday! Talk to you later.

Joost

World Draughts Forum
