I just asked OpenAI Codex to optimise the speed. It thought and experimented for 10 minutes, and it increased the speed of the move generator by about 11%.
Here is the newest perft for Dragon, on an Intel i9-13900KF running at 5.4 GHz. Nps is in millions of nodes per second.
bb bb bb bb bb
bb bb bb bb bb
bb bb bb bb bb
bb bb bb bb bb
.. .. .. .. ..
.. .. .. .. ..
ww ww ww ww ww
ww ww ww ww ww
ww ww ww ww ww
ww ww ww ww ww
20 20 0
1 9 time: 0.00 nps:0.0
2 81 time: 0.00 nps:0.0
3 658 time: 0.00 nps:0.0
4 4.265 time: 0.00 nps:0.0
5 27.117 time: 0.00 nps:0.0
6 167.140 time: 0.00 nps:0.0
7 1.049.442 time: 0.00 nps:349.8
8 6.483.961 time: 0.02 nps:405.2
9 41.022.423 time: 0.10 nps:406.2
10 258.895.763 time: 0.63 nps:412.9
11 1.665.861.398 time: 3.93 nps:423.6
.. .. .. .. ..
ww .. .. ww ww
ww .. .. .. ..
.. BB .. .. ww
ww ww ww BB ..
.. .. .. .. ww
WW .. ww .. ..
.. ww .. .. ..
ww ww ww ww ..
ww .. .. .. ..
17 2 3
1 14 time: 0.00 nps:0.0
2 55 time: 0.00 nps:0.0
3 1.168 time: 0.00 nps:0.0
4 5.432 time: 0.00 nps:0.0
5 87.195 time: 0.00 nps:0.0
6 629.010 time: 0.00 nps:314.5
7 9.041.010 time: 0.03 nps:274.0
8 86.724.219 time: 0.25 nps:344.1
9 1.216.917.193 time: 4.26 nps:285.3
.. .. .. .. ..
.. .. .. .. ..
.. bb bb bb ..
bb .. bb bb ..
bb .. bb bb ww
bb ww ww .. ww
.. ww ww ww ww
.. ww ww .. ..
.. .. .. .. ..
.. .. .. .. ..
10 10 0
1 6 time: 0.00 nps:0.0
2 12 time: 0.00 nps:0.0
3 30 time: 0.00 nps:0.0
4 73 time: 0.00 nps:0.0
5 215 time: 0.00 nps:0.0
6 590 time: 0.00 nps:0.0
7 1.944 time: 0.00 nps:0.0
8 6.269 time: 0.00 nps:0.0
9 22.369 time: 0.00 nps:0.0
10 88.050 time: 0.00 nps:88.0
11 377.436 time: 0.00 nps:125.8
12 1.910.989 time: 0.01 nps:173.7
13 9.872.645 time: 0.06 nps:173.2
14 58.360.286 time: 0.31 nps:189.5
15 346.184.885 time: 1.91 nps:181.4
Total time: 11.52
Here is the prompt:
can you start the engine, run the pt command, and store the result to c:\temp\originalPT.txt?
/goal Optimize the move generator
Try to improve that to as low as possible. Only modify the movegenfase.c file. Check the results against the original file. If there is any change in the counts, your modification returned an invalid movelist, and you need to revert back to the previous version.
Keep in a loop trying until you can find no more improvements. At each iteration, report the benchmark time
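The "check the results against the original file" step can also be done mechanically. Here is a rough sketch in C, assuming the pt output format shown above (depth, node count with dots as thousands separators, then time and nps). The function names are hypothetical and not part of the engine. Only depth and count are compared, since time and nps change from run to run.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Parse one line such as "7 1.049.442 time: 0.00 nps:349.8".
   Returns 1 and fills depth/count on success, 0 for any other line
   (board rows, piece counts, "Total time: ..."). */
static int parse_perft_line(const char *line, int *depth, uint64_t *count) {
    assert(line != NULL && depth != NULL && count != NULL);
    const char *t = strstr(line, " time:");
    const char *p = line;
    if (t == NULL || !isdigit((unsigned char)*p)) return 0;

    int d = 0;
    while (isdigit((unsigned char)*p)) d = d * 10 + (*p++ - '0');
    if (*p != ' ') return 0;
    ++p;
    if (!isdigit((unsigned char)*p)) return 0;

    uint64_t c = 0;
    for (; p < t; ++p) {
        if (*p == '.') continue;              /* thousands separator */
        if (!isdigit((unsigned char)*p)) return 0;
        c = c * 10 + (uint64_t)(*p - '0');
    }
    *depth = d;
    *count = c;
    return 1;
}

/* Advance *cursor through the text to the next perft line.
   Returns 1 if one was found, 0 at end of text. */
static int next_perft_entry(const char **cursor, int *depth, uint64_t *count) {
    while (**cursor != '\0') {
        char line[256];
        size_t n = strcspn(*cursor, "\n");
        size_t m = n < sizeof line - 1 ? n : sizeof line - 1;
        memcpy(line, *cursor, m);
        line[m] = '\0';
        *cursor += n;
        if (**cursor == '\n') ++*cursor;
        if (parse_perft_line(line, depth, count)) return 1;
    }
    return 0;
}

/* Returns 1 if both outputs contain the same sequence of (depth, count)
   pairs, ignoring time, nps and every non-perft line. */
static int perft_counts_match(const char *original, const char *candidate) {
    int d1 = 0, d2 = 0;
    uint64_t c1 = 0, c2 = 0;
    for (;;) {
        int has1 = next_perft_entry(&original, &d1, &c1);
        int has2 = next_perft_entry(&candidate, &d2, &c2);
        if (has1 != has2) return 0;           /* one run has extra lines */
        if (!has1) return 1;                  /* both exhausted, all equal */
        if (d1 != d2 || c1 != c2) return 0;   /* move generator changed */
    }
}
```

To use it, read c:\temp\originalPT.txt and the new pt output into two strings and call perft_counts_match on them; a 0 means the modification has to be reverted.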
Codex result:
Replaced every bitScan64(x) + 1LL << p sequence with direct low-bit extraction. This removes unnecessary scan/shift work throughout both normal and easy move-generation paths.
pt results:
Original: 12.93 s
Final validation: 11.45 s
Repeats: 11.53 s, 11.44 s, 11.41 s
Improvement: about 11.5%
All three pt position/depth count sequences match the original exactly. Release x64 build succeeds with no warnings. I also tested loop-control simplification and __restrict; neither improved timing, so both were reverted.
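For readers who have not seen the trick, here is a minimal C sketch of what "direct low-bit extraction" usually means. It is an illustration only, not the actual change in movegenfase.c: bit_scan64 is a hypothetical portable stand-in for the engine's bitScan64, and the other function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the engine's bitScan64: index of the lowest
   set bit. A real engine would use a compiler intrinsic instead. */
static int bit_scan64(uint64_t x) {
    assert(x != 0);                 /* undefined for an empty bitboard */
    int p = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        ++p;
    }
    return p;
}

/* Before: find the index, then shift a 1 back up to that index. */
static uint64_t lowest_bit_scan_shift(uint64_t x) {
    int p = bit_scan64(x);
    return (uint64_t)1 << p;
}

/* After: isolate the lowest set bit directly. In two's complement
   arithmetic, x & (0 - x) keeps only the lowest set bit of x. */
static uint64_t lowest_bit_direct(uint64_t x) {
    return x & (0 - x);
}

/* Typical bitboard loop: take the lowest piece, clear it, repeat. */
static int count_pieces(uint64_t pieces) {
    int n = 0;
    while (pieces != 0) {
        uint64_t bit = lowest_bit_direct(pieces);
        pieces ^= bit;              /* remove the piece just handled */
        ++n;
    }
    return n;
}
```

Both versions return the same single-bit mask; the direct form just skips the scan and the shift when only the mask is needed, so the square index only has to be computed where it is really used.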