Bert,
When I remove the evaluation function from my search() it runs at ~32 mnps; this is with the MSVC compiler at 3600 MHz. With the evaluation function it runs at ~20 mnps. I measured this on the Woldouby position at depth 20.
I'm currently working on my evaluation function to determine the best way to calculate the indices for the structural evaluation.
In my Othello program I used to compute them as I=a; I=I*3+b; I=I*3+c; etc. However, this is very slow: when I use it for all the features my program slows down to ~7 mnps.
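For readers who have not seen the base-3 scheme, here is a minimal sketch of it. The square encoding (0 = empty, 1 = own piece, 2 = opponent piece) and the names ternary_index and pow3 are assumptions made for illustration, not taken from the actual program.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Horner-style base-3 index: I = a; I = I*3 + b; I = I*3 + c; ...
// Assumed encoding per square: 0 = empty, 1 = own piece, 2 = opponent piece.
// A 32-bit result is enough for patterns of up to 20 squares (3^20 < 2^32).
template <std::size_t N>
inline std::uint32_t ternary_index(const std::array<std::uint8_t, N>& squares) {
    std::uint32_t index = 0;
    for (std::uint8_t s : squares) {
        assert(s < 3);          // each square holds exactly one ternary digit
        index = index * 3 + s;  // one multiply and one add per square
    }
    return index;
}

// Number of table entries for a pattern of n squares: 3^n, with no unused slots.
inline std::uint32_t pow3(unsigned n) {
    std::uint32_t p = 1;
    while (n--) p *= 3;
    return p;
}
```

The resulting table is dense (every index below 3^N is reachable), but each square costs a dependent multiply-add, and the digits first have to be extracted from the board representation.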
So I use pext(), which is very fast. The problem is that it is difficult to build an index function without holes in it, so the tables get larger, which also slows things down. I just have to find the best tradeoff between the speed of the index calculation and the table size.
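To illustrate where the holes come from, here is a sketch of one possible pext-based index. It is an assumption, not the index function actually used: it extracts the pattern squares from two bitboards (own and opponent pieces) and concatenates the two results. pext_soft is a portable stand-in for the BMI2 intrinsic _pext_u64 from <immintrin.h>; pattern_index and popcount_soft are hypothetical helper names.

```cpp
#include <cstdint>

// Portable stand-in for the PEXT instruction: gathers the bits of `value`
// selected by `mask` into the low bits of the result, lowest mask bit first.
inline std::uint64_t pext_soft(std::uint64_t value, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t out = 1; mask != 0; out <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit of mask
        if (value & lowest) result |= out;
        mask &= mask - 1;                           // clear that bit
    }
    return result;
}

// Number of set bits, i.e. the number of squares in the pattern.
inline int popcount_soft(std::uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

// Hypothetical two-bitboard index: own-piece bits in the low n bits,
// opponent-piece bits in the next n bits. The table needs 4^n slots, but only
// 3^n are reachable because a square cannot hold both colours: the rest are holes.
inline std::uint64_t pattern_index(std::uint64_t own, std::uint64_t opp,
                                   std::uint64_t mask) {
    int n = popcount_soft(mask);
    return pext_soft(own, mask) | (pext_soft(opp, mask) << n);
}
```

For an 8-square pattern this layout would use 65536 slots where the base-3 scheme needs 6561, which is the kind of size increase that has to be weighed against the cheaper index calculation.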
Another thing is that x86 CPUs don't have an instruction to reverse bit order. This is a pity because I need it in several places; at the moment it seems that using a small table works best.
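A small-table bit reversal could look roughly like the sketch below. The 256-entry byte table and the helper reverse_bits (which reverses the low `width` bits of a value, up to 16) are assumptions for illustration; the table size and widths used in the real program are not stated here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Build a 256-entry table: entry b is the byte b with its 8 bits reversed.
inline std::array<std::uint8_t, 256> make_reverse_table() {
    std::array<std::uint8_t, 256> table{};
    for (int b = 0; b < 256; ++b) {
        std::uint8_t r = 0;
        for (int bit = 0; bit < 8; ++bit)
            if (b & (1 << bit)) r |= static_cast<std::uint8_t>(0x80 >> bit);
        table[static_cast<std::size_t>(b)] = r;
    }
    return table;
}

static const std::array<std::uint8_t, 256> kReverse8 = make_reverse_table();

// Reverse the low `width` bits of x (1 <= width <= 16); higher bits are ignored.
// Two table lookups reverse 16 bits, then a shift drops the unused positions.
inline std::uint32_t reverse_bits(std::uint32_t x, int width) {
    assert(width >= 1 && width <= 16);
    std::uint32_t r16 = (static_cast<std::uint32_t>(kReverse8[x & 0xFF]) << 8) |
                        kReverse8[(x >> 8) & 0xFF];
    return r16 >> (16 - width);
}
```

The table is 256 bytes, so it occupies only four 64-byte cache lines and should stay resident in L1 during a search.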
Joost
Edit:
I streamlined the evaluation() a bit further and now the search() runs at ~24 mnps.
At the moment I call evaluation() when the side to move has no captures; I probably should do this when both sides have no captures and the position is completely quiet.
The evaluation function is very basic, only material + patterns. I timed it with __rdtsc() and on average it seems to take ~36 clock cycles, with some very strange outliers I can't explain. There might be a measurement error because __rdtsc() is not a serializing instruction, something I have to examine further.
eval-no. 85538 proc-cycles 18
eval-no. 85539 proc-cycles 234
eval-no. 85540 proc-cycles 81
eval-no. 85541 proc-cycles 87
eval-no. 85542 proc-cycles 84
eval-no. 85543 proc-cycles 21
eval-no. 85544 proc-cycles 18
eval-no. 85545 proc-cycles 21
eval-no. 85546 proc-cycles 87
eval-no. 85547 proc-cycles 84
eval-no. 85548 proc-cycles 78
eval-no. 85549 proc-cycles 87
eval-no. 85550 proc-cycles 72
eval-no. 85551 proc-cycles 24
eval-no. 85552 proc-cycles 21
eval-no. 85553 proc-cycles 78
eval-no. 85554 proc-cycles 78
eval-no. 85555 proc-cycles 81
eval-no. 85556 proc-cycles 87
eval-no. 85557 proc-cycles 84
eval-no. 85558 proc-cycles 81
eval-no. 85559 proc-cycles 63
eval-no. 85560 proc-cycles 84
eval-no. 85561 proc-cycles 84
eval-no. 85562 proc-cycles 21
eval-no. 85563 proc-cycles 276
eval-no. 85564 proc-cycles 18
eval-no. 85565 proc-cycles 21
eval-no. 85566 proc-cycles 27
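One common way to reduce the reordering problem is to put an LFENCE before and after the counter read, and to look at the median rather than the mean so that a few large samples do not dominate. The sketch below assumes that approach; fenced_ticks, time_calls and median are hypothetical helper names, and on non-x86-64 builds the code falls back to a monotonic clock in nanoseconds instead of cycles.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__x86_64__) || defined(_M_X64)
  #if defined(_MSC_VER)
    #include <intrin.h>
  #else
    #include <x86intrin.h>
  #endif
// LFENCE on both sides keeps the RDTSC read from being reordered with the
// instructions around it.
inline std::uint64_t fenced_ticks() {
    _mm_lfence();
    std::uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
#else
  #include <chrono>
// Fallback (assumption for portability): nanoseconds, not clock cycles.
inline std::uint64_t fenced_ticks() {
    auto now = std::chrono::steady_clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
}
#endif

// Time `count` calls of fn individually and return the raw tick deltas.
template <typename F>
std::vector<std::uint64_t> time_calls(F&& fn, std::size_t count) {
    std::vector<std::uint64_t> ticks;
    ticks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        std::uint64_t start = fenced_ticks();
        fn();
        std::uint64_t stop = fenced_ticks();
        ticks.push_back(stop - start);
    }
    return ticks;
}

// Median of a non-empty sample set; less sensitive to outliers than the mean.
inline std::uint64_t median(std::vector<std::uint64_t> samples) {
    auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

As a usage example, median({18, 234, 81, 87, 84}) on the first five samples listed above gives 84, whereas their mean is pulled up to about 101 by the single 234-cycle sample. The fences themselves add a fixed overhead to every sample, so the numbers are better suited to comparing variants than to reading as absolute costs.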
Probably a large part of it runs from the cache, otherwise I can't explain why it runs so fast.
With a multi-processor search() the different processors will compete for the cache and memory bandwidth; I assume that in that case the evaluation function will slow down considerably.