Search Algorithm
Re: Search Algorithm
I also tried to further improve the FR result, without History, with 4-tier buckets.
I added a HashWrite() only on the first entry-depth of the Q-Search (depth 0) in case of a cutoff (4m0).
Normally I don't apply the HashRead()/HashWrite() in the Q-phase.
As I only continue capture moves, you could basically also store this as a depth-1 position (not 100% sure if the reasoning is valid).
This is depicted in the graph as 4m1.
Interesting to see that a further reduction of 7%-8% was possible.
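The idea above can be sketched roughly as follows. This is a minimal illustration, not Bert's actual code: the table layout, `Evaluate`, and the `qply` parameter are all placeholders I introduce for the example.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical, simplified transposition table: position key -> (score, depth).
struct TTEntry { int score; int depth; };
std::unordered_map<std::uint64_t, TTEntry> tt;

// Toy static evaluation standing in for the real one.
int Evaluate(std::uint64_t key) { return static_cast<int>(key % 100) - 50; }

// Quiescence sketch: a hash entry is written only at the first entry
// ply (qply == 0), and only when the stand-pat score causes a cutoff.
int QSearch(std::uint64_t key, int alpha, int beta, int qply) {
    int standPat = Evaluate(key);
    if (standPat >= beta) {
        if (qply == 0)
            tt[key] = { standPat, 0 };  // stored as depth 0 ("4m0"); depth 1 would be the "4m1" variant
        return standPat;
    }
    if (standPat > alpha) alpha = standPat;
    // ... continue with capture moves only, recursing with qply + 1 ...
    return alpha;
}
```

Since only capture moves are searched below this point, storing the entry with depth 1 instead of 0 (the "4m1" line in the graph) amounts to treating the forced capture sequence as one extra ply of reliable depth.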
Bert
Attachment: Woldouby FR 4m.png
- Posts: 471
- Joined: Wed May 04, 2016 11:45
- Real name: Joost Buijs
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, is this with your new 10-core processor, or the 8-core version?
For testing the new instruction-set options I will buy a new Intel Kaby-Lake based system in Q1 2017.
Although it has only 4 cores, it already operates at 4.2 GHz, so with water cooling I think 4.5 GHz or even 5 GHz would be possible.
Bert
These results are from the new 10-core PC I built 2 weeks ago, with very relaxed settings: 3600 MHz, 1 GB hash, MS compiler, no PGO and no Large Pages.
At 4.0 GHz (which it reaches easily, and with less heat than the i7-5960X), using the Intel compiler with PGO and Large Pages, the program tops 170 mnps at the initial position. It drops to 100 mnps at the Woldouby because the branching factor gets so low that it becomes difficult to keep the processors busy. I'm thinking about another SMP scheme that copes better when only a few moves are available at a node, but this has low priority.
Basically I built this machine to use as my main PC and to keep the older 8-core as a backup. In the past I had two 6-core machines as backup, which as you know I sold half a year ago; the 8-core comes in very handy now for ML and for building an opening book.
Joost
-
- Posts: 471
- Joined: Wed May 04, 2016 11:45
- Real name: Joost Buijs
Re: Search Algorithm
Bert,
BertTuyt wrote: I also tried to further improve the FR result, without History, 4-tier buckets.
I added a HashWrite() only on the first entry-depth of the Q-Search (depth 0) in case of a cutoff (4m0).
Normally I don't apply the HashRead()/HashWrite() in the Q-phase.
As I only continue capture moves, basically you could also store this as a Depth 1 Position (not 100% sure if the reasoning is valid ).
This is depicted in the graph as 4m1.
Interesting to see that a further reduction of 7%-8% was possible.
Bert
Very nice graphs; it is interesting to see the influence of the different move orderings, the number of tiers in the bucket, and the hash size.
It is clear that when the hash table gets large enough, the number of tiers no longer plays a role. The only drawback of a very large table is that accesses become very slow because the TLB of the processor gets overstressed; you can largely circumvent this by using Large Pages, which are unfortunately a bit cumbersome to use.
Storing the scores and moves of the q-search in the hash table is common practice; in my chess engine I store them with depth 0. Since storing and probing the hash is very slow, you have to find an optimum between the decrease in speed and the decrease in the number of nodes. Another problem is that the q-search generates a lot of nodes, so the hash table gets overloaded very quickly.
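The overload problem is usually tackled by the replacement scheme rather than by not storing at all. A common approach, sketched below with hypothetical names and a deliberately simplified entry layout (this is not Joost's or Bert's actual table), is depth-preferred replacement within an n-tier bucket, so that shallow q-search entries are the first to be evicted:

```cpp
#include <cstdint>

constexpr int TIERS = 4;  // a 4-tier bucket, as in the experiments above

// Hypothetical entry/bucket layout; a real table packs this much tighter.
struct Entry { std::uint64_t key = 0; int score = 0; int depth = -1; };
struct Bucket { Entry slot[TIERS]; };

// Depth-preferred replacement: overwrite a matching key, otherwise evict
// the shallowest entry, so depth-0 q-search entries are the first to go.
void Store(Bucket& b, std::uint64_t key, int score, int depth) {
    int victim = 0;
    for (int i = 0; i < TIERS; ++i) {
        if (b.slot[i].key == key) { victim = i; break; }  // same position: update in place
        if (b.slot[i].depth < b.slot[victim].depth) victim = i;
    }
    b.slot[victim] = { key, score, depth };
}
```

With this scheme the flood of depth-0 q-search entries can never displace the deep entries that save the most work, which is one way to keep the speed/nodes tradeoff manageable.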
Intel has now officially announced the i7-7700K. It is not very expensive and runs at 4.5 GHz with turbo; I guess you can easily overclock it to 5 GHz, maybe even higher, and with the increased IPC it will be a beast of a processor. I estimate its speed (with SMP) at ~70% of your i7-5960X; single core it will of course be a lot faster than the 5960X.
I'm still working on a totally different and boring project which I hope to finish within 10 days or so; sometimes I can't resist toying with my draughts program because that is a lot more interesting.
Joost
Re: Search Algorithm
Joost, if my memory is right the i7-6950X is still based upon the Broadwell architecture, although its name implies a 6th-generation design.
The Skylake and Kaby Lake extreme editions will be launched at a later date (>= 2017).
A pity Intel is without competition in the high-end segment, as the normal desktop chips are two generations ahead of the extreme editions.
Also, the price tag for the ultra high-end (i7-6950X) is insane.
The good news is that there is room for improvement, and maybe AMD will be more successful with the Zen architecture.
So in some years we can all run a 10-core at 4 GHz+ for a reasonable price.
If I'm right, the i7-6950X can address 128 GByte of memory, where the i7-5960X tops out at 64 GByte.
So with 8 x 16 GByte DIMMs you can create a beast of a machine.
As you stated, due to the lower branching factor the SMP needs rework; it doesn't come for free.
I was wondering whether, with ML-based evaluations, an 8P DB, a computer-generated opening book (for example DOE-based), and 100M+ nodes/second, we are not entering a draw-always era (at least for the programs that embark on this voyage).
Finally, I assume that at least this is beyond world-champion level.
Bert
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, if my memory is right the i7-6950x is still based upon the Broadwell architecture although its name implies a 6th generation design.
The Skylake and Kabylake extreme editions will be launched at a later date ( >= 2017).
A pity Intel is without competition in the high-end segment, as the normal desktop chips are 2 generations further, compared with the extreme.
Also the price tag for the ultra high-end ( i7-6950x) is insane.
The good news there is room for improvement, and maybe AMD will be more successful with the Zen architecture.
So in some years we all can run a 10-core at 4 GHZ+, for a reasonable price.
If I'm right, the i7-6950X can address 128 GByte Memory, where the i7-6950x tops at 64 GByte.
So with 8 x 16 GByte DIMMS you can create a beast of a machine.
As you stated, due to the lower branching factor, the SMP need rework , it doesn't come for free.
I was wondering if with ML based evaluations, a 8P DB, a computer generated opening book (example DOE based), and 100M+ nodes/second, we are not entering a draw-always area (at least for the programs who embark on this voyage).
Finally, I assume that at least this is beyond world-champion level
Bert
The difference between Skylake/Kaby Lake and Broadwell is not very large; maybe when we get systems with the LGA-3647 socket it will have some impact.
64 GB of memory is enough for what I want to do with the machine; if somebody needs more memory they have to go for the Xeon counterparts.
The i7-6950X is way too expensive. I got a good deal on the processor from a friend who bought it and later decided not to use it (probably because he could not afford it); it was a boxed version (still originally sealed), otherwise I wouldn't have taken it over from him.
I have no clue about the drawing rate in draughts or at what level the programs currently are. What strikes me is that in human vs computer tournaments the computers hardly surpass strong club players, let alone grandmasters; maybe there is still a lot to gain in draughts.
I'm afraid the AMD Zen is not going to be what everybody expects at the moment; more and more I get the impression that it will compare favorably against an Intel Core i5 but that it is no match for the Intel extreme processors, despite the benchmarks they showed lately.
Updating my SMP does not have much priority. In the meantime I improved it somewhat compared to the version I gave you; I agree it is less effective during endgames when the branching factor is low, but I still see a time-to-depth speedup of at least a factor of 4 to 5.
Joost
Re: Search Algorithm
Joost, as I didn't want to go into a History sort mode (yet), I now tried the well-known Internal Iterative Deepening (IID).
I started with a very basic implementation, see code below.
For iDepth >= 8 a 5-ply search is executed to find the best move.
This Q&D (quick and dirty) implementation yielded a further 3%-4% node reduction (I didn't measure the time to depth).
Bert
Code: Select all
// If no hash move is available at sufficient depth, run a shallow
// 5-ply search to find a move to sort first.
if (iDepth >= 8 && iHashMove == HASHTABLE_ERROR_KEY) {
    int iIIDMove = IIDSearch(pTDepth, iAlpha, iBeta, iPly, 5, bTurn);
    SortMove(false, pTDepth->FirstMove, iIIDMove, iXMoves); // Sort Moves
}
Code: Select all
Depth 4 4m0 4m1 4m1I
20 109105897 106456697 106030233 101123008
21 102060458 99202863 98552204 92419303
22 96490110 93629084 92361546 88043619
23 93462897 90585693 88356786 85011233
24 92225078 89223828 86216454 83082037
25 92006065 88917103 85424990 82411321
26 91974401 88880466 85258540 82275469
27 91971555 88876921 85244183 82263852
28 91971350 88876658 85243020 82262935
29 91971348 88876654 85242939 82262866
30 91971348 88876654 85242929 82262855
Attachment: Woldouby FR 4m1I.png
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, as I didn't want to go into a History sort mode (yet), I now tried (the well known) Internal Iterative Deepening (IID).
I started with a very basic implementation, see code below.
For iDepth >= 8 a 5 Ply search is executed, to find the best move.
This Q&D (quick and dirty) implementation yielded a further 3%-4% node reduction (didn't measure the time to depth).
Code: Select all
if (iDepth >= 8 && iHashMove == HASHTABLE_ERROR_KEY) {
    int iIIDMove = IIDSearch(pTDepth, iAlpha, iBeta, iPly, 5, bTurn);
    SortMove(false, pTDepth->FirstMove, iIIDMove, iXMoves); // Sort Moves
}
Bert
Code: Select all
Depth 4 4m0 4m1 4m1I
20 109105897 106456697 106030233 101123008
21 102060458 99202863 98552204 92419303
22 96490110 93629084 92361546 88043619
23 93462897 90585693 88356786 85011233
24 92225078 89223828 86216454 83082037
25 92006065 88917103 85424990 82411321
26 91974401 88880466 85258540 82275469
27 91971555 88876921 85244183 82263852
28 91971350 88876658 85243020 82262935
29 91971348 88876654 85242939 82262866
30 91971348 88876654 85242929 82262855
Yes, the more sorting mechanisms you add, the smaller the tree gets. IID is very easy to implement and I still have to add it as well.
History still doesn't work very well in my engine; the improvement it gives is cancelled out by its overhead. My guess is that I first need a mature evaluation function before I can start optimizing the history implementation.
ATM I use the hash move, 2 ordinary killers and a response-killer table; the hash move is always sorted into the first spot and the other three thereafter in random order.
Everything is a tradeoff between speed and accuracy; since I want a very fast engine, the sorting process is probably not optimal.
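The ordering described above can be sketched as follows. The move type, function name, and parameters are hypothetical illustrations, not Joost's engine code:

```cpp
#include <algorithm>
#include <vector>

using Move = int;  // placeholder move type

// Sketch of the ordering described above: the hash move always first,
// the two ordinary killers plus the response killer next (their mutual
// order arbitrary), and everything else after them.
void OrderMoves(std::vector<Move>& moves, Move hashMove,
                Move killer1, Move killer2, Move responseKiller) {
    auto rank = [&](Move m) {
        if (m == hashMove) return 0;
        if (m == killer1 || m == killer2 || m == responseKiller) return 1;
        return 2;
    };
    std::stable_sort(moves.begin(), moves.end(),
                     [&](Move a, Move b) { return rank(a) < rank(b); });
}
```

A stable sort keeps the generator's original order within each rank, which matches the "in random order" (i.e. unspecified) placement of the three killers after the hash move.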
The next thing I want to do is my evaluation function. There is nothing available to start with: I can't find games between strong players anywhere, so I have to generate the games and corresponding positions myself, update the evaluation parameters, and repeat the whole process until the engine reaches a respectable level and the parameters no longer change.
I still have to decide what kind of evaluation patterns I'm going to use. At first I will use the same patterns as Scan, since it is proven that they work; later I want to change to larger patterns that are more in line with the patterns I find in draughts theory books.
Still enough to do. A few months ago I planned to have my engine ready by now, but as usual things take a lot more time than expected, and now I hope to have it ready by spring next year.
Joost
Re: Search Algorithm
Joost, herewith a test to optimize IID for the Woldouby Position (FR, 28 Ply).
The optimum seems to be reached for iDepth >= 5 and an IID search depth of 3.
Approximately an 11% reduction in node count.
See the results (2 - 20 search-depth threshold, 2 - 6 IID search depth) and graph.
Table numbers are node counts relative to no IID.
Bert
Code: Select all
if (iDepth >= 5 && iHashMove == HASHTABLE_ERROR_KEY) {
int iIIDMove = IIDSearch(pTDepth, iAlpha, iBeta, iPly, 3, bTurn);
SortMove(false, pTDepth->FirstMove, iIIDMove, iXMoves); // Sort Moves
}
Code: Select all
Threshold 2 3 4 5 6
2 0.99510533
3 0.959000212 0.912540206
4 0.929009267 0.904204653 0.915486984
5 0.921847617 0.887470432 0.913133945 0.956657038
6 0.945975472 0.939789716 0.943535727 0.955735202 0.968939735
7 0.959844001 0.951658113 0.952946162 0.956180846 0.968479544
8 0.966621431 0.964940799 0.962597695 0.965040221 0.961893731
9 0.973180814 0.970688278 0.966283022 0.970468202 0.963407546
10 0.989488923 0.979160359 0.977817327 0.983031801 0.974397102
11 0.990892218 0.988735758 0.989312392 0.985961874 0.980054533
12 0.993661914 0.993245997 0.995395043 0.992585191 0.99245277
13 0.999774187 0.999354597 0.999142005 0.996348002 0.9961369
14 1.000004282 1.000011215 1.000017761
15 1.000002722 1.000008622 1.000011321
16 1 1
17 1 1
18 1 1
19 1 1
20 1 1
Attachment: Woldouby IID.png
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, herewith a test to optimize IID for the Woldouby Position (FR, 28 Ply).
The optimum seems to be reached for iDepth >= 5, and a IID Search Depth of 3.
Approximately a 11% reduction in node counts.
See results ( 2 - 20 Search Depth Threshold, 2 - 6 IID Search Depth) and graph.
Code: Select all
if (iDepth >= 5 && iHashMove == HASHTABLE_ERROR_KEY) {
    int iIIDMove = IIDSearch(pTDepth, iAlpha, iBeta, iPly, 3, bTurn);
    SortMove(false, pTDepth->FirstMove, iIIDMove, iXMoves); // Sort Moves
}
Table numbers compared with no IID.
Bert
Code: Select all
Threshold 2 3 4 5 6
2 0.99510533
3 0.959000212 0.912540206
4 0.929009267 0.904204653 0.915486984
5 0.921847617 0.887470432 0.913133945 0.956657038
6 0.945975472 0.939789716 0.943535727 0.955735202 0.968939735
7 0.959844001 0.951658113 0.952946162 0.956180846 0.968479544
8 0.966621431 0.964940799 0.962597695 0.965040221 0.961893731
9 0.973180814 0.970688278 0.966283022 0.970468202 0.963407546
10 0.989488923 0.979160359 0.977817327 0.983031801 0.974397102
11 0.990892218 0.988735758 0.989312392 0.985961874 0.980054533
12 0.993661914 0.993245997 0.995395043 0.992585191 0.99245277
13 0.999774187 0.999354597 0.999142005 0.996348002 0.9961369
14 1.000004282 1.000011215 1.000017761
15 1.000002722 1.000008622 1.000011321
16 1 1
17 1 1
18 1 1
19 1 1
20 1 1
Thanks for sharing this, it gives me an idea where to start with the parameters for IID.
There is also the question whether to use IID at non-PV nodes or not; at ALL nodes, for instance, it doesn't seem very useful to me.
ATM I don't have separate PV and ZW searches, but I'm thinking about splitting them into two separate functions, or using a template, to circumvent unnecessary conditionals that would slow things down.
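The template approach can look roughly like this. It is a sketch only: the move loop and leaf are toy stand-ins (a real engine generates moves and calls a quiescence search), but it shows how the PV-only branches compile away in the zero-window instantiation:

```cpp
// Sketch of splitting PV and zero-window search with a template parameter,
// so the PV-only branches compile away in the zero-window instantiation.
template <bool PV>
int Search(int depth, int alpha, int beta) {
    if (depth == 0) return 0;                   // toy leaf; a real engine calls QSearch here
    int best = -30000;
    for (int move = 0; move < 3; ++move) {      // toy 3-move loop
        int score;
        if (PV && move == 0) {
            // first move of a PV node: search with the full window
            score = -Search<true>(depth - 1, -beta, -alpha);
        } else {
            // all other moves: zero window
            score = -Search<false>(depth - 1, -(alpha + 1), -alpha);
            if (PV && score > alpha && score < beta)
                score = -Search<true>(depth - 1, -beta, -score);  // PV re-search on fail-high
        }
        if (score > best) best = score;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break;               // beta cutoff
    }
    return best;
}
```

Since `PV` is a compile-time constant, the conditionals on it cost nothing at runtime: the compiler emits two specialized functions, exactly the split into separate PV and ZW routines, without duplicating the source code.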
Joost
Re: Search Algorithm
Very quiet here lately.
After three weeks of doing other boring stuff I want to continue with my search.
Things I still want to add: draw recognition, futility pruning and ProbCut. I'm also looking into singular extensions, but I have no clue whether this will help in draughts because, contrary to chess, captures are forced.
BTW, does anybody know if there is a computer tournament in Culemborg this year? If so I would like to take a look over there.
Re: Search Algorithm
Joost, I guess there is no Culemborg tournament this year (there was none last year either).
But the master of ceremonies, Leo, might have a clue.
Bert
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, guess no Culemborg tournament this year (also not last year).
But the master of ceremony Leo might have a clue.
Thanks for the info!
I didn't know that there was no tourney last year.
It's been a long time since I last spoke with Leo; he probably doesn't even remember me, but I will send him a mail.
It is a pity that there is almost no activity with regard to computer draughts; this is not very inspiring, to say the least. Anyway, I will still try to get my program into such a state that it is able to play games via DXP.
Maybe we need something like an internet draughts server (like what ICC, FICS and HGM are doing for chess) to bring back some life to this community.
Joost
Re: Search Algorithm
Joost, yes there is little activity here.
Either all are working on secret improvements, or all have run out of ideas.
I have implemented (as is) the 8P DB driver from Ed, and did some experiments with the Woldouby position.
First of all, it's close to a miracle to see a zero score after so many plies.
I also tested whether different depth-decrement methods would yield a difference in the number of nodes before the program finds a draw.
In the table below I tested 3 methods:
1) decrement depth always, 2) decrement if the number of moves > 1, 3) decrement in case of no capture move.
Next to that, I included and excluded Internal Iterative Deepening for the case where no hash move was available.
Further: no sorting, killers, or History.
In addition I added to the evaluation function: if (iXWhiteKing > 0 && iXBlackKing > 0) iEvaluation /= 2;
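The three decrement rules can be captured in a single helper; this is a sketch with hypothetical names, where the move count and capture flag are assumed to come from the move generator:

```cpp
// The three depth-decrement rules tested above, as one helper.
// Method 1: always decrement. Method 2: only decrement when there is a
// real choice (more than one legal move). Method 3: only decrement when
// the move is not a capture.
int ChildDepth(int depth, int method, int numMoves, bool isCapture) {
    switch (method) {
        case 1:  return depth - 1;
        case 2:  return numMoves > 1 ? depth - 1 : depth;  // forced move: keep depth
        case 3:  return isCapture ? depth : depth - 1;     // capture: keep depth
        default: return depth - 1;
    }
}
```

Methods 2 and 3 effectively extend forced lines, which is why the same draw is found at a lower nominal ply in the table below.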
Code: Select all
Method IID Nodes Ply
1) Always Off 253.347.095 30
1) Always On 206.719.204 30
2) Moves > 1 Off 210.825.319 22
2) Moves > 1 On 139.853.005 22
3) No Capture Off 248.610.432 20
3) No Capture On 138.422.730 20
So far I have only included the DB scan in the PVSearch and not the QSearch, which I also need to do, and then run the tests again.
I also started with ProbCut, but this needs fine-tuning (or just switching off) in the endgame phase; I think it works better in the opening and middle game.
Bert
- Posts: 1722
- Joined: Wed Apr 14, 2004 16:04
Re: Search Algorithm
Bert, as you know, there is nothing secret to what I am doing: https://github.com/rhalbersma/dctl
BertTuyt wrote: Joost, yes there is little activity here.
Or all are working on secret improvements, or all have run out of ideas.
Also, earlier this month, I contributed some C++11 locking mechanisms to Ed's egdb driver: https://github.com/eygilbert/egdb_intl/ ... ional_test
If there's anything you'd like to know, please ask!
I really wish you and Joost and others would just put your programs, your GUI, or anything else that is on Dropbox onto GitHub. That makes it a lot easier to exchange ideas, small pieces of code, etc.
Re: Search Algorithm
Rein, thanks for sharing.
I really need to learn the ins and outs of GitHub and the like.
Bert