Search Algorithm
Re: Search Algorithm
I also tried to further improve the FR result, without History, with 4-tier buckets.
I added a HashWrite() only on the first entry-depth of the Q-Search (depth 0) in case of a cutoff (4m0).
Normally I don't apply the HashRead()/HashWrite() in the Q-phase.
As I only continue capture moves, you could basically also store this as a depth-1 position (not 100% sure if the reasoning is valid).
This is depicted in the graph as 4m1.
Interesting to see that a further reduction of 7%-8% was possible.
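The idea above can be sketched roughly as follows. This is a minimal illustration, not Bert's actual code: the table layout, `Evaluate`, and the `qply` parameter are all placeholders I introduce for the example.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical, simplified transposition table: position key -> (score, depth).
struct TTEntry { int score; int depth; };
std::unordered_map<std::uint64_t, TTEntry> tt;

// Toy static evaluation standing in for the real one.
int Evaluate(std::uint64_t key) { return static_cast<int>(key % 100) - 50; }

// Quiescence sketch: a hash entry is written only at the first entry
// ply (qply == 0), and only when the stand-pat score causes a cutoff.
int QSearch(std::uint64_t key, int alpha, int beta, int qply) {
    int standPat = Evaluate(key);
    if (standPat >= beta) {
        if (qply == 0)
            tt[key] = { standPat, 0 };  // stored as depth 0 ("4m0"); depth 1 would be the "4m1" variant
        return standPat;
    }
    if (standPat > alpha) alpha = standPat;
    // ... continue with capture moves only, recursing with qply + 1 ...
    return alpha;
}
```

Since only capture moves are searched below this point, storing the entry with depth 1 instead of 0 (the "4m1" line in the graph) amounts to treating the forced capture sequence as one extra ply of reliable depth.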
Bert
Attachment: Woldouby FR 4m.png
- Posts: 471
- Joined: Wed May 04, 2016 11:45
- Real name: Joost Buijs
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, is this with your new 10-core processor, or the 8-core version?
For testing the new instruction-set options I will buy a new Intel Kaby-Lake based system in Q1 2017.
Although it has only 4 cores, it already operates at 4.2 GHz, so with water cooling I think 4.5 GHz or even 5 GHz would be possible.
Bert
These results are from the new 10-core PC I built 2 weeks ago, with very relaxed settings: 3600 MHz, 1 GB hash, MS compiler, no PGO and no Large Pages.
At 4.0 GHz (which it reaches easily, and with less heat than the i7-5960X), using the Intel compiler with PGO and Large Pages, the program tops 170 mnps at the initial position. It drops to 100 mnps at the Woldouby because the branching factor gets so low that it becomes difficult to keep the processors busy. I'm thinking about another SMP scheme that copes better when only a few moves are available at a node, but this has low priority.
Basically I built this machine to use as my main PC and to keep the older 8-core as a backup. In the past I had two 6-core machines as backup, which as you know I sold half a year ago; the 8-core comes in very handy now for ML and for building an opening book.
Joost
-
- Posts: 471
- Joined: Wed May 04, 2016 11:45
- Real name: Joost Buijs
Re: Search Algorithm
Bert,
BertTuyt wrote: I also tried to further improve the FR result, without History, 4-tier buckets.
I added a HashWrite() only on the first entry-depth of the Q-Search (depth 0) in case of a cutoff (4m0).
Normally I don't apply the HashRead()/HashWrite() in the Q-phase.
As I only continue capture moves, basically you could also store this as a Depth 1 Position (not 100% sure if the reasoning is valid ).
This is depicted in the graph as 4m1.
Interesting to see that a further reduction of 7%-8% was possible.
Bert
Very nice graphs; it is interesting to see the influence of the different move orderings, the number of tiers in the bucket, and the hash size.
It is clear that when the hash table gets large enough, the number of tiers no longer plays a role. The only drawback of a very large table is that accesses become very slow because the TLB of the processor gets overstressed; you can largely circumvent this by using Large Pages, which are unfortunately a bit cumbersome to use.
Storing the scores and moves of the q-search in the hash table is common practice; in my chess engine I store them with depth 0. Since storing and probing the hash is very slow, you have to find an optimum between the decrease in speed and the decrease in the number of nodes. Another problem is that the q-search generates a lot of nodes, so the hash table gets overloaded very quickly.
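The overload problem is usually tackled by the replacement scheme rather than by not storing at all. A common approach, sketched below with hypothetical names and a deliberately simplified entry layout (this is not Joost's or Bert's actual table), is depth-preferred replacement within an n-tier bucket, so that shallow q-search entries are the first to be evicted:

```cpp
#include <cstdint>

constexpr int TIERS = 4;  // a 4-tier bucket, as in the experiments above

// Hypothetical entry/bucket layout; a real table packs this much tighter.
struct Entry { std::uint64_t key = 0; int score = 0; int depth = -1; };
struct Bucket { Entry slot[TIERS]; };

// Depth-preferred replacement: overwrite a matching key, otherwise evict
// the shallowest entry, so depth-0 q-search entries are the first to go.
void Store(Bucket& b, std::uint64_t key, int score, int depth) {
    int victim = 0;
    for (int i = 0; i < TIERS; ++i) {
        if (b.slot[i].key == key) { victim = i; break; }  // same position: update in place
        if (b.slot[i].depth < b.slot[victim].depth) victim = i;
    }
    b.slot[victim] = { key, score, depth };
}
```

With this scheme the flood of depth-0 q-search entries can never displace the deep entries that save the most work, which is one way to keep the speed/nodes tradeoff manageable.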
Intel has now officially announced the i7-7700K. It is not very expensive and runs at 4.5 GHz with turbo; I guess you can easily overclock it to 5 GHz, maybe even higher, and with the increased IPC it will be a beast of a processor. I estimate its speed (with SMP) at ~70% of your i7-5960X; single core it will of course be a lot faster than the 5960X.
I'm still working on a totally different and boring project which I hope to finish within 10 days or so; sometimes I can't resist toying with my draughts program because that is a lot more interesting.
Joost
Re: Search Algorithm
Joost, if my memory is right the i7-6950X is still based upon the Broadwell architecture, although its name implies a 6th-generation design.
The Skylake and Kaby Lake extreme editions will be launched at a later date (>= 2017).
A pity Intel is without competition in the high-end segment, as the normal desktop chips are two generations ahead of the extreme editions.
Also, the price tag for the ultra high-end (i7-6950X) is insane.
The good news is that there is room for improvement, and maybe AMD will be more successful with the Zen architecture.
So in some years we can all run a 10-core at 4 GHz+ for a reasonable price.
If I'm right, the i7-6950X can address 128 GByte of memory, where the i7-5960X tops out at 64 GByte.
So with 8 x 16 GByte DIMMs you can create a beast of a machine.
As you stated, due to the lower branching factor the SMP needs rework; it doesn't come for free.
I was wondering whether, with ML-based evaluations, an 8P DB, a computer-generated opening book (for example DOE-based), and 100M+ nodes/second, we are not entering a draw-always era (at least for the programs that embark on this voyage).
Finally, I assume that at least this is beyond world-champion level.
Bert
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, if my memory is right the i7-6950x is still based upon the Broadwell architecture although its name implies a 6th generation design.
The Skylake and Kabylake extreme editions will be launched at a later date ( >= 2017).
A pity Intel is without competition in the high-end segment, as the normal desktop chips are 2 generations further, compared with the extreme.
Also the price tag for the ultra high-end ( i7-6950x) is insane.
The good news there is room for improvement, and maybe AMD will be more successful with the Zen architecture.
So in some years we all can run a 10-core at 4 GHZ+, for a reasonable price.
If I'm right, the i7-6950X can address 128 GByte Memory, where the i7-6950x tops at 64 GByte.
So with 8 x 16 GByte DIMMS you can create a beast of a machine.
As you stated, due to the lower branching factor, the SMP need rework , it doesn't come for free.
I was wondering if with ML based evaluations, a 8P DB, a computer generated opening book (example DOE based), and 100M+ nodes/second, we are not entering a draw-always area (at least for the programs who embark on this voyage).
Finally, I assume that at least this is beyond world-champion level
Bert
The difference between Skylake/Kaby Lake and Broadwell is not very large; maybe when we get systems with the LGA-3647 socket it will have some impact.
64 GB of memory is enough for what I want to do with the machine; if somebody needs more memory they have to go for the Xeon counterparts.
The i7-6950X is way too expensive. I got a good deal on the processor from a friend who bought it and later decided not to use it (probably because he could not afford it); it was a boxed version (still originally sealed), otherwise I wouldn't have taken it over from him.
I have no clue about the drawing rate in draughts or at what level the programs currently are. What strikes me is that in human vs computer tournaments the computers hardly surpass strong club players, let alone grandmasters; maybe there is still a lot to gain in draughts.
I'm afraid the AMD Zen is not going to be what everybody expects at the moment; more and more I get the impression that it will compare favorably against an Intel Core i5 but that it is no match for the Intel extreme processors, despite the benchmarks they showed lately.
Updating my SMP does not have much priority. In the meantime I improved it somewhat compared to the version I gave you; I agree it is less effective during endgames when the branching factor is low, but I still see a time-to-depth speedup of at least a factor of 4 to 5.
Joost
Re: Search Algorithm
Joost, as I didn't want to go into a History sort mode (yet), I now tried the well-known Internal Iterative Deepening (IID).
I started with a very basic implementation, see code below.
For iDepth >= 8 a 5-ply search is executed to find the best move.
This Q&D (quick and dirty) implementation yielded a further 3%-4% node reduction (I didn't measure the time to depth).
Bert
Code: Select all
// If no hash move is available at sufficient depth, run a shallow
// 5-ply search to find a move to sort first.
if (iDepth >= 8 && iHashMove == HASHTABLE_ERROR_KEY) {
    int iIIDMove = IIDSearch(pTDepth, iAlpha, iBeta, iPly, 5, bTurn);
    SortMove(false, pTDepth->FirstMove, iIIDMove, iXMoves); // Sort Moves
}
Code: Select all
Depth 4 4m0 4m1 4m1I
20 109105897 106456697 106030233 101123008
21 102060458 99202863 98552204 92419303
22 96490110 93629084 92361546 88043619
23 93462897 90585693 88356786 85011233
24 92225078 89223828 86216454 83082037
25 92006065 88917103 85424990 82411321
26 91974401 88880466 85258540 82275469
27 91971555 88876921 85244183 82263852
28 91971350 88876658 85243020 82262935
29 91971348 88876654 85242939 82262866
30 91971348 88876654 85242929 82262855
Attachment: Woldouby FR 4m1I.png
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, as I didn't want to go into a History sort mode (yet), I now tried (the well known) Internal Iterative Deepening (IID).
I started with a very basic implementation, see code below.
For iDepth >= 8 a 5 Ply search is executed, to find the best move.
This Q&D (quick and dirty) implementation yielded a further 3%-4% node reduction (didn't measure the time to depth).
Code: Select all
if (iDepth >= 8 && iHashMove == HASHTABLE_ERROR_KEY) {
    int iIIDMove = IIDSearch(pTDepth, iAlpha, iBeta, iPly, 5, bTurn);
    SortMove(false, pTDepth->FirstMove, iIIDMove, iXMoves); // Sort Moves
}
Bert
Code: Select all
Depth 4 4m0 4m1 4m1I
20 109105897 106456697 106030233 101123008
21 102060458 99202863 98552204 92419303
22 96490110 93629084 92361546 88043619
23 93462897 90585693 88356786 85011233
24 92225078 89223828 86216454 83082037
25 92006065 88917103 85424990 82411321
26 91974401 88880466 85258540 82275469
27 91971555 88876921 85244183 82263852
28 91971350 88876658 85243020 82262935
29 91971348 88876654 85242939 82262866
30 91971348 88876654 85242929 82262855
Yes, the more sorting mechanisms you add, the smaller the tree gets. IID is very easy to implement and I still have to add it as well.
History still doesn't work very well in my engine; the improvement it gives is cancelled out by its overhead. My guess is that I first need a mature evaluation function before I can start optimizing the history implementation.
ATM I use the hash move, 2 ordinary killers and a response-killer table; the hash move is always sorted into the first spot and the other three thereafter in random order.
Everything is a tradeoff between speed and accuracy; since I want a very fast engine, the sorting process is probably not optimal.
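The ordering described above can be sketched as follows. The move type, function name, and parameters are hypothetical illustrations, not Joost's engine code:

```cpp
#include <algorithm>
#include <vector>

using Move = int;  // placeholder move type

// Sketch of the ordering described above: the hash move always first,
// the two ordinary killers plus the response killer next (their mutual
// order arbitrary), and everything else after them.
void OrderMoves(std::vector<Move>& moves, Move hashMove,
                Move killer1, Move killer2, Move responseKiller) {
    auto rank = [&](Move m) {
        if (m == hashMove) return 0;
        if (m == killer1 || m == killer2 || m == responseKiller) return 1;
        return 2;
    };
    std::stable_sort(moves.begin(), moves.end(),
                     [&](Move a, Move b) { return rank(a) < rank(b); });
}
```

A stable sort keeps the generator's original order within each rank, which matches the "in random order" (i.e. unspecified) placement of the three killers after the hash move.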
The next thing I want to do is my evaluation function. There is nothing available to start with: I can't find games between strong players anywhere, so I have to generate the games and corresponding positions myself, update the evaluation parameters, and repeat the whole process until the engine reaches a respectable level and the parameters no longer change.
I still have to decide what kind of evaluation patterns I'm going to use. At first I will use the same patterns as Scan, since it is proven that they work; later I want to change to larger patterns that are more in line with the patterns I find in draughts theory books.
Still enough to do. A few months ago I planned to have my engine ready by now, but as usual things take a lot more time than expected, and now I hope to have it ready by spring next year.
Joost
Re: Search Algorithm
Joost, herewith a test to optimize IID for the Woldouby Position (FR, 28 Ply).
The optimum seems to be reached for iDepth >= 5 and an IID search depth of 3.
Approximately an 11% reduction in node count.
See the results (2 - 20 search-depth threshold, 2 - 6 IID search depth) and graph.
Table numbers are node counts relative to no IID.
Bert
Code: Select all
if (iDepth >= 5 && iHashMove == HASHTABLE_ERROR_KEY) {
int iIIDMove = IIDSearch(pTDepth, iAlpha, iBeta, iPly, 3, bTurn);
SortMove(false, pTDepth->FirstMove, iIIDMove, iXMoves); // Sort Moves
}
Code: Select all
Threshold 2 3 4 5 6
2 0.99510533
3 0.959000212 0.912540206
4 0.929009267 0.904204653 0.915486984
5 0.921847617 0.887470432 0.913133945 0.956657038
6 0.945975472 0.939789716 0.943535727 0.955735202 0.968939735
7 0.959844001 0.951658113 0.952946162 0.956180846 0.968479544
8 0.966621431 0.964940799 0.962597695 0.965040221 0.961893731
9 0.973180814 0.970688278 0.966283022 0.970468202 0.963407546
10 0.989488923 0.979160359 0.977817327 0.983031801 0.974397102
11 0.990892218 0.988735758 0.989312392 0.985961874 0.980054533
12 0.993661914 0.993245997 0.995395043 0.992585191 0.99245277
13 0.999774187 0.999354597 0.999142005 0.996348002 0.9961369
14 1.000004282 1.000011215 1.000017761
15 1.000002722 1.000008622 1.000011321
16 1 1
17 1 1
18 1 1
19 1 1
20 1 1
Attachment: Woldouby IID.png
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, herewith a test to optimize IID for the Woldouby Position (FR, 28 Ply).
The optimum seems to be reached for iDepth >= 5, and a IID Search Depth of 3.
Approximately a 11% reduction in node counts.
See results ( 2 - 20 Search Depth Threshold, 2 - 6 IID Search Depth) and graph.
Code: Select all
if (iDepth >= 5 && iHashMove == HASHTABLE_ERROR_KEY) {
    int iIIDMove = IIDSearch(pTDepth, iAlpha, iBeta, iPly, 3, bTurn);
    SortMove(false, pTDepth->FirstMove, iIIDMove, iXMoves); // Sort Moves
}
Table numbers compared with no IID.
Bert
Code: Select all
Threshold 2 3 4 5 6
2 0.99510533
3 0.959000212 0.912540206
4 0.929009267 0.904204653 0.915486984
5 0.921847617 0.887470432 0.913133945 0.956657038
6 0.945975472 0.939789716 0.943535727 0.955735202 0.968939735
7 0.959844001 0.951658113 0.952946162 0.956180846 0.968479544
8 0.966621431 0.964940799 0.962597695 0.965040221 0.961893731
9 0.973180814 0.970688278 0.966283022 0.970468202 0.963407546
10 0.989488923 0.979160359 0.977817327 0.983031801 0.974397102
11 0.990892218 0.988735758 0.989312392 0.985961874 0.980054533
12 0.993661914 0.993245997 0.995395043 0.992585191 0.99245277
13 0.999774187 0.999354597 0.999142005 0.996348002 0.9961369
14 1.000004282 1.000011215 1.000017761
15 1.000002722 1.000008622 1.000011321
16 1 1
17 1 1
18 1 1
19 1 1
20 1 1
Thanks for sharing this, it gives me an idea where to start with the parameters for IID.
There is also the question whether to use IID at non-PV nodes or not; at ALL nodes, for instance, it doesn't seem very useful to me.
ATM I don't have separate PV and ZW searches, but I'm thinking about splitting them into two separate functions, or using a template, to circumvent unnecessary conditionals that would slow things down.
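The template approach can look roughly like this. It is a sketch only: the move loop and leaf are toy stand-ins (a real engine generates moves and calls a quiescence search), but it shows how the PV-only branches compile away in the zero-window instantiation:

```cpp
// Sketch of splitting PV and zero-window search with a template parameter,
// so the PV-only branches compile away in the zero-window instantiation.
template <bool PV>
int Search(int depth, int alpha, int beta) {
    if (depth == 0) return 0;                   // toy leaf; a real engine calls QSearch here
    int best = -30000;
    for (int move = 0; move < 3; ++move) {      // toy 3-move loop
        int score;
        if (PV && move == 0) {
            // first move of a PV node: search with the full window
            score = -Search<true>(depth - 1, -beta, -alpha);
        } else {
            // all other moves: zero window
            score = -Search<false>(depth - 1, -(alpha + 1), -alpha);
            if (PV && score > alpha && score < beta)
                score = -Search<true>(depth - 1, -beta, -score);  // PV re-search on fail-high
        }
        if (score > best) best = score;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break;               // beta cutoff
    }
    return best;
}
```

Since `PV` is a compile-time constant, the conditionals on it cost nothing at runtime: the compiler emits two specialized functions, exactly the split into separate PV and ZW routines, without duplicating the source code.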
Joost
Re: Search Algorithm
Very quiet here lately.
After three weeks of doing other boring stuff I want to continue with my search.
Things I still want to add: draw recognition, futility pruning and ProbCut. I'm also looking into singular extensions, but I have no clue whether this will help in draughts because, contrary to chess, captures are forced.
BTW, does anybody know if there is a computer tournament in Culemborg this year? If so I would like to take a look over there.
Re: Search Algorithm
Joost, I guess there is no Culemborg tournament this year (there was none last year either).
But the master of ceremonies, Leo, might have a clue.
Bert
Re: Search Algorithm
Bert,
BertTuyt wrote: Joost, guess no Culemborg tournament this year (also not last year).
But the master of ceremony Leo might have a clue.
Thanks for the info!
I didn't know that there was no tourney last year.
It's been a long time since I last spoke with Leo; he probably doesn't even remember me, but I will send him a mail.
It is a pity that there is almost no activity with regard to computer draughts; this is not very inspiring, to say the least. Anyway, I will still try to get my program into such a state that it is able to play games via DXP.
Maybe we need something like an internet draughts server (like what ICC, FICS and HGM are doing for chess) to bring back some life to this community.
Joost
Re: Search Algorithm
Joost, yes there is little activity here.
Either all are working on secret improvements, or all have run out of ideas.
I have implemented (as is) the 8P DB driver from Ed, and did some experiments with the Woldouby position.
First of all, it's close to a miracle to see a zero score after so many plies.
I also tested whether different depth-decrement methods would yield a difference in the number of nodes before the program finds a draw.
In the table below I tested 3 methods:
1) decrement depth always, 2) decrement if the number of moves > 1, 3) decrement in case of no capture move.
Next to that, I included and excluded Internal Iterative Deepening for the case where no hash move was available.
Further: no sorting, killers, or History.
In addition I added to the evaluation function: if (iXWhiteKing > 0 && iXBlackKing > 0) iEvaluation /= 2;
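The three decrement rules can be captured in a single helper; this is a sketch with hypothetical names, where the move count and capture flag are assumed to come from the move generator:

```cpp
// The three depth-decrement rules tested above, as one helper.
// Method 1: always decrement. Method 2: only decrement when there is a
// real choice (more than one legal move). Method 3: only decrement when
// the move is not a capture.
int ChildDepth(int depth, int method, int numMoves, bool isCapture) {
    switch (method) {
        case 1:  return depth - 1;
        case 2:  return numMoves > 1 ? depth - 1 : depth;  // forced move: keep depth
        case 3:  return isCapture ? depth : depth - 1;     // capture: keep depth
        default: return depth - 1;
    }
}
```

Methods 2 and 3 effectively extend forced lines, which is why the same draw is found at a lower nominal ply in the table below.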
Code: Select all
Method IID Nodes Ply
1) Always Off 253.347.095 30
1) Always On 206.719.204 30
2) Moves > 1 Off 210.825.319 22
2) Moves > 1 On 139.853.005 22
3) No Capture Off 248.610.432 20
3) No Capture On 138.422.730 20
So far I have only included the DB scan in the PVSearch and not the QSearch, which I also need to do, and then run the tests again.
I also started with ProbCut, but this needs fine-tuning (or just switching off) in the endgame phase; I think it works better in the opening and middle game.
Bert
- Posts: 1722
- Joined: Wed Apr 14, 2004 16:04
Re: Search Algorithm
Bert, as you know, there is nothing secret to what I am doing: https://github.com/rhalbersma/dctl
BertTuyt wrote: Joost, yes there is little activity here.
Or all are working on secret improvements, or all have run out of ideas.
Also, earlier this month, I contributed some C++11 locking mechanisms to Ed's egdb driver: https://github.com/eygilbert/egdb_intl/ ... ional_test
If there's anything you'd like to know, please ask!
I really wish you and Joost and others would just put your programs, your GUI, or anything else that is on Dropbox onto GitHub. That makes it a lot easier to exchange ideas, small pieces of code, etc.
Re: Search Algorithm
Rein, thanks for sharing.
I really need to learn the ins and outs of GitHub and the like.
Bert