Feike Boomstra 's Horizon Draughts Program

Ed Gilbert · Post by **Ed Gilbert** » Fri Jan 28, 2011 13:35

BertTuyt wrote:Just a question for the frequent readers here, interested in a future windows copy of Horizon.
So far the version used is 64bit and optimized for bit handling instructions (like pop count).
Is this a problem for you, based on the OS and processor you use??

I don't have any computers that have hardware popcount. You can use the MS __cpuid() intrinsic to discover if the processor supports popcount at startup, and set a global boolean. http://msdn.microsoft.com/en-us/library ... S.90).aspx Write an inline popcount() function that tests the global and either uses the hardware version or a software emulation. The test will be negligible overhead because the branch will always be predicted correctly.

For 32-bit vs. 64-bit, I think most people are running 32-bit Windows. In kingsrow there are very few places where the code has to be different between 32-bit and 64-bit compiles, so it isn't much work to make 2 separate build targets. MSVC++ automatically defines a _WIN64 macro when compiling for 64-bit, so you can bracket those places in the code with #ifdef _WIN64.

-- Ed

BertTuyt · Post by **BertTuyt** » Fri Jan 28, 2011 22:46

Thanks for the feedback so far.
I will (at least) remove the intrinsics which will make use of specific Intel instructions.
Also i will check if I have problems porting the program to 32bits.
For the shorter term i will not (yet) make Horizon platform independent (so also including a Unix version), as I really want to focus on the strength of the program.
As you all know (by know), my intention is to beat TRUUS with Horizon in a 158 game match.

Keep you all posted,

Bert

BertTuyt · Post by **BertTuyt** » Sat Feb 05, 2011 13:39

Before I, in a later stage, will continue with the search routine, I want now to start with the evaluation function discussion.
Basically the eval is very well structured, which is reflected in the logical way (in the end) the score is calculated. See below code:

Code: Select all

	white = wcount_item(white_endgame, 0) +
			wcount_item(node->nr_of_white_man * 1000, 1) +
			wcount_item(node->nr_of_white_king * white_king_value, 2) +
			wcount_item(white_man_position * man_position_contribution, 3) +
			wcount_item(white_king_position * king_position_contribution, 4) +
			wcount_item(white_centrum * centrum_contribution, 5) +
			wcount_item(white_man_locked_bonus, 6) +
			wcount_item(white_avoid_4641, 7) +
			wcount_item(white_avoid_2324, 8) +
			wcount_item(white_avoid_2335, 9) +
			wcount_item(white_2_8_13, 10) +
			wcount_item(white_253035, 11) +
			wcount_item(white_klassiek, 12) +
			wcount_item(white_tempi_bonus * tempi_contribution, 13) +
			wcount_item(white_ketting_stelling * 400, 14) +
			wcount_item(white_ketting_stelling_2, 15) +
			wcount_item(white_halve_hek, 16) +
			wcount_item(white_active_15, 17) +
			wcount_item(white_slechte_binding * binding_contribution, 18) +
			wcount_item(white_free_path, 19) +
			wcount_item(white_fit * fit_contribution, 20) +
			wcount_item(white_corner_35, 21) +
			wcount_item(white_corner_36, 22) +
			wcount_item(white_11_16_17, 23) +
			wcount_item(white_voorpost, 24) +
			wcount_item(white_locked, 25) +
			wcount_item(white_vleugel_opsluiting, 26) +
			wcount_item(white_tandem, 27) +
			wcount_item(white_mobility * mobility_contribution, 28) +
			wcount_item(white_edge * edge_contribution, 29) +
			wcount_item(white_canon, 30);

	black = bcount_item(black_endgame, 0) +
			bcount_item(node->nr_of_black_man * 1000, 1) +
			bcount_item(node->nr_of_black_king * black_king_value, 2) +
			bcount_item(black_man_position * man_position_contribution, 3) +
			bcount_item(black_king_position * king_position_contribution, 4) +
			bcount_item(black_centrum * centrum_contribution, 5) +
			bcount_item(black_man_locked_bonus, 6) +
			bcount_item(black_avoid_4641, 7) +
			bcount_item(black_avoid_2324, 8) +
			bcount_item(black_avoid_2335, 9) +
			bcount_item(black_2_8_13, 10) +
			bcount_item(black_253035, 11) +
			bcount_item(black_klassiek, 12) +
			bcount_item(black_tempi_bonus * tempi_contribution, 13) +
			bcount_item(black_ketting_stelling * 400, 14) +
			bcount_item(black_ketting_stelling_2, 15) +
			bcount_item(black_halve_hek, 16) +
			bcount_item(black_active_15, 17) +
			bcount_item(black_slechte_binding * binding_contribution, 18) +
			bcount_item(black_free_path, 19) +
			bcount_item(black_fit * fit_contribution, 20) +
			bcount_item(black_corner_35, 21) +
			bcount_item(black_corner_36, 22) +
			bcount_item(black_11_16_17, 23) +
			bcount_item(black_voorpost, 24) +
			bcount_item(black_locked, 25) +
			bcount_item(black_vleugel_opsluiting, 26) +
			bcount_item(black_tandem, 27) +
			bcount_item(black_mobility * mobility_contribution, 28) +
			bcount_item(black_edge * edge_contribution, 29) +
			bcount_item(black_canon, 30);

So the score is constructed from 30 elements, as endgame is always 0. When a DB-score is detected the eval reurns already in an earlier stage.

I hope the non-Dutch readers are able to understand the meaning of some of the evaluation elements.
The wcount_item and bcount_item are defined as:

Code: Select all

#define wcount_item(item, indx) item
#define bcount_item(item, indx) item

So relatively straightforwarded, via this approach it is also possible to use different weights for the evaluation elements, which is not done in this case (yet).

The score or corrected for Kings on both sides or potential breakthroughts (think most/more program apply a similar correction mechanism).

Code: Select all

	/* decrease score if there are kings or nearly kings at both sides */
	if (((node->nr_of_white_king > 0) || (white_free_path > 0)) && ((node->nr_of_black_king > 0) || (black_free_path > 0))) {
		white = white / 4;
		black = black / 4;
	}

And last and least (

) the score is modified for whether the node is a max- or min-node, and granularity ( #define GRANULARITY 20 ).
I did not check if this last piece of code for correctness (yet)....

Code: Select all

	if (node->status & IS_MAX_NODE)
	{
		if (node->status & IS_WHITE_ON_MOVE){
			if (white - black >= 0)
				score =	(short int) (( white - black + GRANULARITY/2)/GRANULARITY)*GRANULARITY ;
			else
				score =	(short int) (( white - black - GRANULARITY/2)/GRANULARITY)*GRANULARITY ;
		}
		else {
			if (black - white >= 0)
				score =	(short int) (( black - white + GRANULARITY/2)/GRANULARITY)*GRANULARITY ;
			else
				score =	(short int) (( black - white - GRANULARITY/2)/GRANULARITY)*GRANULARITY ;
		}
	}
	else // min-node
	{
		if (node->status & IS_WHITE_ON_MOVE){		// remember min-node, this is the opponents color
			if (black - white >= 0)
				score =	(short int) (( black - white + GRANULARITY/2)/GRANULARITY)*GRANULARITY ;
			else
				score =	(short int) (( black - white - GRANULARITY/2)/GRANULARITY)*GRANULARITY ;
		}
		else{
			if (white - black >= 0)
				score =	(short int) (( white - black + GRANULARITY/2)/GRANULARITY)*GRANULARITY ;
			else
				score =	(short int) (( white - black - GRANULARITY/2)/GRANULARITY)*GRANULARITY ;
		}
	}

Some futher details will be discussed in future blogs.

Keep you all posted,

Bert

BertTuyt · Post by **BertTuyt** » Sun Feb 06, 2011 00:13

I tested Damage with a bare evaluation function based on material only with the Horizon program/routine.
Matches were played with 158games each.
Fixed ply for both sides (Horizon always 10 Ply).
Next to the evaluation Horizon had a 6P DB, whereas Damage used the 7P DB.
Results see below:

Code: Select all


10 Ply: 0+ 118- 5=  35?
12 Ply: 0+ 121- 11= 26?
14 Ply: 0+ 113- 11= 34?

The match with the 16 Ply Damage search is still going on.
I will examine the ?results and incorporate them in the final results.
Anyway it seems that with a dumb eval the program does not benefit from deeper search.

In the next rounds I want to make the eval more intelligent , and examine the result of this.
Likely candidates are breakthrough and voorpost (does some-one has a clever English name for this).
I'm very interested to see the effect.
Any other suggestions ?

From the fixed depth matches i found out that the Damage evaluation is better compared with the Horizon eval.
But to be honest browsing through all code, I don't have a clue, where the difference is really made.
The eval was designed long ago (when search depth was 6 ply), so many of the things were related to correct problems at this low level ply.
In the mean time we have grown to 16 ply and beyond, the eval has been modified a zillion times, but to be honest I no longer can track all changes to specific board positions.
But guess I'm the only one, as other programmers have most likely full documented routines with huge comments, which we all understand in detail

Anyway keep you posted.

Bert

TAILLE · Post by **TAILLE** » Sun Feb 06, 2011 10:58

Hi Bert,

BertTuyt wrote:I tested Damage with a bare evaluation function based on material only with the Horizon program/routine.
Matches were played with 158games each.
Fixed ply for both sides (Horizon always 10 Ply).
Bert

What do you mean by "fixed ply". We all know perfectly that due to the extension-reduction mechanism some program choose a rather "large" approach (quiet extension-reduction) and other choose a "deep" approach (agressive reduction/pruning) ? For Damy it would be a non-sense to stop the search at a fixed depth because I choose a "deep" approach. In addition, seeing my implementation it would be very difficult for me to stop Damy for extending the search in order to find a breakthrough.

BertTuyt wrote:Likely candidates are breakthrough and voorpost (does some-one has a clever English name for this).
Bert

Everytime I see the word "voorpost" I translate it in my mind in "outpost". If no other proposal may be we can decide to use this english word.

Ed Gilbert · Post by **Ed Gilbert** » Sun Feb 06, 2011 13:51

BertTuyt wrote:I tested Damage with a bare evaluation function based on material only with the Horizon program/routine.
Matches were played with 158games each.
Fixed ply for both sides (Horizon always 10 Ply).
Next to the evaluation Horizon had a 6P DB, whereas Damage used the 7P DB.
Results see below:
Code: Select all
10 Ply: 0+ 118- 5=  35?
12 Ply: 0+ 121- 11= 26?
14 Ply: 0+ 113- 11= 34?

Bert, I am confused about what this test is and what the purpose is. Can you explain? Are both damage and horizon modified to use only material in eval, and both search to a fixed depth? Does that mean no quiescent extensions of any kind, not even extending captures? And those scores are from the point of view of horizon, so horizon vs. damage, both searching to 10 plies fixed depth, both with the same material only eval, gives the result +0, -118? That makes no sense. If both horizon and damage are using the same eval, and searching the same fixed depth tree, then they should be equivalent. Move ordering does not matter, nor does speed, node counts, or almost anything else because of the fixed depth and same evals. I must be misunderstanding something.

-- Ed

BertTuyt · Post by **BertTuyt** » Sun Feb 06, 2011 14:57

Ed/Gerard,

I think I need to explain in more detail.

I modified Horizon slightly , so it has no depth-extension (which the Program only checks at depth = 0 ).
So I only continue when at depth <= 0 the side to move has a capture.
In Horizon I kept the same evaluation.
In Damage I can switch of all pruning/extention mechanism, with the exception of the depth <= 0 and side to move has to capture.
So in this way both programs have similar game-trees depths.

In Damage I only based the eval on material only so a man = 100 points (arbitrary value) and a king 320 points.
Next to that I have a 6P DB for Horizon and 7P for Damage.
But in most cases I believe the use of the 7P Damage DB is only to confirm that the positon is lost, when it is too late.

I use fixed depth to really check the impact of the evaluation, otherwise for heavyweight evaluations the search depth reduces.
Although this is a given in a normal time-constraint tournament I wanted to test the search - evaluation in somewhat isolation.

The scores are from the perspective of Damage , so from the program with only the material score.
In the tests I increased only the search-depth for Damage.
The 16P Damage - 10P Horizon match I need to redo, as I had a Damage crash at night.

I will later add some common evaluation elements like outpost (

) , and breakthrough.

Basically I want to better understand what makes sense in my evaluation and what not.
More and more I believe (think Gerard has the same opinion) that we put far to much code in our evaluation, and that some evaluation elements should be tackled by the search itself.

Maybe Ed we could do a fixed depth match between Kingsrow and Damage (but then we must both exclude all pruning/extention mechanisms).
You can already do a fixed depth test with the Horizon evaluation, as it is also built in your Hybrid.

I will maybe later the day post the number of code-lines which i have for the Damage evaluation , and will provide the figure for Horizon.
Think the evaluation sizes of many programs really differ a lot.

I know that the evaluation of Truus is extermely heavy.
In the email exchange I had with Stef in the past he explained me this (but forgot the size of the eval). Also in a publication he mentioned that TRUUS was spending most of "her" time (think 80%) evaluation positions.

Maybe that explains the strength of his program in the 1990 - 2000 period (when search depth was between 4 - 8 ).
When you see matches you see that TRUUS search-depth does not scale as well compared with Damage and Kingsrow.
From Adrie (Flits) I know that his evaluation is not super heavy, but he spent much time in all kinds of breakthrough positions.
Also I know that Flits has a far better search-function (as Adrie shared his code with me

).

So in line with Gerard, I wanted to get more evidence that we should only incoporate the necessary elements in the evaluation (but need to understand which are the vital few).
Remove the rest, and then focus on effective search-depth.....
Also a side effect of a hyper detailed evaluation is that the program tends to find a different line when the score is only 1 point better (so reduced search efficiency).
I know by the way, that you can bypass this by introducing granularity.

Hope this helps...

Bert

BertTuyt · Post by **BertTuyt** » Sun Feb 06, 2011 15:25

Did a quick count of the number of lines (in total) in my evaluation .cpp file (so excluding header file, and excluding the routines related to Endgame DB handling).
The total is 14385 lines

And I'm sure somewhere is a bug, ans some code is most likely obsolete.
But finding these relevant parts ......

Bert

Ed Gilbert · Post by **Ed Gilbert** » Sun Feb 06, 2011 15:37

Bert, I think you will have a problem measuring differences due to the small number of games. In another thread I posted this output from Bayeselo:

Code: Select all

Rank Name       Elo    +    - games score oppo. draws
   1 kingsrow     2    4    4  7904   51%    -2   90%
   2 baseline    -2    4    4  7904   49%     2   90%

I think what this says is that in a match of 7904 games, in order to get a 95% confidence level in LOS (likelihood of superiority), there needs to be a 4 elo difference between the two opponents. That's what the "+" and "-" columns are supposed to show, the elo difference for 95% confidence of LOS.

Said another way, you cannot measure a difference between two opponents of less than 4 elo with 95% confidence level with 7904 games, at least that is the case in a match like this one where there were 90% draws. Since you are trying to measure elo differences that might be in this ballpark, your number of games seems much too small in the fixed depth experiment to be statistically significant. Until recently when I learned about Bayeselo I was also trying to measure small elo differences using insufficient number of games.

-- Ed

TAILLE · Post by **TAILLE** » Sun Feb 06, 2011 15:42

BertTuyt wrote: Basically I want to better understand what makes sense in my evaluation and what not.
More and more I believe (think Gerard has the same opinion) that we put far to much code in our evaluation, and that some evaluation elements should be tackled by the search itself.
Bert

I do not think we put far to much code in our evaluation but I think we do not put what is really relevant. I think that some evaluation elements (tactical elements, breakthrough and outpost) should be tackled by a search mechanism.
Concerning the other elements of the evaluation function I am working on a "lazy" evaluation in order to calculate the "complete" evaluation function only if necessary. As soon as you decide that a complete evaluation has to be done my feeling is that you have to concentrate your evaluation function only on long term aspects of the position but that cannot be done without a lot of code.

White to move

In situation like the above position, if the evaluation function is not able to tell you which is the better strategic move between 42-38 and 43-38 then you will never be able to avoid acculating small disadvantages that can lead to difficulties later in the game.

BertTuyt · Post by **BertTuyt** » Sun Feb 06, 2011 15:54

Ed,

i more and more recognize that if we really want to determine with great accuracy differences, then we need to play more games.
So on my thing to do list is to implement the 3-move ballot matches.

That's also the issue I have with the yearly tournament, it is fun from a social point of view, but i guess that we can also use a random generator to determine the winner for that year.
I did not do the math, but are you able to predict the likelihood of a winner , based on the ELO-rating differences?
I'm not sure, last year I guess Damage had the highest rating, but it could be that if we would play the tournament a zilion times, that in average I would have a <= 50% change in winning.

The good news

The games I played with material only are so lob-sided that I think with great accuracy and high probabbility we can state that the material only program is weaker.
Could you determine based on the program which you have elo differences for the matches i played (when ignorging for the time being the ? results)?
Think I need to get this elo program also and learn to use it

Bert

BertTuyt · Post by **BertTuyt** » Sun Feb 06, 2011 16:00

Gerard,

I'm still struggling what to put in or out the evaluation.
Anyway think your approach is really interesting, and in several areas different.
So I'm also really looking forward to the Kingsrow - Damy match and Damy -Damage

A pity we did not conclude on the protocol discussion, as I really hope that some day we can do full automatic engine tournaments.
Although I don't know to what extend the chess protocols really offer advantages, still need to put more energy in these discussions, nor if there are some really good chess GUI's out there (i know that we several draughts GUI's where we can relatively easy implement any protocol we need), but if there is really a huge working code-base available for a draughts-like-tournament server, then I'm really interested to get the thing working (some day).

Bert

Ed Gilbert · Post by **Ed Gilbert** » Sun Feb 06, 2011 16:55

The games I played with material only are so lob-sided that I think with great accuracy and high probabbility we can state that the material only program is weaker.
Could you determine based on the program which you have elo differences for the matches i played (when ignorging for the time being the ? results)?

Yes those results are quite lopsided, so maybe you can have a high LOS, but (I think) what you are trying to do is measure small differences between 2 lopsided match results. The difficulty is in measuring that small difference.

-- Ed

Ed Gilbert · Post by **Ed Gilbert** » Sun Feb 06, 2011 16:58

In situation like the above position, if the evaluation function is not able to tell you which is the better strategic move between 42-38 and 43-38 then you will never be able to avoid acculating small disadvantages that can lead to difficulties later in the game.

Gerard, it's clear that you understand draughts strategy better than me, as I don't see much difference between those 2 moves. Can you explain? Thanks.

-- Ed

BertTuyt · Post by **BertTuyt** » Sun Feb 06, 2011 17:18

Ed,

I had the same question to Gerard.
As computers get ultra fast, and we are able to play zillions of games in a relatively short time interval.
Why don't we focus in the future in algorithms which learn by playing?
Humans try to understand these complex games by clustering patterns, but could it be that this way we always force to use a well-known subspace of the Draughts-universe.
Maybe we should play into positions unknown and outside the scope of what humans know.
But I also have to admit to automatic learning in our domain has limited success stories so far.

Nevertheless I would welcome Gerard's explanation of the 2 moves considered in the mentioned position, so as a non draughts player I also can learn here

Bert

World Draughts Forum

Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program

Re: Feike Boomstra 's Horizon Draughts Program