Ed/Gerard,
I think I need to explain in more detail.
I modified Horizon slightly, so it has no depth extensions (the program only checks this at depth = 0).
So I only continue the search at depth <= 0 when the side to move has a capture.
In Horizon I kept the same evaluation.
In Damage I can switch off all pruning/extension mechanisms, with the exception of the continuation at depth <= 0 when the side to move has a capture.
So in this way both programs reach similar game-tree depths.
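Roughly, the stripped-down search then looks like the sketch below (just an illustration with placeholder names such as GenerateCaptures, GenerateMoves and Evaluate, not the actual Damage or Horizon code):

    // Sketch of the stripped-down search: no pruning/extension mechanisms,
    // except that at depth <= 0 we continue while the side to move has a capture.
    int Search(Position& pos, int depth, int alpha, int beta)
    {
        MoveList moves;
        bool hasCapture = GenerateCaptures(pos, moves);   // captures are compulsory in draughts

        // At depth <= 0 we stop and evaluate as soon as no capture is pending.
        if (depth <= 0 && !hasCapture)
            return Evaluate(pos);

        // When there is no capture the capture list is empty, so generate the normal moves.
        if (!hasCapture)
            GenerateMoves(pos, moves);

        int best = -INFINITE_SCORE;
        for (const Move& move : moves) {
            DoMove(pos, move);
            int score = -Search(pos, depth - 1, -beta, -alpha);
            UndoMove(pos, move);
            if (score > best) best = score;
            if (best > alpha) alpha = best;
            if (alpha >= beta) break;                     // plain alpha-beta cutoff, nothing else
        }
        return best;
    }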
In Damage I based the eval on material only, so a man = 100 points (arbitrary value) and a king = 320 points.
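So the whole evaluation is basically nothing more than the sketch below (again placeholder names and fields; only the 100/320 values are the ones from the test):

    // Material-only evaluation, returning the score from the side to move's perspective.
    const int MAN_VALUE  = 100;   // arbitrary unit
    const int KING_VALUE = 320;

    int Evaluate(const Position& pos)
    {
        int score = MAN_VALUE  * (pos.whiteMen   - pos.blackMen)
                  + KING_VALUE * (pos.whiteKings - pos.blackKings);
        return (pos.sideToMove == WHITE) ? score : -score;
    }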
Next to that I have a 6P DB for Horizon and a 7P DB for Damage.
But in most cases I believe the 7P Damage DB only serves to confirm that the position is lost, when it is already too late.
I use fixed depth to really check the impact of the evaluation; otherwise a heavyweight evaluation reduces the search depth.
Although this is a given in a normal time-constrained tournament, I wanted to test the search and evaluation somewhat in isolation.
The scores are from the perspective of Damage, so from the program with only the material score.
In the tests I increased only the search-depth for Damage.
I need to redo the 16P Damage - 10P Horizon match, as Damage crashed during the night.
I will later add some common evaluation elements like outpost and breakthrough.
Basically I want to better understand what makes sense in my evaluation and what not.
More and more I believe (I think Gerard has the same opinion) that we put far too much code in our evaluation, and that some evaluation elements should be tackled by the search itself.
Maybe, Ed, we could do a fixed-depth match between Kingsrow and Damage (but then we must both exclude all pruning/extension mechanisms).
You can already do a fixed-depth test with the Horizon evaluation, as it is also built into your Hybrid.
Maybe later today I will post the number of lines of code I have for the Damage evaluation, and I will also provide the figure for Horizon.
I think the evaluation sizes of the various programs really differ a lot.
I know that the evaluation of Truus is extremely heavy.
In an email exchange I had with Stef in the past he explained this to me (but I forgot the size of the eval). Also, in a publication he mentioned that TRUUS was spending most of "her" time (I think 80%) evaluating positions.
Maybe that explains the strength of his program in the 1990 - 2000 period (when search depths were between 4 and 8).
When you watch matches you see that the TRUUS search depth does not scale as well compared with Damage and Kingsrow.
From Adrie (Flits) I know that his evaluation is not super heavy, but he spent a lot of time on all kinds of breakthrough positions.
Also I know that Flits has a far better search function (as Adrie shared his code with me).
So, in line with Gerard, I wanted to get more evidence that we should only incorporate the necessary elements in the evaluation (but we need to understand which are the vital few).
Remove the rest, and then focus on effective search-depth.....
Also, a side effect of a hyper-detailed evaluation is that the program tends to switch to a different line when its score is only 1 point better (so search efficiency is reduced).
I know, by the way, that you can bypass this by introducing granularity.
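By granularity I mean rounding the score to a coarser grain, something like the sketch below (the GRAIN value of 4 is only an example, not what any of the programs actually use):

    // Round the evaluation to a multiple of GRAIN, so that 1-point differences
    // no longer make the program jump to another line.
    const int GRAIN = 4;   // example value

    int GranularScore(int rawScore)
    {
        return (rawScore / GRAIN) * GRAIN;   // integer division rounds toward zero
    }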
Hope this helps...
Bert