Eval tuning
-
- Posts: 1368
- Joined: Thu Jun 20, 2013 17:16
- Real name: Krzysztof Grzelak
Re: Eval tuning
Walter, I ask you to make a full version of the program so that it can play in tournaments.
Re: Eval tuning
Rein Halbersma wrote: The old (~1998) version of the Dragon program generated the Dam 2.2/Moby Dam/Scan compatible databases. The source is available from http://hjetten.home.xs4all.nl/DragonDra ... .Win32.zip You could modify the move generator there as well and make 2-6 pc dbs for Killer. These can be plugged in to both Moby Dam and Scan. It would be interesting to modify Moby Dam as well and see if a Killer match between Moby Dam / Scan gives a bigger score difference than a regular draughts match.

I now have a modified Moby Dam as well. An 18-game match (10 min/99 moves) resulted in 16 wins for Scan. Scan is much stronger and only needs the Killer rule 3 times (games 4, 11 and 16). This is all assuming I introduced no bugs. I think I need to generate more Killer test positions to test the move generators more extensively.

I also looked at the endgame databases (thanks for the link!), but it seems that Scan and Moby are using newer formats. I tried an old Dragon 2.4.1 version that supported Killer and database generation. Just renaming the files is not sufficient, so this needs more time.
- Attachments
-
- KillerScan - KillerMoby 10min 99 moves.pdn
- (23.6 KiB) Downloaded 305 times
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Hi all,
I am back after a resting period.
BertTuyt wrote: To get some randomness I used a random shuffle of the root moves, and in the end nodes of the search I also added a small random score (between -2 and 2).

It seems that you wanted to add a safety net but you actually reduced randomness. A random evaluation feature approximates (a tree-weighted form of) mobility when combined with (> 1 ply) search; this is due to the number of children affecting the max operator. The deeper the search, the less random the root move. 4 plies and low granularity should be OK but do you really need this?
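For anyone who wants to experiment with this, a minimal sketch of the two ingredients as I understand them from Bert's description (the names and types below are invented for illustration, not taken from Damage or any other engine):

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Illustration only: a made-up move type, not an engine's real one.
struct Move { int from = 0, to = 0; };

static std::mt19937 rng(20160101);  // fixed seed keeps runs reproducible

// Shuffle the root move list once per search, so moves with equal
// scores are tried (and therefore played) in a random order.
void shuffle_root_moves(std::vector<Move>& root_moves) {
    std::shuffle(root_moves.begin(), root_moves.end(), rng);
}

// Add a small random term in [-2, +2] to the normal static eval at the
// leaves.  Combined with a >1-ply search, the max operator favours nodes
// with many children, so this behaves like a weak mobility bonus rather
// than pure noise, and the randomness of the root move shrinks with depth.
int eval_with_noise(int static_eval) {
    std::uniform_int_distribution<int> noise(-2, 2);
    return static_eval + noise(rng);
}
```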
BertTuyt wrote: From the position set, I only used the positions with equal material, no kings, and no capture or capture threat.

Maybe you considered that material is weighted separately, but it's still the most basic feature. I suggest you start by learning only material instead. For one thing, it's obvious to see when the code works.
BertTuyt wrote: After 10.000 iterations the cost function reduced from 0.6931 to 0.6891.

The value should already drop steadily during the first iterations, so there's no need to wait for hours. I would say reaching/passing 0.6 eventually would be a success (with good features), so you are going to notice the difference when it works. Assuming the code and data are correct, your result suggests that the learning rate is too low. Unpredictable data would lead to the same symptoms though ...
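To make the numbers concrete: with all weights at zero every prediction is 0.5, so the cross-entropy cost starts at ln 2 ≈ 0.6931, exactly the value you report. Here is a toy sketch of such a fit on a single material feature (my own illustration with plain batch gradient descent, not code from any engine); with a reasonable learning rate even this toy version drops clearly below 0.6931 within the first iterations.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// One training example: a single feature (material balance in men, from
// the side to move's view) and a game result in {0, 0.5, 1}.
struct Example {
    double material;  // e.g. +1.0 = one man up
    double result;    // 1 = win, 0.5 = draw, 0 = loss
};

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Plain batch gradient descent on the cross-entropy cost.  With w = 0 the
// prediction is 0.5 for every position, so the cost starts at ln 2 = 0.6931...
double train(const std::vector<Example>& data, double& w,
             double learning_rate, int iterations) {
    double cost = 0.0;
    for (int it = 0; it < iterations; ++it) {
        double grad = 0.0;
        cost = 0.0;
        for (const Example& e : data) {
            double p = sigmoid(w * e.material);
            grad += (p - e.result) * e.material;
            cost -= e.result * std::log(p) + (1.0 - e.result) * std::log(1.0 - p);
        }
        grad /= data.size();
        cost /= data.size();
        w -= learning_rate * grad;
    }
    return cost;
}

int main() {
    // Tiny made-up data set, for illustration only: a man up usually wins.
    std::vector<Example> data = {
        {+1.0, 1.0}, {+1.0, 1.0}, {+1.0, 0.5}, {0.0, 0.5},
        {0.0, 0.5},  {0.0, 0.5},  {-1.0, 0.0}, {-1.0, 0.5},
    };
    double w = 0.0;
    double cost = train(data, w, /*learning_rate=*/1.0, /*iterations=*/1000);
    std::printf("w = %.3f, cost = %.4f\n", w, cost);  // cost ends well below 0.6931
}
```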
Fabien.
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Kingsrow with auto tuned eval
Hi Ed,
Fairly impressive that you managed to improve a finely-tuned program so quickly. Adapting to machine learning / statistics is no small feat.
Ed Gilbert wrote: I put a link to kingsrow beta test version 1.57a at the kingsrow download web page. This version uses an eval function that was created automatically using logistic regression. Against the classic kingsrow it scores 0.608 in engine matches. Against dragon it scores essentially equal.

I'm not sure I follow. Are you suggesting that these two matches give inconsistent Elo results? I assume that 0.608 is on a [0, 1] scale and that classic Kingsrow and Dragon were about 30 Elo apart (rather than 75).
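For reference, the usual logistic conversion between match score and Elo difference, which puts 0.608 at roughly +76 Elo (a quick sketch with the standard formulas, nothing Kingsrow-specific):

```cpp
#include <cmath>
#include <cstdio>

// Standard logistic relation between expected score and Elo difference.
double score_to_elo(double score) { return 400.0 * std::log10(score / (1.0 - score)); }
double elo_to_score(double elo)   { return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0)); }

int main() {
    std::printf("score 0.608 ~ %+.0f Elo\n", score_to_elo(0.608));  // about +76
    std::printf("30 Elo ~ score %.3f\n", elo_to_score(30.0));       // about 0.543
}
```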
Fabien.
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Walter wrote: As most of you know I am really interested in a program that would support the Killer-rule. If I have understood past conversations correctly, then this would require 1) modification of the move generator, 2) a new evaluation function and 3) new endgame databases. I believe that changing the move generator never was the big issue. Would it be right to say that, with several people now working on eval learning, the second hurdle has also become a lot less difficult to overcome? Am I right in thinking that with eval learning in place it would now take days/weeks instead of months to create a good-quality eval specifically for Killer?

I agree with you. Unfortunately there's also 4) re-tune everything (if you want top performance, but the same could be said for 2 and 3). 4 is still manual though, so unlikely to thrill programmers. Even if automated, it would still take months/years.
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Rein Halbersma wrote: second hypothesis: a specifically tuned Killer eval is not just better at Killer, but also at regular draughts (because the enormously high drawing percentage of regular draughts does not give enough information to the fitting procedure). This would also apply to humans: study Killer middle games to become better at regular draughts.

That's a single data point, but the killer eval I computed in July is about +30 Elo in killer draughts, but -20 Elo in regular draughts (really a lot due to the much smaller scale). One characteristic is that kings are scored much higher (presumably because they predict winning much better). It's possible that higher eval overall (variance I guess) has side effects that affect the experiment though. It's also possible that king value is the only significant difference. I've just sent the eval to Walter and perhaps he will be able to discern a change in playing style, who knows ...
Worse still, changing search parameters could improve one game at the expense of the other (although usually not by much). So I stopped this way of thinking, which was my only hope to keep interest after Leiden. Even in ultra-fast games I had nearly 90% draws in regular draughts ... I'm now waiting for computer tournaments to adopt the killer rule (or another solution for fewer draws) before I consider working on Scan again.
-
- Posts: 859
- Joined: Sat Apr 28, 2007 14:53
- Real name: Ed Gilbert
- Location: Morristown, NJ USA
Re: Kingsrow with auto tuned eval
Hi Fabien,
Fabien Letouzey wrote: Fairly impressive that you managed to improve a finely-tuned program so quickly. Adapting to machine learning / statistics is no small feat.

Thanks. I had months of poor results before I reached something that could even equal the classic eval. It is only because I knew of the results that you and Michel have achieved that I did not abandon it and assume the technique would not work.
Fabien Letouzey wrote: I'm not sure I follow. Are you suggesting that these two matches give inconsistent Elo results? I assume that 0.608 is on a [0, 1] scale and that classic Kingsrow and Dragon were about 30 Elo apart (rather than 75).

Yes, [0, 1] scale. I did not try to compare Elos. The matches were played under different conditions. The kingsrow vs kingsrow matches were played with 6-piece dbs and 7900 games of 1 sec initial time + 0.1 sec increment per move. The kingsrow vs dragon matches used 8-piece dbs and 1 min/80 moves games (~1900 games). I think the result was 0.507 for kingsrow, and probably a 0.500 result would be within a reasonable uncertainty band for that number of games.
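As a rough way to quantify "reasonable uncertainty band": if a fraction d of the games is decisive and wins and losses are roughly balanced, the per-game variance is about d/4, so the standard error of the match score over n games is about 0.5*sqrt(d/n). A small sketch (the decisive fractions below are just illustrative guesses, not the actual match statistics); how much weight to put on 0.507 vs 0.500 then depends mostly on the draw rate in that particular match.

```cpp
#include <cmath>
#include <cstdio>

// Per-game score is 0, 0.5 or 1.  With a decisive-game fraction d and
// roughly balanced wins/losses, the per-game variance is about d/4, so
// the standard error of the match score over n games is 0.5*sqrt(d/n).
double score_std_error(double decisive_fraction, int games) {
    return 0.5 * std::sqrt(decisive_fraction / games);
}

int main() {
    // Illustrative numbers only: the decisive fractions are assumptions.
    std::printf("7900 games, 20%% decisive: +/- %.4f\n", score_std_error(0.20, 7900));
    std::printf("1900 games, 10%% decisive: +/- %.4f\n", score_std_error(0.10, 1900));
}
```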
I have read that matches between 2 different versions of the same program tend to exaggerate the elo difference between them. But maybe that doesn't apply here since the 2 kingsrow versions have completely different evals.
-- Ed
-
- Posts: 1722
- Joined: Wed Apr 14, 2004 16:04
Re: Eval tuning
Fabien Letouzey wrote:
Rein Halbersma wrote: second hypothesis: a specifically tuned Killer eval is not just better at Killer, but also at regular draughts (because the enormously high drawing percentage of regular draughts does not give enough information to the fitting procedure). This would also apply to humans: study Killer middle games to become better at regular draughts.
That's a single data point, but the killer eval I computed in July is about +30 Elo in killer draughts, but -20 Elo in regular draughts (really a lot due to the much smaller scale). One characteristic is that kings are scored much higher (presumably because they predict winning much better). It's possible that higher eval overall (variance I guess) has side effects that affect the experiment though. It's also possible that king value is the only significant difference. I've just sent the eval to Walter and perhaps he will be able to discern a change in playing style, who knows ...

Interesting to hear that you made a killer eval already in July. Why didn't you write about it earlier? Or did I miss a post somewhere?
Also interesting to learn that the king weight is significantly changed wrt regular draughts. I wonder what happens if you transplant the killer pattern weights to the regular eval but also keep the regular material weights? Does that help a bit to reduce the -20 Elo gap?
Fabien Letouzey wrote: Worse still, changing search parameters could improve one game at the expense of the other (although usually not by much). So I stopped this way of thinking, which was my only hope to keep interest after Leiden. Even in ultra-fast games I had nearly 90% draws in regular draughts ... I'm now waiting for computer tournaments to adopt the killer rule (or another solution for fewer draws) before I consider working on Scan again.

That's nice to hear, and I largely agree that draughts at the top level is not interesting anymore otherwise. Are you aware that there will be a top grandmaster tournament in July this year that plays with the killer rules? http://alldraughts.com/index.php/en/sha ... waard-2016
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Rein Halbersma wrote: Interesting to hear that you made a killer eval already in July. Why didn't you write about it earlier? Or did I miss a post somewhere?

It was too recent to make it part of the release, and I kept it as "future plans". Also, Internet discussions didn't focus on draw rate as much as I expected them to, instead focusing on learning, which makes sense.
Rein Halbersma wrote: Also interesting to learn that the king weight is significantly changed wrt regular draughts. I wonder what happens if you transplant the killer pattern weights to the regular eval but also keep the regular material weights? Does that help a bit to reduce the -20 Elo gap?

I can try that. For now, I'm testing the endgame tables for Walter.
Rein Halbersma wrote: That's nice to hear, and I largely agree that draughts at the top level is not interesting anymore otherwise. Are you aware that there will be a top grandmaster tournament in July this year that plays with the killer rules? http://alldraughts.com/index.php/en/sha ... waard-2016

I learned about it from Walter yesterday. But I expect much more inertia from the programmers for at least two reasons:
- endgame tables, especially 8-piece ones that take so much time to build
- half of the programs seem to have only been lightly updated during the last few years (guess)
-
- Posts: 1722
- Joined: Wed Apr 14, 2004 16:04
Re: Eval tuning
Walter wrote:
Rein Halbersma wrote: The old (~1998) version of the Dragon program generated the Dam 2.2/Moby Dam/Scan compatible databases. The source is available from http://hjetten.home.xs4all.nl/DragonDra ... .Win32.zip You could modify the move generator there as well and make 2-6 pc dbs for Killer. These can be plugged in to both Moby Dam and Scan. It would be interesting to modify Moby Dam as well and see if a Killer match between Moby Dam / Scan gives a bigger score difference than a regular draughts match.
I now have a modified Moby Dam as well. An 18-game match (10 min/99 moves) resulted in 16 wins for Scan. Scan is much stronger and only needs the Killer rule 3 times (games 4, 11 and 16). This is all assuming I introduced no bugs. I think I need to generate more Killer test positions to test the move generators more extensively.
I also looked at the endgame databases (thanks for the link!), but it seems that Scan and Moby are using newer formats. I tried an old Dragon 2.4.1 version that supported Killer and database generation. Just renaming the files is not sufficient, so this needs more time.

I found this old link again: http://mdgsoft.home.xs4all.nl/draughts/ ... -2.4.1.zip
You need to run the Setup.exe from a cmd shell. The Dragon program has a built-in Killer engine that you can find in the Options. It also has a graphical and a command-line database builder. I'm currently rebuilding the 6-piece endgames for it. I used to have this program but lost it when I got my new workstation. This Dragon version also has DamExchange support.
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Rein Halbersma wrote: Also interesting to learn that the king weight is significantly changed wrt regular draughts. I wonder what happens if you transplant the killer pattern weights to the regular eval but also keep the regular material weights? Does that help a bit to reduce the -20 Elo gap?

It was terrible: -100/-200 Elo range (with/without man material changed); I stopped both experiments after mere minutes. I don't think any conclusion can be drawn though, as features are related. Material values spread to other features (king PST, man patterns), especially I guess with regularisation.
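To illustrate what I mean by values spreading: if two features are strongly correlated, an L2 penalty makes the fit share the weight between them, so "material" information ends up partly inside the patterns that contain the men anyway. A toy sketch with two identical features (my own illustration, unrelated to Scan's actual tuner):

```cpp
#include <cmath>
#include <cstdio>

// Two features that are always identical (perfectly correlated), fitted by
// gradient descent on cross-entropy plus an L2 penalty.  Only the sum
// w1 + w2 matters for the predictions; the L2 term splits that sum evenly
// over both weights, so the signal leaks into whichever features move together.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

int main() {
    // One repeated example: both features equal 1, result = win.
    const double x = 1.0, y = 1.0;
    const double lr = 0.1, lambda = 0.01;
    double w1 = 0.0, w2 = 0.0;

    for (int it = 0; it < 10000; ++it) {
        double p = sigmoid(w1 * x + w2 * x);
        double g = (p - y) * x;        // cross-entropy gradient (shared by both)
        w1 -= lr * (g + lambda * w1);  // L2 pulls each weight toward 0
        w2 -= lr * (g + lambda * w2);
    }
    // w1 and w2 converge to the same value: the signal is spread evenly
    // over the two correlated features.
    std::printf("w1 = %.3f, w2 = %.3f\n", w1, w2);
}
```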
-
- Posts: 1722
- Joined: Wed Apr 14, 2004 16:04
Re: Eval tuning
Fabien Letouzey wrote:
Rein Halbersma wrote: Also interesting to learn that the king weight is significantly changed wrt regular draughts. I wonder what happens if you transplant the killer pattern weights to the regular eval but also keep the regular material weights? Does that help a bit to reduce the -20 Elo gap?
It was terrible: -100/-200 Elo range (with/without man material changed); I stopped both experiments after mere minutes. I don't think any conclusion can be drawn though, as features are related. Material values spread to other features (king PST, man patterns), especially I guess with regularisation.

Thanks for trying anyway.