Eval tuning
-
- Posts: 1368
- Joined: Thu Jun 20, 2013 17:16
- Real name: Krzysztof Grzelak
Re: Eval tuning
Walter, I ask you to make a full version of the program so that it can play in tournaments.
Re: Eval tuning
Rein Halbersma wrote: The old (~1998) version of the Dragon program generated the Dam 2.2/Moby Dam/Scan compatible databases. The source is available from http://hjetten.home.xs4all.nl/DragonDra ... .Win32.zip You could modify the move generator there as well and make 2-6 pc dbs for Killer. These can be plugged in to both Moby Dam and Scan. It would be interesting to modify Moby Dam as well and see if a Killer match between Moby Dam / Scan gives a bigger score difference than a regular draughts match.

I now have a modified Moby Dam as well. An 18-game match (10 min/99 moves) resulted in 16 wins for Scan. Scan is much stronger and only needs the Killer rule 3 times (games 4, 11 and 16). This is all assuming I introduced no bugs. I think I need to generate more Killer test positions to test the move generators more extensively.

I also looked at the endgame databases (thanks for the link!), but it seems that Scan and Moby are using newer formats. I tried an old Dragon 2.4.1 version that supported Killer and database generation. Just renaming the files is not sufficient, so this needs more time.
- Attachments
-
- KillerScan - KillerMoby 10min 99 moves.pdn
- (23.6 KiB) Downloaded 305 times
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Hi all,
I am back after a resting period.
BertTuyt wrote: To get some randomness I used a random shuffle of the root moves, and in the end nodes of the search I also added a small random score (between -2 and 2).

It seems that you wanted to add a safety net but you actually reduced randomness. A random evaluation feature approximates (a tree-weighted form of) mobility when combined with (> 1 ply) search; this is due to the number of children affecting the max operator. The deeper the search, the less random the root move. 4 plies and low granularity should be OK but do you really need this?
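For anyone who wants to experiment with this, a minimal sketch of the two ingredients as I understand them from Bert's description (the names and types below are invented for illustration, not taken from Damage or any other engine):

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Illustration only: a made-up move type, not an engine's real one.
struct Move { int from = 0, to = 0; };

static std::mt19937 rng(20160101);  // fixed seed keeps runs reproducible

// Shuffle the root move list once per search, so moves with equal
// scores are tried (and therefore played) in a random order.
void shuffle_root_moves(std::vector<Move>& root_moves) {
    std::shuffle(root_moves.begin(), root_moves.end(), rng);
}

// Add a small random term in [-2, +2] to the normal static eval at the
// leaves.  Combined with a >1-ply search, the max operator favours nodes
// with many children, so this behaves like a weak mobility bonus rather
// than pure noise, and the randomness of the root move shrinks with depth.
int eval_with_noise(int static_eval) {
    std::uniform_int_distribution<int> noise(-2, 2);
    return static_eval + noise(rng);
}
```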
BertTuyt wrote: From the position set, I only used the positions with equal material, no kings, and no capture or capture threat.

Maybe you considered that material is weighted separately, but it's still the most basic feature. I suggest you start by learning only material instead. For one thing, it's obvious to see when the code works.
BertTuyt wrote: After 10.000 iterations the cost function reduced from 0.6931 to 0.6891.

The value should already drop steadily during the first iterations, so there's no need to wait for hours. I would say reaching/passing 0.6 eventually would be a success (with good features), so you are going to notice the difference when it works. Assuming the code and data are correct, your result suggests that the learning rate is too low. Unpredictable data would lead to the same symptoms though ...
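To make the numbers concrete: with all weights at zero every prediction is 0.5, so the cross-entropy cost starts at ln 2 ≈ 0.6931, exactly the value you report. Here is a toy sketch of such a fit on a single material feature (my own illustration with plain batch gradient descent, not code from any engine); with a reasonable learning rate even this toy version drops clearly below 0.6931 within the first iterations.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// One training example: a single feature (material balance in men, from
// the side to move's view) and a game result in {0, 0.5, 1}.
struct Example {
    double material;  // e.g. +1.0 = one man up
    double result;    // 1 = win, 0.5 = draw, 0 = loss
};

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Plain batch gradient descent on the cross-entropy cost.  With w = 0 the
// prediction is 0.5 for every position, so the cost starts at ln 2 = 0.6931...
double train(const std::vector<Example>& data, double& w,
             double learning_rate, int iterations) {
    double cost = 0.0;
    for (int it = 0; it < iterations; ++it) {
        double grad = 0.0;
        cost = 0.0;
        for (const Example& e : data) {
            double p = sigmoid(w * e.material);
            grad += (p - e.result) * e.material;
            cost -= e.result * std::log(p) + (1.0 - e.result) * std::log(1.0 - p);
        }
        grad /= data.size();
        cost /= data.size();
        w -= learning_rate * grad;
    }
    return cost;
}

int main() {
    // Tiny made-up data set, for illustration only: a man up usually wins.
    std::vector<Example> data = {
        {+1.0, 1.0}, {+1.0, 1.0}, {+1.0, 0.5}, {0.0, 0.5},
        {0.0, 0.5},  {0.0, 0.5},  {-1.0, 0.0}, {-1.0, 0.5},
    };
    double w = 0.0;
    double cost = train(data, w, /*learning_rate=*/1.0, /*iterations=*/1000);
    std::printf("w = %.3f, cost = %.4f\n", w, cost);  // cost ends well below 0.6931
}
```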
Fabien.
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Kingsrow with auto tuned eval
Hi Ed,
Fairly impressive that you managed to improve a finely-tuned program so quickly. Adapting to machine learning / statistics is no small feat.
Ed Gilbert wrote: I put a link to kingsrow beta test version 1.57a at the kingsrow download web page. This version uses an eval function that was created automatically using logistic regression. Against the classic kingsrow it scores 0.608 in engine matches. Against dragon it scores essentially equal.

I'm not sure I follow. Are you suggesting that these two matches give inconsistent Elo results? I assume that 0.608 is on a [0, 1] scale and that classic Kingsrow and Dragon were about 30 Elo apart (rather than 75).
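For reference, the usual logistic conversion between match score and Elo difference, which puts 0.608 at roughly +76 Elo (a quick sketch with the standard formulas, nothing Kingsrow-specific):

```cpp
#include <cmath>
#include <cstdio>

// Standard logistic relation between expected score and Elo difference.
double score_to_elo(double score) { return 400.0 * std::log10(score / (1.0 - score)); }
double elo_to_score(double elo)   { return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0)); }

int main() {
    std::printf("score 0.608 ~ %+.0f Elo\n", score_to_elo(0.608));  // about +76
    std::printf("30 Elo ~ score %.3f\n", elo_to_score(30.0));       // about 0.543
}
```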
Fabien.
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Walter wrote: As most of you know I am really interested in a program that would support the Killer-rule. If I have understood past conversations correctly, then this would require 1) modification of the move generator, 2) a new evaluation function and 3) new endgame databases. I believe that changing the move generator never was the big issue. Would it be right to say that, with several people now working on eval learning, the second hurdle has also become a lot less difficult to overcome? Am I right in thinking that with eval learning in place it would now take days/weeks instead of months to create a good-quality eval specifically for Killer?

I agree with you. Unfortunately there's also 4) re-tune everything (if you want top performance, but the same could be said for 2 and 3). 4 is still manual though, so unlikely to thrill programmers. Even if automated, it would still take months/years.
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Rein Halbersma wrote: second hypothesis: a specifically tuned Killer eval is not just better at Killer, but also at regular draughts (because the enormously high drawing percentage of regular draughts does not give enough information to the fitting procedure). This would also apply to humans: study Killer middle games to become better at regular draughts.

That's a single data point, but the killer eval I computed in July is about +30 Elo in killer draughts, but -20 Elo in regular draughts (really a lot due to the much smaller scale). One characteristic is that kings are scored much higher (presumably because they predict winning much better). It's possible that higher eval overall (variance I guess) has side effects that affect the experiment though. It's also possible that king value is the only significant difference. I've just sent the eval to Walter and perhaps he will be able to discern a change in playing style, who knows ...
Worse still, changing search parameters could improve one game at the expense of the other (although usually not by much). So I stopped this way of thinking, which was my only hope to keep interest after Leiden. Even in ultra-fast games I had nearly 90% draws in regular draughts ... I'm now waiting for computer tournaments to adopt the killer rule (or another solution for fewer draws) before I consider working on Scan again.
-
- Posts: 859
- Joined: Sat Apr 28, 2007 14:53
- Real name: Ed Gilbert
- Location: Morristown, NJ USA
Re: Kingsrow with auto tuned eval
Hi Fabien,
Fabien Letouzey wrote: Fairly impressive that you managed to improve a finely-tuned program so quickly. Adapting to machine learning / statistics is no small feat.

Thanks. I had months of poor results before I reached something that could even equal the classic eval. It is only because I knew of the results that you and Michel have achieved that I did not abandon it and assume the technique would not work.
Fabien Letouzey wrote: I'm not sure I follow. Are you suggesting that these two matches give inconsistent Elo results? I assume that 0.608 is on a [0, 1] scale and that classic Kingsrow and Dragon were about 30 Elo apart (rather than 75).

Yes, [0, 1] scale. I did not try to compare Elos. The matches were played under different conditions. The kingsrow vs kingsrow matches were played with 6-piece dbs and 7900 games of 1 sec initial time + 0.1 sec increment per move. The kingsrow vs dragon matches used 8-piece dbs and 1 min/80 moves games (~1900 games). I think the result was 0.507 for kingsrow, and probably a 0.500 result would be within a reasonable uncertainty band for that number of games.
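As a rough way to quantify "reasonable uncertainty band": if a fraction d of the games is decisive and wins and losses are roughly balanced, the per-game variance is about d/4, so the standard error of the match score over n games is about 0.5*sqrt(d/n). A small sketch (the decisive fractions below are just illustrative guesses, not the actual match statistics); how much weight to put on 0.507 vs 0.500 then depends mostly on the draw rate in that particular match.

```cpp
#include <cmath>
#include <cstdio>

// Per-game score is 0, 0.5 or 1.  With a decisive-game fraction d and
// roughly balanced wins/losses, the per-game variance is about d/4, so
// the standard error of the match score over n games is 0.5*sqrt(d/n).
double score_std_error(double decisive_fraction, int games) {
    return 0.5 * std::sqrt(decisive_fraction / games);
}

int main() {
    // Illustrative numbers only: the decisive fractions are assumptions.
    std::printf("7900 games, 20%% decisive: +/- %.4f\n", score_std_error(0.20, 7900));
    std::printf("1900 games, 10%% decisive: +/- %.4f\n", score_std_error(0.10, 1900));
}
```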
I have read that matches between 2 different versions of the same program tend to exaggerate the elo difference between them. But maybe that doesn't apply here since the 2 kingsrow versions have completely different evals.
-- Ed
-
- Posts: 1722
- Joined: Wed Apr 14, 2004 16:04
Re: Eval tuning
Fabien Letouzey wrote:
Rein Halbersma wrote: second hypothesis: a specifically tuned Killer eval is not just better at Killer, but also at regular draughts (because the enormously high drawing percentage of regular draughts does not give enough information to the fitting procedure). This would also apply to humans: study Killer middle games to become better at regular draughts.
That's a single data point, but the killer eval I computed in July is about +30 Elo in killer draughts, but -20 Elo in regular draughts (really a lot due to the much smaller scale). One characteristic is that kings are scored much higher (presumably because they predict winning much better). It's possible that higher eval overall (variance I guess) has side effects that affect the experiment though. It's also possible that king value is the only significant difference. I've just sent the eval to Walter and perhaps he will be able to discern a change in playing style, who knows ...

Interesting to hear that you made a killer eval already in July. Why didn't you write about it earlier? Or did I miss a post somewhere?
Also interesting to learn that the king weight is significantly changed wrt regular draughts. I wonder what happens if you transplant the killer pattern weights to the regular eval but also keep the regular material weights? Does that help a bit to reduce the -20 Elo gap?
Fabien Letouzey wrote: Worse still, changing search parameters could improve one game at the expense of the other (although usually not by much). So I stopped this way of thinking, which was my only hope to keep interest after Leiden. Even in ultra-fast games I had nearly 90% draws in regular draughts ... I'm now waiting for computer tournaments to adopt the killer rule (or another solution for fewer draws) before I consider working on Scan again.

That's nice to hear, and I largely agree that draughts at the top level is not interesting anymore otherwise. Are you aware that there will be a top grandmaster tournament in July this year that plays with the killer rules? http://alldraughts.com/index.php/en/sha ... waard-2016
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Rein Halbersma wrote: Interesting to hear that you made a killer eval already in July. Why didn't you write about it earlier? Or did I miss a post somewhere?

It was too recent to make it part of the release, and I kept it as "future plans". Also, Internet discussions didn't focus on draw rate as much as I expected them to, instead focusing on learning, which makes sense.
Rein Halbersma wrote: Also interesting to learn that the king weight is significantly changed wrt regular draughts. I wonder what happens if you transplant the killer pattern weights to the regular eval but also keep the regular material weights? Does that help a bit to reduce the -20 Elo gap?

I can try that. For now, I'm testing the endgame tables for Walter.
Rein Halbersma wrote: That's nice to hear, and I largely agree that draughts at the top level is not interesting anymore otherwise. Are you aware that there will be a top grandmaster tournament in July this year that plays with the killer rules? http://alldraughts.com/index.php/en/sha ... waard-2016

I learned about it from Walter yesterday. But I expect much more inertia from the programmers for at least two reasons:
- endgame tables, especially 8-piece ones that take so much time to build
- half of the programs seem to have only been lightly updated during the last few years (guess)
-
- Posts: 1722
- Joined: Wed Apr 14, 2004 16:04
Re: Eval tuning
Walter wrote:
Rein Halbersma wrote: The old (~1998) version of the Dragon program generated the Dam 2.2/Moby Dam/Scan compatible databases. The source is available from http://hjetten.home.xs4all.nl/DragonDra ... .Win32.zip You could modify the move generator there as well and make 2-6 pc dbs for Killer. These can be plugged in to both Moby Dam and Scan. It would be interesting to modify Moby Dam as well and see if a Killer match between Moby Dam / Scan gives a bigger score difference than a regular draughts match.
I now have a modified Moby Dam as well. An 18-game match (10 min/99 moves) resulted in 16 wins for Scan. Scan is much stronger and only needs the Killer rule 3 times (games 4, 11 and 16). This is all assuming I introduced no bugs. I think I need to generate more Killer test positions to test the move generators more extensively.
I also looked at the endgame databases (thanks for the link!), but it seems that Scan and Moby are using newer formats. I tried an old Dragon 2.4.1 version that supported Killer and database generation. Just renaming the files is not sufficient, so this needs more time.

I found this old link again: http://mdgsoft.home.xs4all.nl/draughts/ ... -2.4.1.zip
You need to run the Setup.exe from a cmd shell. The Dragon program has a built-in Killer engine that you can find in the Options. It also has a graphical and a command-line database builder. I'm currently rebuilding the 6-piece endgames for it. I used to have this program but lost it when I got my new workstation. This Dragon version also has DamExchange support.
-
- Posts: 299
- Joined: Tue Jul 07, 2015 07:48
- Real name: Fabien Letouzey
Re: Eval tuning
Rein Halbersma wrote: Also interesting to learn that the king weight is significantly changed wrt regular draughts. I wonder what happens if you transplant the killer pattern weights to the regular eval but also keep the regular material weights? Does that help a bit to reduce the -20 Elo gap?

It was terrible: -100/-200 Elo range (with/without man material changed); I stopped both experiments after mere minutes. I don't think any conclusion can be drawn though, as features are related. Material values spread to other features (king PST, man patterns), especially I guess with regularisation.
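To illustrate what I mean by values spreading: if two features are strongly correlated, an L2 penalty makes the fit share the weight between them, so "material" information ends up partly inside the patterns that contain the men anyway. A toy sketch with two identical features (my own illustration, unrelated to Scan's actual tuner):

```cpp
#include <cmath>
#include <cstdio>

// Two features that are always identical (perfectly correlated), fitted by
// gradient descent on cross-entropy plus an L2 penalty.  Only the sum
// w1 + w2 matters for the predictions; the L2 term splits that sum evenly
// over both weights, so the signal leaks into whichever features move together.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

int main() {
    // One repeated example: both features equal 1, result = win.
    const double x = 1.0, y = 1.0;
    const double lr = 0.1, lambda = 0.01;
    double w1 = 0.0, w2 = 0.0;

    for (int it = 0; it < 10000; ++it) {
        double p = sigmoid(w1 * x + w2 * x);
        double g = (p - y) * x;        // cross-entropy gradient (shared by both)
        w1 -= lr * (g + lambda * w1);  // L2 pulls each weight toward 0
        w2 -= lr * (g + lambda * w2);
    }
    // w1 and w2 converge to the same value: the signal is spread evenly
    // over the two correlated features.
    std::printf("w1 = %.3f, w2 = %.3f\n", w1, w2);
}
```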
-
- Posts: 1722
- Joined: Wed Apr 14, 2004 16:04
Re: Eval tuning
Fabien Letouzey wrote:
Rein Halbersma wrote: Also interesting to learn that the king weight is significantly changed wrt regular draughts. I wonder what happens if you transplant the killer pattern weights to the regular eval but also keep the regular material weights? Does that help a bit to reduce the -20 Elo gap?
It was terrible: -100/-200 Elo range (with/without man material changed); I stopped both experiments after mere minutes. I don't think any conclusion can be drawn though, as features are related. Material values spread to other features (king PST, man patterns), especially I guess with regularisation.

Thanks for trying anyway.