Ed Gilbert wrote: A very good question, one that many of us are scratching our heads to find the answer to. In the case of kingsrow, after watching many match games and running some other experiments, I think it's the eval. Kingsrow does not use an auto-tuned eval. In many of kingsrow's losses, scan is reporting high scores that kingsrow is not, for many moves. It's quite remarkable to me that mostly just a set of patterns of 8 or 9 squares can see so much. In particular, I see no weakness in scan's handling of outpost squares (white 24, black 27). Many programs (kingsrow included) have had weaknesses in evaluating a man on these squares, and would lose games until a complicated eval was added that takes into consideration all of the attacking and defending squares behind the outpost. A pattern of 8 or 9 squares doesn't come close to seeing the whole picture. But somehow it works.
Using patterns was a kind of bet for me, as I was unaware of Dragon already having them: do draughts have a similar structure to Othello? The reasoning was that because pieces move so slowly (compared to chess), local features might suffice. The answer appears to be "yes", though I have an arguably funny anecdote about this. The Scan learning algorithm says that it extracted about 20% of the data "knowledge". By comparison, a similar algorithm in Othello could extract 90% of the exact score (not just WLD)! So my first reaction when I saw the 20% (actually at the beginning it was less) was that the project was a failure, and I nearly abandoned it (LOL). That was linear vs. logistic regression though, and the games were generated differently.
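For readers unfamiliar with pattern-based evaluation, here is a minimal sketch of the general idea: each pattern covers a fixed set of squares, their contents form an index into a table of trained weights, and the weighted sum is mapped to a win probability through a logistic link (which is where logistic regression comes in). All names and sizes below are illustrative assumptions, not code from Scan.

```cpp
#include <array>
#include <cmath>

// Illustrative sizes: 3 states per square (empty, white man, black man),
// so a pattern of 8 squares indexes a table of 3^8 = 6561 weights.
constexpr int PATTERN_SIZE = 8;
constexpr int TABLE_SIZE   = 6561;

struct Pattern {
    std::array<int, PATTERN_SIZE> squares;  // which board squares the pattern covers
    std::array<float, TABLE_SIZE> weight;   // one trained weight per configuration
};

// board[sq] in {0, 1, 2}; the pattern's squares form a base-3 index.
int pattern_index(const Pattern& p, const int* board) {
    int index = 0;
    for (int sq : p.squares)
        index = index * 3 + board[sq];
    return index;
}

// Linear sum of pattern weights; with logistic-regression training,
// the sigmoid of the sum approximates a win probability.
float eval_probability(const Pattern* patterns, int n, const int* board) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += patterns[i].weight[pattern_index(patterns[i], board)];
    return 1.0f / (1.0f + std::exp(-sum));  // logistic link
}
```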
One puzzle piece is that patterns might be able to guess content outside their reach through correlation (and game phase). I think that rule constraints in Othello and to a lesser extent in draughts make this possible. Even without guessing, predicting an "average outcome" is probably much better than ignoring a feature entirely.
For complicated features, my wishful thinking is that patterns recognise the easy cases and let search (through ProbCut) focus on the unclear ones. In this view statistical evaluation gains a few plies everywhere, not just for outposts or other specific evaluation terms.
During the tournament I was uneasy about outposts. For instance, in the TD-King game, I was glad when the outpost man was exchanged off. That might not have been justified, though.
Now, regarding King's Row: the fact that it isn't using statistical evaluation makes its overall performance all the more impressive! I guess a lot of tuning can compensate for that, at the cost of making future changes more difficult because of local optimality.
Ed Gilbert wrote: Kingsrow uses LMR and a pruning mechanism that is different from MPC. I tried replacing its pruning mechanism with scan's MPC, and didn't see any appreciable difference in depth searched. My LMR parameters are less aggressive than scan's, so I tried scan's. With this change kingsrow's nominal search depth was deeper, but average leaf depth didn't really change much. In general scan's nominal search depth is deeper than kingsrow's, but average leaf depths are a lot closer. I then ran an engine match between this new test version of kingsrow, using scan's LMR and MPC, against a baseline kingsrow. The baseline code was better.
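As a point of reference for what "more or less aggressive" LMR means in practice, here is a minimal sketch of a typical reduction scheme: aggressiveness is mainly the base and slope of a log-log formula, plus how early in the move list reductions start. The constants and names are assumptions for illustration, not the actual parameters of Scan or Kingsrow.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical tuning knobs: larger values give more aggressive reductions.
constexpr double LMR_BASE  = 0.5;
constexpr double LMR_SLOPE = 0.5;

// Typical log-log reduction formula: later moves at higher depth are
// reduced more. Early moves and shallow nodes are left alone.
int lmr_reduction(int depth, int move_number) {
    if (depth < 3 || move_number < 2)
        return 0;
    double r = LMR_BASE + LMR_SLOPE * std::log(depth) * std::log(move_number);
    return std::max(0, static_cast<int>(r));
}

// Inside the search, a reduced move that still beats alpha is normally
// re-searched at full depth:
//   int new_depth = depth - 1 - lmr_reduction(depth, move_number);
```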
It's only a guess, but I would say that ProbCut relies on eval properties (such as the correlation between shallow and deep searches), whereas LMR depends more on move ordering. So in your case, it would be easier to experiment with the latter (if I'm correct).
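To make the dependence on eval properties concrete, here is a rough sketch of the ProbCut cut test: a linear model deep ≈ a·shallow + b with residual standard deviation sigma is fitted offline, and the cut is taken when the predicted deep score clears beta by a confidence margin. The structure and names are generic assumptions rather than Scan's actual MPC code; the point is that the whole scheme hinges on how well shallow and deep scores correlate for a given eval.

```cpp
// Offline, a regression  deep_score ~ a * shallow_score + b  is fitted,
// with residual standard deviation sigma; t sets the confidence level.
struct ProbCutModel {
    double a, b;   // regression coefficients
    double sigma;  // residual std. dev. (smaller = better shallow/deep correlation)
    double t;      // confidence threshold, e.g. 1.5 standard deviations
};

// Prune (fail high without a full-depth search) when even a pessimistic
// estimate of the deep score still exceeds beta.
bool probcut_cut(int shallow_score, int beta, const ProbCutModel& m) {
    double predicted = m.a * shallow_score + m.b;
    return predicted - m.t * m.sigma >= static_cast<double>(beta);
}
```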
Ed Gilbert wrote: I browsed the scan sources a bit and was curious about a couple of things; maybe you could comment on them. I didn't see any use of internal iterative deepening to get a good move for move ordering when there is no hash-table move. I'm sure I tested IID and found an improvement in node counts, and I thought it was kind of a standard technique in chess programs. Did you try it and find it didn't work well in scan?
I could not measure an impact during Senpai's development and therefore put IID as "low priority" on my list for Scan. I haven't tried it there. Furthermore, I didn't write any tools to compare tree size (or time to depth, TTD).
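For completeness, this is the kind of IID scheme the question refers to: when a node has no hash move and enough remaining depth, a reduced-depth search is run first, mainly so that the transposition table then supplies a move to try first. All identifiers below (Position, Move, search, tt_probe_move) are placeholders for illustration, not names from the Scan sources.

```cpp
// Placeholder declarations; a real engine would have its own versions.
struct Position;
using Move = int;
constexpr Move MOVE_NONE = 0;

int  search(Position& pos, int alpha, int beta, int depth);  // normal search
Move tt_probe_move(const Position& pos);                     // hash move or MOVE_NONE

// Internal iterative deepening: if the hash table offers no move at a
// reasonably deep node, run a reduced-depth search first so that a best
// move is left in the table for ordering.
Move ordering_move(Position& pos, int alpha, int beta, int depth) {
    Move tt_move = tt_probe_move(pos);
    if (tt_move == MOVE_NONE && depth >= 4) {
        search(pos, alpha, beta, depth - 2);
        tt_move = tt_probe_move(pos);
    }
    return tt_move;  // tried first when present
}
```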
Ed Gilbert wrote: Also, if I'm reading your history table code correctly, your changes in history table values for good moves and bad moves do not seem to have a depth component. I thought most chess programs use much bigger increments for deep interior nodes than for leaf nodes. I know I found this was effective in kingsrow.
There are several non-pragmatic reasons. One is that, for me, history is only for low depths, where the TT doesn't help much. Another, though I'm not exploiting it, is that I want my history scores to have a global interpretation: in this case a probability estimate. In Senpai, the kind of update I was using didn't seem to matter much. I should have been more careful for Scan, though, because in chess we have a lot of candidates before we call history for help.
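To illustrate the two update styles being contrasted: a depth-weighted scheme, where deep interior nodes contribute much larger increments than leaves, versus a simple hit/try counter whose ratio reads directly as a probability estimate. Array sizes and constants are assumptions for a move indexed by from/to square on a 50-square board, not code from either program.

```cpp
// Indexed by [side][from][to]; sizes are illustrative.
int history_bonus[2][50][50];  // style 1: depth-weighted score
int history_hit[2][50][50];    // style 2: hit/try counters
int history_try[2][50][50];

// Style 1: the increment grows with remaining depth (quadratic here),
// so deep interior nodes dominate the table.
void update_depth_weighted(int side, int from, int to, int depth, bool good) {
    int bonus = depth * depth;
    history_bonus[side][from][to] += good ? bonus : -bonus;
}

// Style 2: every update counts once regardless of depth, so
// hit / try keeps a global interpretation as a probability estimate.
void update_probability(int side, int from, int to, bool good) {
    history_try[side][from][to] += 1;
    if (good)
        history_hit[side][from][to] += 1;
}
```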