AlphaZero-style Breakthrough program

Fabien Letouzey · Post by **Fabien Letouzey** » Sun May 06, 2018 17:42

Hi all,

Last month Richard Lorentz (http://www.csun.edu/~lorentz/), author of several game programs including one for Amazons, contacted me about the game of Breakthrough: https://en.m.wikipedia.org/wiki/Breakth ... oard_game)
This game, that I will call BT here, was the inspiration behind my (half-joking) proposal to apply its principle to draughts (draughts-BT).

Richard wanted to talk to me about a new program, "gzero" (G0), that played so well that it won all its games against strong opposition. As I understand, G0 is a clone of AlphaZero (A0) with a domain-specific language you use to describe your game. It is fully automated: you enter the rules of a game, and then you wait, and then a strong program emerges after an indeterminate period (depending on the game). You can find the source code here: https://github.com/ggplib/ggp-zero
Furthermore, G0 only spent a few days of learning (for BT), compared to many years for A0 (if they used a normal computer like those we can afford).

Anyway, G0 is so strong in BT that Richard couldn't even estimate its level. So he asked me to convert Scan to BT, a possibility I mentioned to him last year, and match the resulting program with G0. That match happened yesterday. Informal and short, G0 won by 6-2 (there are no draws in BT). Statistically this might not be significant, but trust me that I would bet money on G0 had this match gone on further.

So why is this relevant here? Because I think BT as a game is "draughts-like": a blocking game where you try to reach the last rank. My assumption is that making G0 (or any other A0-style program) learn draughts would lead to good results. Unfortunately, the author of G0 does not seem interested, yet. There are multiple reasons, and one of them is the difficulty to express the rules of draughts in the specific rule language that G0 is using. For completeness, it's called GDL, and maybe it's this one: https://en.m.wikipedia.org/wiki/Game_De ... n_Language

In any case, the message for anybody interested in the A0 approach is that it should work in all games, and apart from the hardest ones (Go, chess, Shogi) you don't need to be a millionaire to run the training part.

Fabien.

Rein Halbersma · Post by **Rein Halbersma** » Wed May 09, 2018 08:45

Fabien Letouzey wrote:Hi all,

Last month Richard Lorentz (http://www.csun.edu/~lorentz/), author of several game programs including one for Amazons, contacted me about the game of Breakthrough: https://en.m.wikipedia.org/wiki/Breakth ... oard_game)
This game, that I will call BT here, was the inspiration behind my (half-joking) proposal to apply its principle to draughts (draughts-BT).

Richard wanted to talk to me about a new program, "gzero" (G0), that played so well that it won all its games against strong opposition. As I understand, G0 is a clone of AlphaZero (A0) with a domain-specific language you use to describe your game. It is fully automated: you enter the rules of a game, and then you wait, and then a strong program emerges after an indeterminate period (depending on the game). You can find the source code here: https://github.com/ggplib/ggp-zero
Furthermore, G0 only spent a few days of learning (for BT), compared to many years for A0 (if they used a normal computer like those we can afford).

Anyway, G0 is so strong in BT that Richard couldn't even estimate its level. So he asked me to convert Scan to BT, a possibility I mentioned to him last year, and match the resulting program with G0. That match happened yesterday. Informal and short, G0 won by 6-2 (there are no draws in BT). Statistically this might not be significant, but trust me that I would bet money on G0 had this match gone on further.

So why is this relevant here? Because I think BT as a game is "draughts-like": a blocking game where you try to reach the last rank. My assumption is that making G0 (or any other A0-style program) learn draughts would lead to good results. Unfortunately, the author of G0 does not seem interested, yet. There are multiple reasons, and one of them is the difficulty to express the rules of draughts in the specific rule language that G0 is using. For completeness, it's called GDL, and maybe it's this one: https://en.m.wikipedia.org/wiki/Game_De ... n_Language

In any case, the message for anybody interested in the A0 approach is that it should work in all games, and apart from the hardest ones (Go, chess, Shogi) you don't need to be a millionaire to run the training part.

Fabien.

I guess most of us draughts programmers have barely transitioned from the Stone Age with hand-written and hand-tuned eval features, towards the Bronze Age with hand-written and machine-tuned eval features. The Iron Age of deep learning might take a while to be adopted

jj · Post by jj » Wed May 09, 2018 12:06

Fabien Letouzey wrote:...
In any case, the message for anybody interested in the A0 approach is that it should work in all games, and apart from the hardest ones (Go, chess, Shogi) you don't need to be a millionaire to run the training part.

Interesting. However, what sets draughts (& checkers) apart from the mentioned games are the two rules 1) capturing is compulsory and 2) capturing multiple pieces in one move. This facilitates deep combinations (shots) and forcings, which in my view poses a problem to the underlying MCTS-ish search algorithm.

Jan-Jaap

Fabien Letouzey · Post by **Fabien Letouzey** » Wed May 09, 2018 13:09

Rein Halbersma wrote:I guess most of us draughts programmers have barely transitioned from the Stone Age with hand-written and hand-tuned eval features, towards the Bronze Age with hand-written and machine-tuned eval features. The Iron Age of deep learning might take a while to be adopted

Actually I meant a positive message in my post. Let me try a more explicit version.

Because patterns (now called (N-)tuple (networks) by academics, apparently) were designed before the web became of widespread use, it is difficult to find information on them. I now know, because I looked for easy pointers that I could give to G0's author, but couldn't find any. The Buro papers are maths heavy, and I was looking for something more graphical. CNNs, by contrast, are very fashionable, and it is therefore easy to find blogs, articles, and even libraries that do most of the work.

So my hidden message was for people who are not already using patterns: they can switch to CNNs directly. Not because it is easier (if you do it yourself, it isn't) but because it is well documented and already implemented by others. Also, my post was targetting a wider audience than the current engine authors. Since A0 is completely different from engines as we know them, it seems a good time for newcomers to appear.

Fabien Letouzey · Post by **Fabien Letouzey** » Wed May 09, 2018 13:24

jj wrote:Interesting. However, what sets draughts (& checkers) apart from the mentioned games are the two rules 1) capturing is compulsory and 2) capturing multiple pieces in one move. This facilitates deep combinations (shots) and forcings, which in my view poses a problem to the underlying MCTS-ish search algorithm.

It is difficult to reason about this, because we don't know for sure the respective contributions of MCTS, the policy network, and the value network in the success of A0. Everybody seems to have a different theory. In Go it is easier to compare because the value network was the true innovation of AG.

I lack the experience to comment on the limitations of MCTS. It is known to be weak in tactics, but people are talking about chess when they say that. In chess, no matter what depth 'd' you are searching, there will often be new tactics at d + 1. In my experience, draughts tactics are much easier. I think we (Michel, maybe?) have already qualified them as "shallow" on this forum.

CNNs I know better, and am confident that they won't choke on the valid points that you are making. After all, they manage to guess the life/death status of Go groups, a fairly non-local concept. In the worst case, one will need more training time than for BT, but hopefully nowhere near the years/computer that chess/Shogi/Go seem to require.

Rein Halbersma · Post by **Rein Halbersma** » Wed May 09, 2018 16:51

jj wrote:
Fabien Letouzey wrote:...
In any case, the message for anybody interested in the A0 approach is that it should work in all games, and apart from the hardest ones (Go, chess, Shogi) you don't need to be a millionaire to run the training part.
Interesting. However, what sets draughts (& checkers) apart from the mentioned games are the two rules 1) capturing is compulsory and 2) capturing multiple pieces in one move. This facilitates deep combinations (shots) and forcings, which in my view poses a problem to the underlying MCTS-ish search algorithm.

Jan-Jaap

You might be right that it poses problems, but I think that tactical patterns can be learned to some extent. Eg Stef Keetman successfully used a set of tactical patterns in Truus already 25 years ago. And when I look for combinations in TurboDambase, I can find most of them using only static patterns. There are false positives of course, but a NN should be able to learn these patterns, and the MCTS could filter out the false positives. Furthermore, the AlphaZero algorithm trains the NN on both the actual game result as well as on the MCTS generated move probabilities.

Rein Halbersma · Post by **Rein Halbersma** » Wed May 09, 2018 16:57

Fabien Letouzey wrote:
Rein Halbersma wrote:I guess most of us draughts programmers have barely transitioned from the Stone Age with hand-written and hand-tuned eval features, towards the Bronze Age with hand-written and machine-tuned eval features. The Iron Age of deep learning might take a while to be adopted
Actually I meant a positive message in my post. Let me try a more explicit version.

Because patterns (now called (N-)tuple (networks) by academics, apparently) were designed before the web became of widespread use, it is difficult to find information on them. I now know, because I looked for easy pointers that I could give to G0's author, but couldn't find any. The Buro papers are maths heavy, and I was looking for something more graphical. CNNs, by contrast, are very fashionable, and it is therefore easy to find blogs, articles, and even libraries that do most of the work.

So my hidden message was for people who are not already using patterns: they can switch to CNNs directly. Not because it is easier (if you do it yourself, it isn't) but because it is well documented and already implemented by others. Also, my post was targetting a wider audience than the current engine authors. Since A0 is completely different from engines as we know them, it seems a good time for newcomers to appear.

I didn’t mean to imply any negativity, just trying to come up with a reason why your post yielded little reactions for a while.

Fabien Letouzey · Post by **Fabien Letouzey** » Wed May 09, 2018 17:34

Rein Halbersma wrote:I didn’t mean to imply any negativity, just trying to come up with a reason why your post yielded little reactions for a while.

I wasn't expecting immediate reaction. Additionally to being informative, I tried to reach people who don't post on the forum. And then, sooner or later (next year I guess), an A0 clone will pop up. I felt that without my post, the process would be slower.

jj · Post by jj » Thu May 10, 2018 13:18

jj wrote:Interesting. However, what sets draughts (& checkers) apart from the mentioned games are the two rules 1) capturing is compulsory and 2) capturing multiple pieces in one move. This facilitates deep combinations (shots) and forcings, which in my view poses a problem to the underlying MCTS-ish search algorithm.

Fabien Letouzey wrote:It is difficult to reason about this, because we don't know for sure the respective contributions of MCTS, the policy network, and the value network in the success of A0. Everybody seems to have a different theory. In Go it is easier to compare because the value network was the true innovation of AG.

I lack the experience to comment on the limitations of MCTS. It is known to be weak in tactics, but people are talking about chess when they say that. In chess, no matter what depth 'd' you are searching, there will often be new tactics at d + 1. In my experience, draughts tactics are much easier. I think we (Michel, maybe?) have already qualified them as "shallow" on this forum.

CNNs I know better, and am confident that they won't choke on the valid points that you are making. After all, they manage to guess the life/death status of Go groups, a fairly non-local concept. In the worst case, one will need more training time than for BT, but hopefully nowhere near the years/computer that chess/Shogi/Go seem to require.

Rein Halbersma wrote:You might be right that it poses problems, but I think that tactical patterns can be learned to some extent. Eg Stef Keetman successfully used a set of tactical patterns in Truus already 25 years ago. And when I look for combinations in TurboDambase, I can find most of them using only static patterns. There are false positives of course, but a NN should be able to learn these patterns, and the MCTS could filter out the false positives. Furthermore, the AlphaZero algorithm trains the NN on both the actual game result as well as on the MCTS generated move probabilities.

I wouldn't call the tactics that grandmasters and brute force programs are capable of shallow, but everything is relative.

Two years ago I made an MCTS connect-4 program that plays almost perfectly, just with random rollouts. Then I applied the algorithm to draughts, adding a 6-piece database. To my surprise, I lost more games to this program than I could win (5 sec/move; I used more time). I am a former club player but not strong at tactics. Maximus, however, thrashed the MCTS program, often quickly winning material (all wins but about one draw in 50 games).

Of course this is without value and policy networks. I don't have experience with CNNs and it would be very interesting to see what they are capable of in draughts. They should, indeed, be much more powerful than the tactical patterns of Keetman. On the other hand, the Keetman patterns guide the tactical search and it is not clear to me how an A0 clone would manage this. But then again, Keetman patterns are only applied in leaf nodes and not in interior nodes, IIRC. So it is difficult to compare the techniques.

Fabien, are you using CNNs for chess now?

Fabien Letouzey · Post by **Fabien Letouzey** » Thu May 10, 2018 17:53

jj wrote:Two years ago I made an MCTS connect-4 program that plays almost perfectly, just with random rollouts. Then I applied the algorithm to draughts, adding a 6-piece database. To my surprise, I lost more games to this program than I could win (5 sec/move; I used more time). I am a former club player but not strong at tactics. Maximus, however, thrashed the MCTS program, often quickly winning material (all wins but about one draw in 50 games).

The problem is that random rollouts are terrible for tactics. They favour forcing moves where the opponent has only one answer, but pure randomness will play it only with probability 1/n. Any kind of bias dramatically improves the situation, although it's an art (and a very secretive one) to make it work best. In Go, for instance, they used tricks for strings in atari very early on. Something like P = 1/2 for the capture/save instead of 1/n. Don't get me wrong: I love the idea of pure randomness giving us information; it's very elegant. But it's also terribly inefficient.

Of course this is without value and policy networks. I don't have experience with CNNs and it would be very interesting to see what they are capable of in draughts. They should, indeed, be much more powerful than the tactical patterns of Keetman. On the other hand, the Keetman patterns guide the tactical search and it is not clear to me how an A0 clone would manage this. But then again, Keetman patterns are only applied in leaf nodes and not in interior nodes, IIRC. So it is difficult to compare the techniques.

The policy network does guide the search ("progressive bias"). In AB jargon, it would play multiple roles:
- move ordering (order of exploration in MCTS)
- extension/reduction for each move
- root "bias" added to the search score (rarely used in AB programs)
- maybe others I'm not thinking of?

This could be extremely powerful. And it has to be, because it replaces all the search heuristics AG had, including RAVE (or AMAF) which is very efficient in Go.

Fabien, are you using CNNs for chess now?

No, my experience comes from working With Rémi Coulom on Crazy Stone two years ago.

Even if I wanted to use them in chess, it wouldn't be realistic because of the huge computing-power requirement. Leela-Chess Zero (LC0) needs hundreds of people donating computer time for months before becoming competitive.

That's why I made this post on G0: it gives hope that, at least in simpler games, you don't need to find a sponsor before even starting your project.

Fabien.

jj · Post by jj » Mon May 14, 2018 15:27

Fabien Letouzey wrote:
Fabien, are you using CNNs for chess now?
No, my experience comes from working With Rémi Coulom on Crazy Stone two years ago.

Even if I wanted to use them in chess, it wouldn't be realistic because of the huge computing-power requirement. Leela-Chess Zero (LC0) needs hundreds of people donating computer time for months before becoming competitive.

That's why I made this post on G0: it gives hope that, at least in simpler games, you don't need to find a sponsor before even starting your project.

Fabien.

Thanks for sharing. Another thing worth looking into. The question is how simple draughts is, with different rules and a game-tree complexity of about 10^100 versus chess 10^120. In the mean time I'll start saving for a quantum computer.

Jan-Jaap

World Draughts Forum

AlphaZero-style Breakthrough program

AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program

Re: AlphaZero-style Breakthrough program