I agree.Rein Halbersma wrote: ↑Sat May 01, 2021 11:06Reinforcement learning should be strictly superior compared to supervised learning, even for NNUE or Scan-pattern draughts programs. Mathematically, supervised learning is just a single iteration in the RL loop, so if you stop after one round, they are equivalent. If you continue the loop and only pick a new network when it improves, you can never get worse.Joost Buijs wrote: ↑Fri Apr 30, 2021 08:14Like Alpha Zero you can start with a random network and have the engine play a number of games against itself and update the network on the fly depending upon the outcome of the games. This is called 'reinforcement learning'. The problem with this method is that you have to play a huge number of games before the network reaches an acceptable level of play. I never read the Alpha Zero paper, but I think for chess they used something like 40 million games.
The question is whether you will gain much from continuous playing and retraining. For Scan-based patterns, I highly doubt this. The eval is almost completely optimized as soon as you have ~100M positions and cannot be made to overfit after that in my experience. For NNUE, there might be more capacity for overfitting that you might then try to reduce by adding more positions and retraining.
The AlphaZero neural networks are a few orders of magnitude more expensive and require much more positions and data to reach the limit of their predictive power. That's why it was so expensive to train and generate all these games (~1700 years in terms of single PC years). IIRC, the training games were with Monte Carlo Tree search with only 1600 nodes per search, that's just a tiny amount of search and a huge amount of CPU/GPU cycles to the eval. The AlphaZero eval also picks up a big part of pattern-based tactics because it's such a large neural network. For Scan-patterns, it's the reverse: a huge amount of search and a tiny amount of eval cycles.
It's a pity that it is not worthwhile trying to train larger/deeper networks for 10x10 Draughts because 12 bit Scan-based patterns already seem to cover 99% of the important features. In fact NNUE for Draughts is a step backwards because it is slower, it's just the fun of trying to get it perform at the same level.
For training the huge draw tendency of Draughts is a problem too. Above a certain level engines don't lose anymore, even when you give the opponent a 100 fold speed advantage. From draws the network won't learn anything. It looks like that on current hardware 10x10 Draughts is getting a bit trivial. Maybe it needs other rules like removing compulsory captures, this could make the game more complex.