NNUE
-
- Posts: 1368
- Joined: Thu Jun 20, 2013 17:16
- Real name: Krzysztof Grzelak
Re: NNUE
I think that for now I will not buy anything - wait for a favorable opportunity.
-
- Posts: 1368
- Joined: Thu Jun 20, 2013 17:16
- Real name: Krzysztof Grzelak
Re: NNUE
Sorry to ask, Bert. What about the Damage 2021 GUI?
Re: NNUE
Krzysztof,
I have solved one bug in the PDN save (I'm not sure whether there are more).
If there are other bugs, just let me know.
Also the language issue should be solved.
Here is the link to all the files.
https://www.dropbox.com/sh/8ewsr0ggesxj ... Egtja?dl=0
Basically you only need to replace, on your side, Damage2021.exe and the French language file (DamageFRA.lng).
The pondering request will take a little more time.
Bert
-
- Posts: 1368
- Joined: Thu Jun 20, 2013 17:16
- Real name: Krzysztof Grzelak
Re: NNUE
First of all, thank you, Bert. Could the grey bar show an engine evaluation, as was the case in Damage 15.3? I get an error like the picture below; I apologise for asking, but can you do something about it? As for saving the game and the language, everything is fine. Please also think about the "Pondering" option.
- Attachments
-
- Error.jpg (11.06 KiB) Viewed 9324 times
Re: NNUE
Krzysztof, a question: when exactly do you get the "Error: Illegal Engine Move" message?
If you can describe the specific situation to me, I might be able to reproduce it here and start solving it.
I will think about pondering, and about putting some info in the grey box.
Bert
-
- Posts: 1368
- Joined: Thu Jun 20, 2013 17:16
- Real name: Krzysztof Grzelak
Re: NNUE
I will describe this error. I set the program to Player - Engine. I make a move on the board as white. The engine then thinks about its move and calculates variations (at this point I make a move for black, instead of the engine). After this move, the engine moves as white and the error I wrote about appears.
Re: NNUE
Krzysztof,
I understand the situation.
Basically when the engine thinks, you should not be able to move the pieces (in your case the black pieces).
So what is your preference?
I can change the GUI so that input during the engine's thinking/search is not possible, or so that input is possible, in which case the engine search is stopped for that move.
Bert
-
- Posts: 1368
- Joined: Thu Jun 20, 2013 17:16
- Real name: Krzysztof Grzelak
Re: NNUE
Please solve it sensibly, so that it is not possible to do such things with either white or black. I set the program to Player - Engine. I make a move as white. Black - the engine - is to move. If I now make a move for black, it should be taken back immediately; after all, only the engine may make black's move, because it plays black. Please add such a block. Of course, it applies to both white and black.
-
- Posts: 471
- Joined: Wed May 04, 2016 11:45
- Real name: Joost Buijs
Re: NNUE
Now something about NNUE.
My latest Draughts network (still 32-bit floating point) performs roughly 11 Elo below the level of Kingsrow 1.62. The biggest problem is the large speed difference of 1.3 Mn/s vs 18 Mn/s. It could be that KR counts nodes differently (18 Mn/s seems unrealistically high for a single core), but even if the speed difference is something like a factor of 10, it has an impact on performance, even in a drawish game like Draughts.
The network is very similar to the one Bert uses: 190:256x32x32x1; instead of having an extra input for the side to move, I flip the board when it is black's turn. Because I want to keep multiples of 32 for future SIMD code, the network actually has 192 inputs with 2 unused. The network is trained with a mix of 834M positions labeled with game results and the evaluation score of a 4-ply full-width search. A set of 8.6M positions is used for validation.
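For readers trying to picture those 190 inputs, one plausible layout is 45 man squares plus 50 king squares per colour (a man can never stand on its own promotion row), padded to 192. Below is a minimal sketch of such an encoding with the board mirrored for the side to move; the exact feature layout and the Position struct are my assumptions, not Joost's actual code.

```cpp
// Hypothetical 192-float input encoding for a 190:256x32x32x1 net.
// Assumed layout: own men (45), opponent men (45), own kings (50), opponent kings (50),
// inputs 190..191 unused. The board is mirrored (square s -> 51 - s) when black is to
// move, so the network always sees the position from the side to move.
#include <array>
#include <cstdint>

struct Position {
    std::uint64_t wm, bm, wk, bk;   // bitboards, bit (s-1) set for square s = 1..50
    bool black_to_move;
};

std::array<float, 192> encode(const Position& p)
{
    std::array<float, 192> in{};                            // zero-initialised
    for (int s = 1; s <= 50; ++s) {
        std::uint64_t bit = 1ull << (s - 1);
        int v = p.black_to_move ? 51 - s : s;               // square as seen by the side to move
        bool own_man  = p.black_to_move ? (p.bm & bit) : (p.wm & bit);
        bool opp_man  = p.black_to_move ? (p.wm & bit) : (p.bm & bit);
        bool own_king = p.black_to_move ? (p.bk & bit) : (p.wk & bit);
        bool opp_king = p.black_to_move ? (p.wk & bit) : (p.bk & bit);

        if (own_man)  in[v - 6]       = 1.0f;               // own men only on 6..50 -> 45 inputs
        if (opp_man)  in[45 + v - 1]  = 1.0f;               // opponent men only on 1..45 -> 45 inputs
        if (own_king) in[90 + v - 1]  = 1.0f;               // 50 inputs
        if (opp_king) in[140 + v - 1] = 1.0f;               // 50 inputs; 190 used in total
    }
    return in;
}
```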
Because it is easier for experimentation, I kept the implementation in floating point. The lower speed also helps to exaggerate differences in playing strength. This week I reached the point where I want to start working on implementing int16 or int8 SIMD code with incremental update of the accumulator stack. As Bert already showed with Scan-NNUE, int16 SIMD code for inference runs approximately 3 to 4 times faster than my unoptimized float32 implementation. I have no idea what this will do for the level of play; for Chess this would make a difference of 90 to 120 Elo, I suppose for Draughts it will be 10 times less.
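To make concrete what int16 SIMD code with an incremental accumulator update looks like, here is a rough AVX2 sketch. It is illustrative only (not Bert's or Joost's implementation) and assumes a 256-neuron first layer with int16 weights stored as one contiguous column per input feature.

```cpp
// Illustrative incremental update of a 256-wide int16 accumulator with AVX2.
// Instead of recomputing input * W1 from scratch, a move only adds/subtracts the
// first-layer weight columns of the few features it changes.
#include <immintrin.h>
#include <cstdint>

constexpr int L1 = 256;                                    // first-layer width

struct Accumulator {
    alignas(32) std::int16_t v[L1];                        // bias + sum of active feature columns
};

std::int16_t w1[192][L1];                                  // assumed layout: one column per feature

static void add_feature(Accumulator& acc, int f)
{
    for (int i = 0; i < L1; i += 16) {                     // 16 int16 lanes per 256-bit register
        __m256i a = _mm256_load_si256(reinterpret_cast<const __m256i*>(&acc.v[i]));
        __m256i w = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(&w1[f][i]));
        _mm256_store_si256(reinterpret_cast<__m256i*>(&acc.v[i]), _mm256_add_epi16(a, w));
    }
}

static void remove_feature(Accumulator& acc, int f)
{
    for (int i = 0; i < L1; i += 16) {
        __m256i a = _mm256_load_si256(reinterpret_cast<const __m256i*>(&acc.v[i]));
        __m256i w = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(&w1[f][i]));
        _mm256_store_si256(reinterpret_cast<__m256i*>(&acc.v[i]), _mm256_sub_epi16(a, w));
    }
}

// A quiet man move then costs just two column updates:
//   remove_feature(acc, own_man_feature(from));
//   add_feature(acc, own_man_feature(to));
// Captures touch a few more columns; the accumulator is pushed/popped per ply (the "stack").
```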
First I want to try to quantize the network as it is, but I'm afraid that it will degrade performance. If it really gets much worse, I will have to retrain the network with what Facebook calls 'quantization aware training'. The trainer I use is written in C++ (to avoid Python, which is slow as molasses) and makes use of libTorch 1.8.1 and CUDA 11.1.
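For illustration, the simplest form of such post-training quantization just scales, rounds and saturates the float weights to int16 fixed point; the scale factor below is only an example, not a recommendation.

```cpp
// Naive post-training quantization of a float32 weight array to int16 fixed point.
// A scale of e.g. 64 keeps 6 fractional bits; the inference code divides (shifts)
// the result back by the same factor. The rounding error introduced here is exactly
// what "quantization aware training" would compensate for during training.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

std::vector<std::int16_t> quantize(const std::vector<float>& w, float scale = 64.0f)
{
    std::vector<std::int16_t> q(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        float x = std::round(w[i] * scale);
        q[i] = static_cast<std::int16_t>(std::clamp(x, -32768.0f, 32767.0f));   // saturate
    }
    return q;
}
```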
The search needs to be reworked too; at the moment it is very basic: PVS with aspiration, a transposition table, and a single killer and history heuristic for move ordering. For pruning it uses futility and LMR. There are still many things that can be improved: better time control, better move ordering (maybe by a policy network), and adding probcut and singular extensions, to name a few.
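For readers unfamiliar with the terminology, a bare-bones PVS loop with a late-move reduction looks roughly like the sketch below. This is a generic skeleton, not Ares' actual search; the types, the move ordering and the reduction condition are placeholders.

```cpp
// Generic PVS-with-LMR skeleton (not the engine's real code). It shows two of the
// ideas named above: a null-window probe for moves after the first, and a depth
// reduction for late quiet moves, both re-searched at full depth/window if they
// unexpectedly raise alpha.
#include <algorithm>

constexpr int INF = 30000;

struct Position { /* board state goes here */ };
struct Move { bool is_quiet() const { return true; } };                   // stub
struct MoveList {
    int count = 0;
    int size() const { return count; }
    Move operator[](int) const { return {}; }                             // stub
};
MoveList generate_moves(const Position&) { return {}; }                   // stub
Position play(const Position& p, Move) { return p; }                      // stub
int evaluate(const Position&) { return 0; }                               // stub: the NNUE eval

int pvs(const Position& pos, int depth, int alpha, int beta)
{
    if (depth <= 0)
        return evaluate(pos);                      // in practice: quiescence / capture search

    MoveList moves = generate_moves(pos);          // assume ordered: TT move, killer, history
    int best = -INF;

    for (int i = 0; i < moves.size(); ++i) {
        Position child = play(pos, moves[i]);
        int score;
        if (i == 0) {
            score = -pvs(child, depth - 1, -beta, -alpha);             // first move: full window
        } else {
            int r = (i >= 4 && depth >= 3 && moves[i].is_quiet()) ? 1 : 0;   // late-move reduction
            score = -pvs(child, depth - 1 - r, -alpha - 1, -alpha);    // cheap null-window probe
            if (score > alpha && (r > 0 || score < beta))
                score = -pvs(child, depth - 1, -beta, -alpha);         // re-search on a fail high
        }
        best = std::max(best, score);
        alpha = std::max(alpha, score);
        if (alpha >= beta)
            break;                                                     // beta cutoff
    }
    return best;
}
```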
I'm not a Draughts player, but I have the impression that the network often plays differently compared to pattern evaluation; not that it is any better, but there seems to be more than one road leading to Rome.
Attached are the results of the latest test match, 90 moves in 1 minute, Kingsrow with 6P EGDB.
- Attachments
-
- dxpgames.pdn
- Latest match results
- (149.2 KiB) Downloaded 344 times
Re: NNUE
Hi Joost,
I downloaded the PDN, and only 5 losses at 1 minute is already very good.
Ares 1.2 is on a good path, like Damage and Scan NNUE.
I want to know something; this question is for all of you, and I will be glad to hear your different ideas. NNUE is based on a file that contains all the win/draw/loss positions; I thought that, like Alpha Zero, the process was based on just telling the rules of the game to the program, which then builds the .nnue file by self-learning, playing against itself, and then being filtered by the programmer to make it strong.
Friendly, Sidiki.
-
- Posts: 471
- Joined: Wed May 04, 2016 11:45
- Real name: Joost Buijs
Re: NNUE
Hi Sidiki,
This is not so easy to answer because there are many ways in which you can train a network.
Like Alpha Zero you can start with a random network and have the engine play a number of games against itself and update the network on the fly depending upon the outcome of the games. This is called 'reinforcement learning'. The problem with this method is that you have to play a huge number of games before the network reaches an acceptable level of play. I never read the Alpha Zero paper, but I think for chess they used something like 40 million games.
Another method is what I use for my chess engine: train the network on the outcome of a number of games played by other strong engines (or people). This is called 'supervised learning'. For chess this is the easiest method, because you can download millions of games played between strong engines everywhere. This doesn't mean that a network trained in this way will get as good as a network trained by reinforcement learning, but it is an easy method to start with.
For the Draughts network I use a method that is somewhat different. The engine played 8.3M games against itself with a material-only evaluation function and random move ordering; this took a lot of time, because these games had to be played at a level high enough not to miss tactics. From these games I took ~835M quiet positions and labeled them with the outcome of the games (0, 0.5, 1.0). I used these positions to train the initial network. After this I relabeled the positions with a 4-ply full-width search that uses the initial network as evaluation function, and retrained the network with the relabeled positions. Somehow this works, but don't ask me why. I still want to try what happens when I relabel the positions with a 6- or 8-ply full-width search; since the network is still slow and a full-width search takes a lot of time, this will take ages, even on my 32-core computer. Maybe it is not necessary to use that many positions; Bert uses something like 98M and, as it seems, he gets comparable results.
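To make the two labeling passes concrete, here is a minimal sketch with hypothetical names. It assumes labels in the 0..1 range seen from the side to move and a search that returns a score in the same range; Joost's actual trainer and scaling may well differ.

```cpp
// Sketch of the two labeling passes described above (illustrative only).
// Pass 1: every quiet position from a game gets that game's result (0, 0.5, 1.0).
// Pass 2: every position is relabeled with the score of a shallow full-width search
// that uses the initial network as its evaluation function.
#include <vector>

struct Position { bool white_to_move = true; /* board state */ };

struct Sample {
    Position pos;
    float target = 0.5f;                                   // training label in [0, 1]
};

// Stub standing in for the engine's shallow full-width search with the initial net;
// assumed here to return a score in the same [0, 1] range as the labels.
float search(const Position&, int /*depth*/) { return 0.5f; }

void label_with_result(std::vector<Sample>& game, float result_white)   // 0, 0.5 or 1
{
    for (Sample& s : game)
        s.target = s.pos.white_to_move ? result_white : 1.0f - result_white;
}

void relabel_with_search(std::vector<Sample>& set, int depth /* e.g. 4, later 6 or 8 */)
{
    for (Sample& s : set)
        s.target = search(s.pos, depth);                   // replace outcome label with search score
}
```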
I have the feeling that there is still room for improvement. I tried different network sizes; using a first layer with 384 neurons instead of 256 clearly improved the network, but it got almost 50% slower, so the net result was no gain. Dividing the first layer into two halves, one for the white half and one for the black half of the board, could help; this is still something I want to try.
As you requested by mail, I can give you a copy of the engine, but I first want to improve the speed by quantizing the network. You also need a computer with fast AVX2 for it, otherwise it won't work. At the moment the engine only works as a DXP server; it has no other means of communication.
Joost
Re: NNUE
Hello,
For training NNUE in chess, the Stockfish folks use positions labeled by a shallow search, originally 1 billion positions from an 8-ply search. This is good because you can throw semi-random games at it, whereas with win/draw/loss labels a single tactical miss can give wrong results. The rough idea is that your evaluation gets closer to some engine + search. It will never reach it and it will be slower, but the net gain was about +100 Elo for chess. There are various experiments on whether deeper search helps, and/or more positions are better, and/or only quiet positions are better.
I had success training breakthrough and English checkers bots (their rank is high on the CodinGame platform) from scratch with an NNUE-style net and MCTS search (value only, no policy). The training resembles AlphaZero training in that it plays self-play games and learns from them. Since the neural network is relatively small, training from scratch takes only a few hours to reach the current results. The NN is fast, as only a few squares are affected by a move. I did incorporate some knowledge into the NN, e.g. in breakthrough a pawn that is attacked is something different from a non-attacked pawn; in checkers an empty square that is attacked is different from a plain empty square; and last but not least, the inputs are doubled for the side to move, so depending on who the current player is, the network inputs are completely different. It is still almost as fast, but in zugzwang-ish games like breakthrough it works wonders.
-
- Posts: 471
- Joined: Wed May 04, 2016 11:45
- Real name: Joost Buijs
Re: NNUE
Hi,
Using a completely different set of input weights for white/black to move like they do in Stockfish and Shogi engines is something I haven't tried yet.
Flipping the board when it is black's turn to move and using a different set of input weights are indeed two different things. This is caused by the fact that in Draughts there are almost no transpositions, because disks can only move forward. Most of the time, positions with white to move differ from flipped positions with black to move.
Using a different set of input weights for white/black to move effectively doubles the network size without it costing much extra time for inference and training. This is something I already have on my to do list.
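In code the idea boils down to keeping two first-layer weight matrices and selecting one by the side to move, so the parameter count doubles while the per-position work stays the same. A small illustrative sketch with hypothetical names:

```cpp
// Two separate first-layer weight sets, selected once per position by side to move.
// The feature indices stay the same; only the weights differ, so inference cost is
// unchanged while the effective first-layer capacity doubles.
#include <cstdint>
#include <vector>

constexpr int FEATS = 192, L1 = 256;
std::int16_t w1_white[FEATS][L1];       // used when white is to move
std::int16_t w1_black[FEATS][L1];       // used when black is to move
std::int16_t b1[L1];

void refresh_accumulator(std::int16_t acc[L1], const std::vector<int>& active, bool white_to_move)
{
    std::int16_t (*W)[L1] = white_to_move ? w1_white : w1_black;   // pick one half; the other is never read
    for (int i = 0; i < L1; ++i) acc[i] = b1[i];
    for (int f : active)
        for (int i = 0; i < L1; ++i)
            acc[i] += W[f][i];
}
```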
Sometimes I look at the Stockfish Discord too, the people over there are also struggling to get better networks, the best Stockfish network is already many months old. It all boils down to trial and error, there is nobody who actually knows what is going on below the surface.
Re: NNUE
Hi Joost,
Thanks for taking the time to answer this question.
It's all clear in my mind now.
Thanks also to the others for their answers.
I will wait for Ares 1.2.
Learning from many engines, or from a draughts or chess player, lets a program acquire a huge amount of knowledge and playing styles and adapt them to a game.
That is to say, sometimes, depending on the position, I play like Kingsrow, Scan, Damage, Mobydam, Truus, even Flits.
Thanks for the world of NNUE.
Friendly, Sidiki.
-
- Posts: 1722
- Joined: Wed Apr 14, 2004 16:04
- Contact:
Re: NNUE
Joost Buijs wrote: ↑Fri Apr 30, 2021 08:14
Like Alpha Zero you can start with a random network and have the engine play a number of games against itself and update the network on the fly depending upon the outcome of the games. This is called 'reinforcement learning'. The problem with this method is that you have to play a huge number of games before the network reaches an acceptable level of play. I never read the Alpha Zero paper, but I think for chess they used something like 40 million games.
Reinforcement learning should be strictly superior compared to supervised learning, even for NNUE or Scan-pattern draughts programs. Mathematically, supervised learning is just a single iteration in the RL loop, so if you stop after one round, they are equivalent. If you continue the loop and only pick a new network when it improves, you can never get worse.
The question is whether you will gain much from continuous playing and retraining. For Scan-based patterns, I highly doubt this. The eval is almost completely optimized as soon as you have ~100M positions and cannot be made to overfit after that in my experience. For NNUE, there might be more capacity for overfitting that you might then try to reduce by adding more positions and retraining.
The AlphaZero neural networks are a few orders of magnitude more expensive and require many more positions and much more data to reach the limit of their predictive power. That's why it was so expensive to train and to generate all these games (~1700 years in terms of single-PC years). IIRC, the training games were played with Monte Carlo Tree Search with only 1600 nodes per search; that's just a tiny amount of search and a huge number of CPU/GPU cycles spent on the eval. The AlphaZero eval also picks up a big part of pattern-based tactics because it's such a large neural network. For Scan patterns, it's the reverse: a huge amount of search and a tiny number of eval cycles.