
NNUE
- 
				Krzysztof Grzelak
- Posts: 1368
- Joined: Thu Jun 20, 2013 17:16
- Real name: Krzysztof Grzelak
Re: NNUE
Everyone is getting caught up in the frenzy over it.
Re: NNUE
Rein, thanks for your post.
I think that Jaap-Jaap mentioned in this forum some time ago the unfair advantage gained by using (amongst others) optimization tools developed by others.
Do you believe that with TensorFlow/Keras the optimization framework is basically available to all?
One still needs some Python background, but in my case it was extremely simple to calculate the neural network weights for a 10x10 NN, based on my .pdn match games.
But in the end, the proof of the pudding is in the eating...
Bert
- 
				Rein Halbersma
- Posts: 1723
- Joined: Wed Apr 14, 2004 16:04
- Contact:
Re: NNUE
BertTuyt wrote: Mon Nov 30, 2020 13:52 Rein, thanks for your post.
I think that Jaap-Jaap for some time in this forum mentioned the unfair advantage by using (amongst others) optimization tools developed by others.
Do you believe that with TensorFlow/Keras, basically the optimization framework is available for all?
Yes. It took me 10 days and an exchange of about 100 emails with Ed to get all the details correct. Once it's polished a bit, I will open source the code, and that non-draughts related part of the puzzle will be 100% accessible to everyone.
BertTuyt wrote: One still needs some Python background, but in my case it was extremely simple, to calculate the neural network weights for a 10x10 NN, based on my .pdn match games.
TensorFlow is a hugely complicated black-box optimizer; there are over a dozen parameters you can reasonably tweak. You have to really know what those parameters do. You are now a captain on a nuclear powered aircraft carrier. Handle it with care. In my experience, this can have a great effect on the quality of the trained weights. E.g. try setting "decay_rate=0.80" instead of 0.96 in Jonathan's script and you gain a free ~20 Elo (at least in checkers it had that effect).
BertTuyt wrote: But, in the end the proof of the pudding is in the eating...
Bert
There are still plenty of things to compete on. Generating the training positions, for example: there are all kinds of parameters to tweak there as well (the Stockfish/Shogi folks seem to use fixed-depth rather than fixed-time matches for it, to name just one difference).
- 
				Rein Halbersma
- Posts: 1723
- Joined: Wed Apr 14, 2004 16:04
- Contact:
Re: NNUE
BertTuyt wrote: Mon Nov 30, 2020 11:52 In the meantime I also started experimenting with NNUE and the checkers-code.
So far I was able to implement the checkers NNUE source-code in my Engine (Damage 15.3).
I used an input vector with 191 elements (2 * 45 white/black man, 2 * 50 white/black king and side to move).
I also succeeded to generate a weights-file with the Python script and TensorFlow, based upon my large set of games.
As usual I have introduced some bugs, but expect in 1 - 2 weeks to have all working.
Bert, please be careful with the licensing conditions on Jonathan's code! It might have some unforeseen consequences for derivative products that you should make sure you understand (I don't, and my code is written from scratch and will be Boost Licensed).
https://github.com/jonkr2/GuiNN_Checker ... in/LICENSE
https://creativecommons.org/licenses/by-nc-sa/3.0/
Re: NNUE
Rein, thanks for your post.
I just implemented the code, as is, to create a proof of concept, based upon the fact that it apparently worked for the checkers case.
If it yields interesting results, I will write my own NN, most likely less flexible, but sufficient for the purpose.
To check the Python results, I also wrote a very straightforward NN in C, without the SIMD instructions, see below.
At least it gave the same results.
Bert
Code:
#include <cmath>

// Activation identifiers. Their values are assumed here; the original post does not show the definitions.
enum { activation_relu, activation_sigmoid };

// Computes one fully connected layer.
// ixinput = number of inputs, ixneuron = number of neurons.
// weights holds ixinput * ixneuron weights (one row per input), followed by ixneuron biases.
void layer(int ixinput, int ixneuron, const float* input, float* output, const float* weights, int activation)
{
	const int ioffset0 = ixinput * ixneuron; // start of the bias terms
	for (int i = 0; i < ixneuron; i++)
	{
		float fsum = 0.0f;
		for (int j = 0; j < ixinput; j++)
			fsum += weights[j * ixneuron + i] * input[j];
		fsum += weights[ioffset0 + i]; // add bias
		if (activation == activation_relu) // activation relu
			output[i] = std::fmax(0.0f, fsum);
		if (activation == activation_sigmoid) // activation sigmoid
			output[i] = 1.0f / (1.0f + std::exp(-fsum));
	}
}
// Each layer occupies (inputs + 1) * neurons floats, which gives the offsets used below.
float model(const float* finput, float* foutput0, float* foutput1, float* foutput2, const float* fweight)
{
	float fmodeloutput;
	layer(121, 192, finput, foutput0, &fweight[0], activation_relu);
	layer(192, 32, foutput0, foutput1, &fweight[23424], activation_relu);
	layer(32, 32, foutput1, foutput2, &fweight[29600], activation_relu);
	layer(32, 1, foutput2, &fmodeloutput, &fweight[30656], activation_sigmoid);
	return fmodeloutput;
}
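The hard-coded offsets in model() follow from the layer sizes: each layer stores inputs * neurons weights plus one bias per neuron. A minimal Python sketch of that arithmetic, assuming the flat layout used by layer() above (the helper name layer_offsets is made up for illustration):

```python
def layer_offsets(sizes):
    """Return (offsets, total) for a flat weight array.

    sizes = [inputs, neurons_layer0, neurons_layer1, ...].
    Each layer takes (n_in + 1) * n_out floats: weights plus biases.
    """
    offsets = []
    pos = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        offsets.append(pos)
        pos += (n_in + 1) * n_out
    return offsets, pos


offsets, total = layer_offsets([121, 192, 32, 32, 1])
print(offsets, total)
```

For the 121-input network in the C code this reproduces the offsets 0, 23424, 29600 and 30656. A network with the 191-element draughts input vector mentioned in this thread would need different offsets, so computing them from the sizes is less error-prone than typing them in.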
- 
				Madeleine Birchfield
- Posts: 12
- Joined: Mon Jun 22, 2020 12:36
- Real name: Madeleine Birchfield
Re: NNUE
Rein Halbersma wrote: Fri Nov 20, 2020 22:02 In the spectrum of eval complexity, one could make roughly the following hierarchy:
- Patterns: Pioneered by Fabien's Scan, strongest programs now for 8x8 checkers and 10x10 draughts. Input = K indices ranged 1..3^N for patterns of N squares, only K valid for every position. Fast index computations (PEXT) + direct lookup of K weights. No layers on top (sigmoid for training).
- Raw board: Pioneered in backgammon in the 1990s, now by Jonathan's GuiNN_checkers. Slightly stronger than Cake, still weaker than Kingsrow for 8x8 checkers. Input = all Piece entries (both type and square). 3 fully connected layers on top. Requires Python float weights -> C++ int conversion + SIMD programming + incremental updates (not yet done by Jonathan) to be fast during game play.
- NNUE: Pioneered in Shogi programs, now in Stockfish, currently strongest programs for chess, Shogi. Input = all King (square) * Piece (both type and square) entries. 3 fully connected layers on top. Same C++ machinery as for the above entry required (all implemented in Shogi and Stockfish).
- CNN: Pioneered by AlphaZero, currently strongest for Go, formerly for chess, Shogi. No successful attempts for checkers/draughts AFAIK. Input = all Piece (both type and square) entries, but the expense comes from 3x3 convolutions in networks 40-80 layers deep.
The AlphaZero-like networks are residual neural networks in addition to being convolutional neural networks, and they provide not only the eval but also move ordering.
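The pattern indexing described in the first item of the quoted hierarchy can be illustrated with a small sketch. It assumes each square of a pattern holds one of three states and packs N squares into a base-3 index. The state coding, the square order and the 0-based index range are assumptions for illustration, not Scan's actual layout, and a real engine would compute the index with bit tricks such as PEXT rather than a loop.

```python
EMPTY, WHITE_MAN, BLACK_MAN = 0, 1, 2  # assumed state coding


def pattern_index(states):
    """Pack the states of N pattern squares into one base-3 index."""
    index = 0
    for s in states:
        index = index * 3 + s
    return index


def pattern_eval(patterns, weight_tables):
    """Sum one looked-up weight per pattern (no layers on top)."""
    return sum(table[pattern_index(states)]
               for states, table in zip(patterns, weight_tables))


# Two hypothetical 4-square patterns with zero-initialised weight tables.
tables = [[0.0] * 3 ** 4 for _ in range(2)]
tables[0][pattern_index([WHITE_MAN, EMPTY, EMPTY, BLACK_MAN])] = 0.25
position_patterns = [[WHITE_MAN, EMPTY, EMPTY, BLACK_MAN],
                     [EMPTY, EMPTY, EMPTY, EMPTY]]
print(pattern_eval(position_patterns, tables))
```

The evaluation is just K table lookups and a sum, which is why this approach is so much cheaper per node than a network with dense layers.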
- 
				Madeleine Birchfield
- Posts: 12
- Joined: Mon Jun 22, 2020 12:36
- Real name: Madeleine Birchfield
Re: NNUE
Rein Halbersma wrote: Fri Nov 20, 2020 22:02 A slightly cheaper version might be called "PN" networks: all Piece (both type and square) * Neighbor (both type and square) entries. So only the 4 neighboring squares get computed. This is only slightly more expensive than the "P" type networks, yet might offer a flexible form of Scan-like patterns (speculative!).
I think somebody in the computer chess world is experimenting with 'PN' networks, but he calls them 'adjacent-piece-piece' networks if I remember correctly.
- 
				Rein Halbersma
- Posts: 1723
- Joined: Wed Apr 14, 2004 16:04
- Contact:
Re: NNUE
Madeleine Birchfield wrote: Wed Dec 02, 2020 22:13 The AlphaZero like networks are residual neural networks in addition to convolutional neural networks, and not only does eval, but also move ordering.
Thanks, good points!
- 
				Rein Halbersma
- Posts: 1723
- Joined: Wed Apr 14, 2004 16:04
- Contact:
Re: NNUE
Rein Halbersma wrote: Fri Nov 20, 2020 22:02 A slightly cheaper version might be called "PN" networks: all Piece (both type and square) * Neighbor (both type and square) entries. So only the 4 neighboring squares get computed. This is only slightly more expensive than the "P" type networks, yet might offer a flexible form of Scan-like patterns (speculative!).
Madeleine Birchfield wrote: Wed Dec 02, 2020 22:25 I think somebody in the computer chess world is experimenting with 'PN' networks, but he calls them 'adjacent-piece-piece' networks if I remember correctly.
Yes, I just checked on talkchess: connor_mcmonigle mentioned this here: http://talkchess.com/forum3/viewtopic.p ... ce#p872339 I hadn’t seen that yet. Funny, it’s dated Nov 12, the same day Ed alerted me to Jonathan’s talkchess update on Gui-NN.
Btw, the PN idea came up in discussions with Fabien and we concluded it would be like a local convolution without weight sharing. It turns out you can implement it directly in Keras with https://keras.io/api/layers/locally_con ... nnected2d/ I haven’t found such a tool for PyTorch yet, to further clarify my answer to your question in another thread.

- 
				Madeleine Birchfield
- Posts: 12
- Joined: Mon Jun 22, 2020 12:36
- Real name: Madeleine Birchfield
Re: NNUE
Rein Halbersma wrote: Fri Nov 20, 2020 22:02
- NNUE: Pioneered in Shogi programs, now in Stockfish, currently strongest programs for chess, Shogi. Input = all King (square) * Piece (both type and square) entries. 3 fully connected layers on top. Same C++ machinery as for the above entry required (all implemented in Shogi and Stockfish).
I should also add that NNUE could have any number of fully connected layers on top. There is a chess engine called Halogen that uses an NNUE with only two fully connected layers on top, and another engine called Seer that uses an NNUE with four fully connected layers on top.
- 
				Rein Halbersma
- Posts: 1723
- Joined: Wed Apr 14, 2004 16:04
- Contact:
Re: NNUE
Madeleine Birchfield wrote: Thu Dec 03, 2020 00:54 I should also add that NNUE could have any number of fully connected layers on top. There is an chess engine called Halogen that uses an NNUE with only two fully connected layers on top, and another engine called Seer that uses a NNUE with four connected layers on top.
Yes, the possible variations are endless. I only tried to categorize by what are IMO essential differences.
Re: NNUE
Herewith an update regarding the NNUE implementation in Damage.
Initially I struggled to get anything meaningful out of the NN.
So I slightly changed the labels of the data-set.
Instead of labeling it with the final outcome of the game (0.0, 0.5 and 1.0), I used the Damage evaluation with a sigmoid transformation.
So no real bootstrap learning, or zero approach, but OK-ish for now.....
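A minimal sketch of such a labeling step, assuming the evaluation is an integer score and that a scale constant maps it into the sigmoid's useful range. The function name, the scale of 100.0 and the clamp are illustrative assumptions; the post does not give the actual scaling used in Damage.

```python
import math


def eval_to_label(score, scale=100.0):
    """Map an engine evaluation to a training label between 0 and 1."""
    # Clamp the argument so extreme scores cannot overflow math.exp.
    x = max(-50.0, min(50.0, score / scale))
    return 1.0 / (1.0 + math.exp(-x))


for score in (-300, 0, 300):
    print(score, round(eval_to_label(score), 3))
```

An equal position maps to 0.5, the same value a drawn game would get under outcome labeling, so the two labeling schemes share one output range and the network's final sigmoid can stay as it is.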
I started with only one network, consisting of 4 layers.
* Input layer, 191 inputs (45 white man, 50 white king, 45 black man, 50 black king and side to move), and 192 neurons.
* Hidden layer 1, 192 inputs and 32 neurons
* Hidden layer 2, 32 inputs and 32 neurons
* Output layer, 32 inputs and 1 neuron.
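The 191-element input vector can be sketched as follows. The block order matches the list above, but the mapping of squares to positions within each block is an assumption (men are taken to be impossible on their own promotion row, which leaves 45 squares per side), as is the meaning of the side-to-move bit. The function and argument names are hypothetical rather than taken from Damage.

```python
N_INPUTS = 45 + 50 + 45 + 50 + 1  # = 191

# Block offsets in the order: white men, white kings, black men, black kings, side to move.
WHITE_MAN, WHITE_KING, BLACK_MAN, BLACK_KING, SIDE = 0, 45, 95, 140, 190


def encode(white_men, white_kings, black_men, black_kings, white_to_move):
    """Build a 0/1 input vector from collections of squares numbered 1..50.

    Assumes white men occupy squares 6..50 and black men squares 1..45.
    """
    x = [0.0] * N_INPUTS
    for sq in white_men:
        x[WHITE_MAN + sq - 6] = 1.0
    for sq in white_kings:
        x[WHITE_KING + sq - 1] = 1.0
    for sq in black_men:
        x[BLACK_MAN + sq - 1] = 1.0
    for sq in black_kings:
        x[BLACK_KING + sq - 1] = 1.0
    x[SIDE] = 1.0 if white_to_move else 0.0  # assumed convention
    return x


# Initial position of 10x10 draughts: black on 1..20, white on 31..50.
start = encode(range(31, 51), [], range(1, 21), [], True)
print(len(start), sum(start))
```

Only as many inputs as there are pieces on the board (plus the side-to-move bit) are non-zero, which is what makes skipping zero inputs and updating incrementally attractive.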
For the network training I used the (slightly modified) Python script from Jonathan Kreuzer.
In total around 98M positions were used, and training took 3 hours (on the CPU).
I also started with 1 network only (and not 4 as used in GuiNN Checkers 2.04).
To improve performance, reduce overhead, and learn more about the code, I started with the sources from Jonathan, but modified them for my own purposes (you can still see his signature in the code).
To assess the strength I played a DXP match against the latest version of KingsRow.
To compare apples with apples, both used only 1 core, the 6p DB, and no book.
Match settings were 65 moves in a 1 minute game (each side).
Result: 37 wins for KingsRow and 121 draws, which is (speaking in a positive way) an encouraging start.
This yields an Elo difference of 83.
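The Elo figure can be checked with the usual logistic rating formula. The sketch below assumes the 158 games listed were the whole match, i.e. Damage won none; the function name is made up for illustration.

```python
import math


def elo_difference(wins, draws, losses):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))


# Damage's result against KingsRow: 0 wins, 121 draws, 37 losses.
print(round(elo_difference(0, 121, 37)))  # -83
```

The formula is undefined for a 0% or 100% score, so a real tool would need to guard against those cases.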
As the Scan pattern implementation is extremely fast, I see a huge drop in nodes/second.
On 1 core, Damage reaches around 14.8 MN/s, the current NNUE implementation 4.0 MN/s.
I assume there is still room for improvement; for example, I don't yet apply incremental updates for the input layer.
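The incremental update mentioned here exploits the fact that the inputs are 0/1 and only a few of them change per move, so the first-layer sums (before the ReLU) can be patched instead of recomputed. A minimal sketch in plain Python, assuming one row of weights per input; the class and method names are hypothetical and the tiny sizes are only for demonstration.

```python
class Accumulator:
    """Keeps the pre-activation sums of the first layer up to date."""

    def __init__(self, weights, biases, active_inputs):
        # weights[j] is the row of per-neuron weights for input j.
        self.weights = weights
        self.sums = list(biases)
        for j in active_inputs:
            self.add(j)

    def add(self, j):
        """Input j switched from 0 to 1."""
        for i, w in enumerate(self.weights[j]):
            self.sums[i] += w

    def remove(self, j):
        """Input j switched from 1 to 0."""
        for i, w in enumerate(self.weights[j]):
            self.sums[i] -= w

    def output(self):
        """ReLU applied to the current sums."""
        return [max(0.0, s) for s in self.sums]


def full_compute(weights, biases, active_inputs):
    """Reference: recompute the layer from scratch."""
    sums = list(biases)
    for j in active_inputs:
        for i, w in enumerate(weights[j]):
            sums[i] += w
    return [max(0.0, s) for s in sums]


# 4 inputs, 2 neurons; a "move" turns input 0 off and input 3 on.
W = [[0.5, -1.0], [0.25, 0.75], [-0.5, 0.5], [1.0, 1.5]]
B = [0.0, 0.25]
acc = Accumulator(W, B, [0, 1])
acc.remove(0)
acc.add(3)
print(acc.output(), full_compute(W, B, [1, 3]))
```

A quiet move then costs two row updates instead of a pass over all active inputs. With floating-point weights the patched sums can drift slightly over many updates, which is one reason engines use integer weights for this layer.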
The evaluation code looks quite special, as there are no specific draughts features anymore.
See below.
Will continue during my XMas break, and keep you posted.
Code:
int eval_position(position_t& position)
{
	alignas(32) int16_t values[1024];
	layer_input_compute(&values[256], position); // layer 0, input layer
	nn_draughts.layers[1].layer_hidden_compute(&values[256], &values[512]); // layer 1, hidden layer
	nn_draughts.layers[2].layer_hidden_compute(&values[512], &values[768]); // layer 2, hidden layer
	int16_t eval = nn_draughts.layers[3].layer_output_compute(&values[768]); // layer 3, output layer
	if (position.bturn() == false) eval = -eval;
	return (int)eval;
}
Bert
Re: NNUE
Sidiki, I used a Python Keras/TensorFlow framework for learning (example taken from Jonathan Kreuzer, who also used this in GuiNN Checkers 2.04).
Basically you only need a few lines of code for that.
The Python file (trainnet.py) is a little larger (90 lines of code), but the other lines mainly deal with input and output processing and some initialization.
See below.
Bert
Code:
# Import assumed; the excerpt does not show the script's own import lines.
from tensorflow import keras

# layerSizes, positionData, positionLabels, batchSizeParam and epochsParam
# are defined elsewhere in trainnet.py (not shown in this excerpt).

# Create the neural net model
model = keras.Sequential([
    keras.layers.Dense(layerSizes[0], activation="relu"),
    keras.layers.Dense(layerSizes[1], activation="relu"),
    keras.layers.Dense(layerSizes[2], activation="relu"),
    keras.layers.Dense(layerSizes[3], activation="sigmoid"),  # use sigmoid for our 0-1 training labels
])
lr_schedule = keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=0.01, decay_steps=5000, decay_rate=0.96)
opt = keras.optimizers.Adam(learning_rate=lr_schedule)
# not sure what loss function should be or if should use sigmoid activation
model.compile(optimizer=opt, loss="mean_squared_error")
model.fit(positionData, positionLabels, batch_size=batchSizeParam, epochs=epochsParam)
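To get a feel for what the decay_rate parameter in the schedule above does, the sketch below evaluates the continuous exponential decay formula, initial_learning_rate * decay_rate ^ (step / decay_steps), in plain Python for the script's 0.96 and for the 0.80 that Rein suggested earlier in the thread. The helper name is made up; one step is one batch.

```python
def exponential_decay(step, initial_lr=0.01, decay_steps=5000, decay_rate=0.96):
    """Learning rate after `step` optimizer steps (continuous decay)."""
    return initial_lr * decay_rate ** (step / decay_steps)


for step in (0, 5000, 50000, 100000):
    print(step,
          f"{exponential_decay(step, decay_rate=0.96):.6f}",
          f"{exponential_decay(step, decay_rate=0.80):.6f}")
```

With 0.80 the learning rate falls off much faster, so how many optimizer steps the training run takes (positions, batch size, epochs) determines how much the two settings differ in practice.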
Re: NNUE
Sidiki, and to reply to your other question.
I used the previous match file and extracted a file with positions and a file with result labels (based upon the previous Damage evaluation, scaled with a sigmoid function).
You could say that in this way you project the old evaluation function onto a neural net architecture.
So as said before, not (yet) a bootstrap or zero approach....
I think the chess NNUE world also started this way, and then proceeded with further learning through autoplay, but I'm not 100% sure.
I guess Rein knows the details.
Bert
