Over the last week or so, I've written a Python script that expresses a draughts evaluation function as a Keras/TensorFlow neural network that can be trained with gradient descent. The script can learn such an evaluation function from both positional features and the Scan-like pattern indices that are currently in use by all the top draughts engines, including Kingsrow.
Acknowledgement
Ed Gilbert has been extremely helpful in explaining his Kingsrow evaluation function, thanks Ed! As a disclaimer, here is the information that was exchanged between us:
- I have only seen Ed's low-level C++ eval function that combines the features and pattern indices with their weights into a score, his raw training data, his trained weights, and the corresponding score predictions.
- I have not seen Ed's top-level C++ eval function that translates a position into features and pattern indices, nor his actual gradient descent code, nor his training data generation code.
The Kingsrow eval can be parameterized by the following constants:
Code:
num_features = 5   # balance, tempo, men, multi kings and first king
num_patterns = 4   # partially overlapping board areas
num_phases = 2     # opening and endgame
num_views = 2      # normal and mirrored
num_pieces = 3     # per-square states: empty, black man, white man
num_squares = 12   # 4x6 areas on a checkerboard
index_shape = (num_pieces**num_squares, num_patterns, num_phases)  # 531,441 x 4 x 2 entries
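For concreteness, each pattern index can be thought of as reading the 12 squares of one area as a base-3 number. Kingsrow's actual square ordering isn't shown here, so the following sketch (with an arbitrary ordering and a hypothetical pattern_index helper) just illustrates the idea:
Code:
def pattern_index(area_squares):
    # area_squares: 12 ints, each 0 (empty), 1 (black man) or 2 (white man),
    # listed in some fixed order over one 4x6 board area.
    index = 0
    for piece in reversed(area_squares):
        index = 3 * index + piece
    return index  # in range(num_pieces**num_squares)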
The model itself is then built from these constants as follows:
Code:
from tensorflow import keras

# SparseIndexLookup, score_reduce and phase_reduce are custom code (sketched below).
features = keras.Input(shape=(num_features,), name='features')
patterns = keras.Input(shape=(num_patterns, num_views), name='patterns', dtype='int32')
phases = keras.Input(shape=(num_phases,), name='phases')
feature_scores = keras.layers.Dense(units=num_phases, use_bias=False, name='feature-scores')(features)
pattern_scores = SparseIndexLookup(units=index_shape, name='pattern-scores')(patterns)
scores = keras.layers.Lambda(score_reduce, name='scores')([feature_scores, pattern_scores])
phase_weighted_score = keras.layers.Lambda(phase_reduce, name='phase-weighted-score')([scores, phases])
value_head = keras.layers.Activation(activation='sigmoid', name='value-head')(phase_weighted_score)
model = keras.Model(inputs=[features, patterns, phases], outputs=value_head)
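The two Lambda layers wrap small reduction functions whose definitions are not shown in this post. Assuming the feature and pattern scores are simply added and the result is blended by the phase weights, their bodies would look roughly like this:
Code:
import tensorflow as tf

# Guessed sketches of the two Lambda bodies (the actual definitions are not shown).
def score_reduce(inputs):
    feature_scores, pattern_scores = inputs
    # Element-wise sum of the per-phase feature and pattern scores.
    return feature_scores + pattern_scores  # shape: (batch, num_phases)

def phase_reduce(inputs):
    scores, phases = inputs
    # Blend the per-phase scores with the game phase weights.
    return tf.reduce_sum(scores * phases, axis=-1, keepdims=True)  # shape: (batch, 1)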
Note that this is not a "deep learning" model such as the one used in AlphaZero, but rather a very shallow neural network. Furthermore, it consumes very sparse inputs: of the 3**12 * 4 * 2 possible table entries, only 4 * 2 = 8 are active for each side in any given position, so only 16 pattern weights (plus 10 feature weights) contribute to the eval of each position.
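That sparsity is what the custom SparseIndexLookup layer exploits; its real code will be in the GitHub release. A rough sketch of the idea, assuming a single trainable table that is gathered per pattern slot and summed over both views, might look like this:
Code:
import tensorflow as tf
from tensorflow import keras

class SparseIndexLookup(keras.layers.Layer):
    # Rough sketch, not the released code: a trainable table of shape
    # index_shape, looked up by integer indices and summed over patterns and views.
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units  # (num_pieces**num_squares, num_patterns, num_phases)

    def build(self, input_shape):
        self.table = self.add_weight(
            name='table', shape=self.units, initializer='zeros', trainable=True)

    def call(self, patterns):
        # patterns: (batch, num_patterns, num_views) of int32 indices.
        per_pattern = []
        for p in range(self.units[1]):
            # Look up pattern slot p under both views: (batch, num_views, num_phases).
            looked_up = tf.gather(self.table[:, p, :], patterns[:, p, :])
            per_pattern.append(tf.reduce_sum(looked_up, axis=1))
        # Sum over all pattern slots: (batch, num_phases).
        return tf.add_n(per_pattern)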
Manually initializing the Keras network with the Kingsrow weights reproduces the exact same predicted scores on a training set of 230 million positions. This is a good test that the Keras model in Python is a faithful reimplementation of the Kingsrow eval as Ed had written it in C++.
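In code, this sanity check amounts to something like the following, where feature_weights, pattern_table and kingsrow_scores are assumed to have been loaded from Ed's files (the loading code is not shown), and X_train holds the encoded positions:
Code:
import numpy as np

# Plug Ed's trained weights into the corresponding layers by name.
model.get_layer('feature-scores').set_weights([feature_weights])
model.get_layer('pattern-scores').set_weights([pattern_table])

# The Keras predictions should now reproduce the C++ eval scores.
assert np.allclose(model.predict(X_train), kingsrow_scores)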
Model Training
Using Keras to train the weights from scratch on these 230 million positions is another story. I used an out-of-the-box gradient descent optimizer ("Adam") without any further tricks, with a mini-batch size of 65K positions. Letting Keras train for 30 epochs (i.e., 30 full passes over the data) took about 30 minutes on my machine, which has a $140 GPU with 4 GB of RAM (GTX 1050 Ti).
Code:
model.compile(optimizer='adam', loss='mse')
model.fit(
    X_train, y_train,
    batch_size=2**16,
    epochs=30,
    validation_data=(X_val, y_val),
)
There are still plenty of things left to explore, such as weight regularization, learning rate schedules (one example is sketched below), different loss targets and activation functions, batch normalization, and so on. Also, this script takes the training data as given, but you could easily use it inside a Reinforcement Learning training loop that alternates between data generation (so-called Policy Evaluation) and re-training the weights (so-called Policy Improvement). This is only the end of the beginning.
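As one example of the kind of thing to try, here is a learning rate schedule bolted onto the same Adam optimizer; the decay numbers below are made up, not tuned:
Code:
from tensorflow import keras

# Decay the learning rate by 4% roughly every epoch
# (230M positions / 65K batch size is about 3,500 steps per epoch).
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=3500,
    decay_rate=0.96)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr_schedule), loss='mse')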
Conclusion
In the coming days, I'll be cleaning up the support code (there's also code to read and write weights from disk, and to read in the training data, not shown here) and releasing everything on GitHub. Wrapping the model in a proper keras.Model subclass is also on the to-do list (subclassing currently interferes with the summary() and plot_model() functions, so I need to figure that out).
Regardless of the exact match outcome, I think it's fair to say that it's now possible to get out-of-the-box world-class performance using Keras/TensorFlow, given of course that you already have a good training game generation pipeline set up. This should make good on my claim from last May that it is, in principle, possible to use a professional optimization library for learning draughts evaluation functions.