Breakthrough Draughts

Rein Halbersma · Post by **Rein Halbersma** » Mon Oct 09, 2017 14:07

TAILLE wrote:
Rein Halbersma wrote:
Hi Gerard,

Is it possible for you to modify your routine so as to print the capture positions after the last generation pass? And then drop them for the final compression stage? Or would this be a too invasive change of your generator?

Rein
I don't quite understand your question.

Basically my routine looks like:

for (db = first db; db < last db; db = following db)
{
for (p = first position of the db; p < last position; p = following position)
{
if (p == capture position) continue;
else {generate successors; evaluate p; save result in memory db}
}
compress memory db and store on disk;
}

Of course I added some complexity in order to have a multithread routine

Ah OK, thanks I understand. From what I learnt from Ed's description, he generates the full results for all positions, including captures, and then after generation he sets the capture positions to the most common value (e.g. a draw) and then he compresses the databases. With that method, you can still generate capture statistics just before compression.

MichelG · Post by **MichelG** » Mon Oct 09, 2017 21:34

TAILLE wrote:Hi Michel,
Good news to see your interesting by this beautitul draughts variant.
As with Bert I have the same difficulty with you : I did not generate the capture positions. Bert managed to add statitistics on non capture position in order to compare our figures but I do not know if it is easy for your implementation to do the same.
FYI my figures are
1x1 NCW1= 1 028 NCL1= 837 NCW2= - NCL2= -
2x1 NCW1= 30 993 NCL1= 6 865 NCW2= 11 268 NCL2= 26 595
2x2 NCW1= 424 883 NCL1= 285 789 NCW2= - NCL2= -
3x1 NCW1= 446 168 NCL1= 55 534 NCW2= 113 391 NCL2= 388 438
3x2 NCW1= 6 778 061 NCL1= 1 948 633 NCW2= 3 775 435 NCL2= 4 952 702
3x3 NCW1= 63 546 643 NCL1= 35 985 469 NCW2= - NCL2= -
4x1 NCW1= 4 518 137 NCL1= 363 194 NCW2= 927 678 NCL2= 3 954 768
4x2 NCW1= 67 423 737 NCL1= 11 426 249 NCW2= 28 548 951 NCL2= 50 312 655
4x3 NCW1= 652 076 102 NCL1= 184 976 683 NCW2= 438 218 089 NCL2= 398 830 286
4x4 NCW1= 4 486 519 068 NCL1= 2 080 557 566 NCW2= - NCL2= -

I did a non-capture count of my databases, and the numbers match

How big a database did you build?

Michel

TAILLE · Post by **TAILLE** » Mon Oct 09, 2017 22:25

MichelG wrote:
TAILLE wrote:Hi Michel,
Good news to see your interesting by this beautitul draughts variant.
As with Bert I have the same difficulty with you : I did not generate the capture positions. Bert managed to add statitistics on non capture position in order to compare our figures but I do not know if it is easy for your implementation to do the same.
FYI my figures are
1x1 NCW1= 1 028 NCL1= 837 NCW2= - NCL2= -
2x1 NCW1= 30 993 NCL1= 6 865 NCW2= 11 268 NCL2= 26 595
2x2 NCW1= 424 883 NCL1= 285 789 NCW2= - NCL2= -
3x1 NCW1= 446 168 NCL1= 55 534 NCW2= 113 391 NCL2= 388 438
3x2 NCW1= 6 778 061 NCL1= 1 948 633 NCW2= 3 775 435 NCL2= 4 952 702
3x3 NCW1= 63 546 643 NCL1= 35 985 469 NCW2= - NCL2= -
4x1 NCW1= 4 518 137 NCL1= 363 194 NCW2= 927 678 NCL2= 3 954 768
4x2 NCW1= 67 423 737 NCL1= 11 426 249 NCW2= 28 548 951 NCL2= 50 312 655
4x3 NCW1= 652 076 102 NCL1= 184 976 683 NCW2= 438 218 089 NCL2= 398 830 286
4x4 NCW1= 4 486 519 068 NCL1= 2 080 557 566 NCW2= - NCL2= -
I did a non-capture count of my databases, and the numbers match

How big a database did you build?

Michel

Hi Michel,

I generated also 5x1, 5x2, 5x3, 5x4, 5x5, 6x1, 6x2, 6x3 and 6x4.
My goal was to reach 6x6 but during the 6x5 generation I noted my db reached 300Gb and the 6x5 db was just half generated. I stopped it because I conclude I will not be able to generate the 6x6 db due to the space needed on the disk. In addition the time needed becomes also quite large.

Now I make a pause on the generation process and I work on the eval function.

Later I expect to find a new compression mechanism in order to solve disk space issue.

MichelG · Post by **MichelG** » Fri Dec 15, 2017 08:47

A small update from my part.

I have added the breakthrough rules to dragon.

First i played with the original evaluation function (from international draughts) against scan 3.0. This scores about 10%.

Currently i am learning a new set of evaluation patterns for breakthrough. The initial results are encouraging; after about 80 hours of learning, dragon wins about 33%. (using 15-field patterns)

Before increasing the learning time, there is some more research to do. For example selecting the right positions to feed the learner, and finding the best hyperparameters.

Dragon now uses 7 piece endgame databases, but it seems that these increase strength only by a relatively small amount.

Michel

Krzysztof Grzelak · Post by **Krzysztof Grzelak** » Fri Dec 15, 2017 09:50

I'm sorry to ask you Michel, I understand that you have added updates on your homepage program.

TAILLE · Post by **TAILLE** » Fri Dec 15, 2017 19:32

MichelG wrote:A small update from my part.

I have added the breakthrough rules to dragon.

First i played with the original evaluation function (from international draughts) against scan 3.0. This scores about 10%.

Currently i am learning a new set of evaluation patterns for breakthrough. The initial results are encouraging; after about 80 hours of learning, dragon wins about 33%. (using 15-field patterns)

Before increasing the learning time, there is some more research to do. For example selecting the right positions to feed the learner, and finding the best hyperparameters.

Dragon now uses 7 piece endgame databases, but it seems that these increase strength only by a relatively small amount.

Michel

Hi Michel,

I continue to work on breakthrough draugths. I am trying to understand reinforcement learning, I confess it is quite hard for me but I am really motivated though my progress are rather slow!

Could I ask you one or two questions Michel? I tried my first reinforcement learning on a very simple base : what is the value of a material advantage ? In my classical evaluation I decided that the value of one man was +100 and the value of a win was equal to the value of 20 men i.e. +2000. With my first experiments with reinforcement learning and keeping the value +2000 for a win the programm told me after a reinforcement learning that the value of an advantage of one man is a value near from +1000, the value of an advantage of 2 men is a value near form 1600 etc. and the value of en advantage of 10 men is very near from +2000.
Clearly the values I estimated represent the probability to win and obvioulsly it could not be a linear function. Do you also have a estimation function representing the probablity of winning (this sounds a great simplication due to the fact that the draw does not exist) and do you have also such non-linear results?

Gérard

MichelG · Post by **MichelG** » Mon Dec 18, 2017 21:30

Hi Gérard,

The learning that i implemented, uses the logistic function to map the sum of features to a winning rate:

winrate = 1/(1+exp(-f))

The winrate is the number of points you can expect to gain from the position (loss=0, draw=0.5, win=1.0), f is the sum of the feature weights. The learner finds the weight_of_feature values that minimises sumof(expected_winrate - actual_score) for all positions.

If you apply this to the learning mechanism, you end up with a sum of features (f) for each position, and from that you can calculate the win rate.

For example, suppose you have 2 features:

feature1 = number of my man - number of opponent man:
feature2 = my_mobility - oppponent_mobility

It might figure out the following:
weigth_of_feature1= 0.9
weigth_of_feature2= 0.1
and f=1 * 0.9 + 2 * 0.1 = one point one

In a position where you are 1 man ahead and have 2 mobility difference, the winrate becomes:
winrate = 1/(1+exp(-(1 * 0.9 + 2 * 0.1) )) = 0.75

In this hypothetical evaluation function, the winrate gets closer to 1 as the advantage gets bigger.

Note that the feature weights are close to zero. In dragon, i make these integers for faster processing by multiplying them by 1000 or so. So weigth_of_feature1= 900, weigth_of_feature2= 100 and winrate = 1/(1+(exp(-0.001*f)))

Also, you don't need to actually compute the winrate. Since the logistic function is monotone, you can use use the sum of features as score function.

If you want to learn more, i think it is best to read some articles on logistic regression techniques.

One fun thing i noticed when experimenting, is that my evaluation function can work without counting the man-difference, and be based purely on patterns alone. It's only a minor disadvantage.

Michel

TAILLE · Post by **TAILLE** » Tue Dec 19, 2017 12:20

MichelG wrote:Hi Gérard,

The learning that i implemented, uses the logistic function to map the sum of features to a winning rate:

winrate = 1/(1+exp(-f))

The winrate is the number of points you can expect to gain from the position (loss=0, draw=0.5, win=1.0), f is the sum of the feature weights. The learner finds the weight_of_feature values that minimises sumof(expected_winrate - actual_score) for all positions.

If you apply this to the learning mechanism, you end up with a sum of features (f) for each position, and from that you can calculate the win rate.

For example, suppose you have 2 features:

feature1 = number of my man - number of opponent man:
feature2 = my_mobility - oppponent_mobility

It might figure out the following:
weigth_of_feature1= 0.9
weigth_of_feature2= 0.1
and f=1 * 0.9 + 2 * 0.1 = one point one

In a position where you are 1 man ahead and have 2 mobility difference, the winrate becomes:
winrate = 1/(1+exp(-(1 * 0.9 + 2 * 0.1) )) = 0.75

In this hypothetical evaluation function, the winrate gets closer to 1 as the advantage gets bigger.

Note that the feature weights are close to zero. In dragon, i make these integers for faster processing by multiplying them by 1000 or so. So weigth_of_feature1= 900, weigth_of_feature2= 100 and winrate = 1/(1+(exp(-0.001*f)))

Also, you don't need to actually compute the winrate. Since the logistic function is monotone, you can use use the sum of features as score function.

If you want to learn more, i think it is best to read some articles on logistic regression techniques.

One fun thing i noticed when experimenting, is that my evaluation function can work without counting the man-difference, and be based purely on patterns alone. It's only a minor disadvantage.

Michel

Hi Michel,

First of all thank for your answer.
I fully understand your "winrate = 1/(1+exp(-f))" function. Because this function is monotone it is clear you do not need to calculate it explicitly; you can use only the f fonction as an eval function, without calculating the winrate. I am only surprise by seeing your function is non linear.
Let's take the breakthrough draughts. We all know that the initial position is either a winning one or a losing one. Suppose now that after 1000 games you observed 600 wins and 400 loss. That means that the winrates is 60%. In my implementation where a win is +2000 and a loss is -2000 the value of the intitial position will be about +400 which is the linear value of 60% over the range [-2000, +2000].
Because you function f is not a linear function of the winrate I have some difficulty to see its real meaning though I see clearly that you can easily derive the winrate from this f function.

MichelG · Post by **MichelG** » Tue Dec 19, 2017 18:39

I don't think there is any 'human' meaning to the function. It just maps -infinite to +infinite to the [0,1] range, and does so with some particular nice properties.

One thing to consider is that when for example you are 5 pieces ahead and say f=2.5, winning another piece doesn't increase the winrate by much because the derivative of the logistic function is small for high f. This correspondence allows simple relations between for example the piece-difference and score.

In your example, a 60% win rate would give f=0.405. Mathematically, you can't map this a another range, like [-2000, 2000], because f could in theory be infinitely large.

In practice, this is not the case though; f is usually in the range [-5, +5] because otherwise the game is over within a few moves.

In dragon, i multiply f by 500, and limit it to the range [-29000, +29000]. With f=58, the eval says it has a 0.9999.. winrate anyway.

Michel

TAILLE · Post by **TAILLE** » Tue Dec 19, 2017 19:28

MichelG wrote:I don't think there is any 'human' meaning to the function. It just maps -infinite to +infinite to the [0,1] range, and does so with some particular nice properties.

One thing to consider is that when for example you are 5 pieces ahead and say f=2.5, winning another piece doesn't increase the winrate by much because the derivative of the logistic function is small for high f. This correspondence allows simple relations between for example the piece-difference and score.

In your example, a 60% win rate would give f=0.405. Mathematically, you can't map this a another range, like [-2000, 2000], because f could in theory be infinitely large.

In practice, this is not the case though; f is usually in the range [-5, +5] because otherwise the game is over within a few moves.

In dragon, i multiply f by 500, and limit it to the range [-29000, +29000]. With f=58, the eval says it has a 0.9999.. winrate anyway.

Michel

Hi Michel,

Maybe I am wrong but it appears to me that by calculating your f function you consider that the piece-difference is associated only one feature. Is it true?
FYI in my implementation each piece-difference value (from 1 to 20) is associated to one feature among 20. That way the learning process can evaluate precisely a piece-difference.
The advantage is the following: suppose you have the choice between a move winning a man (but with a poor resulting positionnal position) and a move allowing to reach a good positionnal position. The choice will really depend of the current piece-difference: with a very small piece-difference Damy will certainly choose to win a man but with a bigger piece-differnce Damy will choose the positional move which sounds clever for a human point of view doesn't it?

MichelG · Post by **MichelG** » Tue Dec 19, 2017 20:15

TAILLE wrote:
Hi Michel,

Maybe I am wrong but it appears to me that by calculating your f function you consider that the piece-difference is associated only one feature. Is it true?
FYI in my implementation each piece-difference value (from 1 to 20) is associated to one feature among 20. That way the learning process can evaluate precisely a piece-difference.
The advantage is the following: suppose you have the choice between a move winning a man (but with a poor resulting positionnal position) and a move allowing to reach a good positionnal position. The choice will really depend of the current piece-difference: with a very small piece-difference Damy will certainly choose to win a man but with a bigger piece-differnce Damy will choose the positional move which sounds clever for a human point of view doesn't it?

Ah, i understand how you implemented the piece difference feature. I can see the advantages of doing it that way.

In dragon, there is only 1 feature for piece-difference and it is indeed linear with the amount of difference.

The problem i see with having 20 features, is that you might have very little example positions for certain differences. For instance if you value +1 piece at 400, +2 pieces at 600, etc, +14 pieces might still be 0, because +14 does not happen in your learning examples.

Then, when the engine does happen to stumble upon +14, it will value it wrong because it has never seen a +14 before.

There is a compromise you can make: +1 is a feature, +2 is a feature, and then there is a third feature which is linear to the amount of difference above 2.

Currently dragon only has 3 fixed features plus the patterns. I haven't really looked into designing the fixed features yet; they appear to be not very important. I plan to do some more experiments with it later though.

Michel

TAILLE · Post by **TAILLE** » Tue Dec 19, 2017 23:18

MichelG wrote:
TAILLE wrote:
Hi Michel,

Maybe I am wrong but it appears to me that by calculating your f function you consider that the piece-difference is associated only one feature. Is it true?
FYI in my implementation each piece-difference value (from 1 to 20) is associated to one feature among 20. That way the learning process can evaluate precisely a piece-difference.
The advantage is the following: suppose you have the choice between a move winning a man (but with a poor resulting positionnal position) and a move allowing to reach a good positionnal position. The choice will really depend of the current piece-difference: with a very small piece-difference Damy will certainly choose to win a man but with a bigger piece-differnce Damy will choose the positional move which sounds clever for a human point of view doesn't it?
Ah, i understand how you implemented the piece difference feature. I can see the advantages of doing it that way.

In dragon, there is only 1 feature for piece-difference and it is indeed linear with the amount of difference.

The problem i see with having 20 features, is that you might have very little example positions for certain differences. For instance if you value +1 piece at 400, +2 pieces at 600, etc, +14 pieces might still be 0, because +14 does not happen in your learning examples.

Then, when the engine does happen to stumble upon +14, it will value it wrong because it has never seen a +14 before.

There is a compromise you can make: +1 is a feature, +2 is a feature, and then there is a third feature which is linear to the amount of difference above 2.

Currently dragon only has 3 fixed features plus the patterns. I haven't really looked into designing the fixed features yet; they appear to be not very important. I plan to do some more experiments with it later though.

Michel

In order to solve the problem you mentionned I implemented two points:
1) I initialised the value of my 20 features to a value near from +2000
2) As soon as I want to modify a value I take into account a handwritten constraint saying that these values have to be ordered.

Concerning the fixed features my feeling is that these features, like piece-difference, have not a linear behavior. As a consequence I will try to have same approach by dividing the feature in 20 or more features.
For the moment I have identified also 3 fixed features: piece-difference, tempo-difference and right-left balance.
I am also analysing what could be the relevant patterns to consider (I do not like very much the patterns used by Fabien but we can never be sure can we?)

MichelG · Post by **MichelG** » Wed Dec 20, 2017 08:54

TAILLE wrote:
In order to solve the problem you mentionned I implemented two points:
1) I initialised the value of my 20 features to a value near from +2000
2) As soon as I want to modify a value I take into account a handwritten constraint saying that these values have to be ordered.

Concerning the fixed features my feeling is that these features, like piece-difference, have not a linear behavior. As a consequence I will try to have same approach by dividing the feature in 20 or more features.
For the moment I have identified also 3 fixed features: piece-difference, tempo-difference and right-left balance.
I am also analysing what could be the relevant patterns to consider (I do not like very much the patterns used by Fabien but we can never be sure can we?)

5 years ago, I wrote an optimizer for finding the best patterns. However, it takes a long time to do so since you have to go through the learning procedure hundreds of times.

So i only did this based on a small amount of examples, on 10-field patterns.

When compared to the original (square-like) patterns, there definitely is an improvement. In one test, the winrate went from 60% to 62%.

I don't yet have a mechanism to optimize the bigger (15 fields and up) patterns. The problem is that it takes so much CPU time, i feel that it is better to spent this CPU time on finding more training examples. I will try and optimize 12-field patterns soon.

Below is one of the patterns that came out of the optimizer (out of 24)

Code: Select all

     0   0   0   0   0
   0   0   0   0   0
     0   0   0   0   0
   0   1   1   0   0
     0   1   1   0   0
   0   1   1   1   0
     0   0   1   0   0
   0   0   1   0   0
     0   1   0   0   0
   0   0   0   0   0

Michel

Ed Gilbert · Post by **Ed Gilbert** » Wed Dec 27, 2017 15:38

Hi Michel,

MichelG wrote: One fun thing i noticed when experimenting, is that my evaluation function can work without counting the man-difference, and be based purely on patterns alone. It's only a minor disadvantage.
Michel

I never tried that experiment, but it is not intuitive to me that it would only be a minor disadvantage. Without an eval term for man difference, I would expect the engine might toss a piece just to gain some minor positional advantage. Do you understand why this doesn't seem to be a problem?

-- Ed

MichelG · Post by **MichelG** » Wed Dec 27, 2017 20:15

Ed Gilbert wrote:Hi Michel,

I never tried that experiment, but it is not intuitive to me that it would only be a minor disadvantage. Without an eval term for man difference, I would expect the engine might toss a piece just to gain some minor positional advantage. Do you understand why this doesn't seem to be a problem?

-- Ed

My guess is that the patterns themselves can do the count. Maybe patterns with more man get higher values and it is regulated that way, albeit with some loss of accuracy.

In one experiment, i trained my eval on 16m positions and got a score of 0.7 (out of 1.0) against some reference eval function. Removing the piececount lowered it to 0.67.

I also tried eval features for -1 piece, +1 piece, -2 pieces, +2 pieces, somewhat similar to Gérard plans on this. That added about 4-5 elo points (e.g. barely measureable after 50k games), but it adds memory and time requirements for the optimizer.

So as a very rough gideline (in breakthrough):
- each doubling of the amount of training positions adds about 40-60 points of elo.
- Fixed feature: piececount : about 30-50 elo points
- Fixed feature: specific piececount: about 4-5 elo points (compared to linear piececount)
- Fixed feature: tempo difference: so small it is not measureable after 50k games.
- Using an optimised set of patterns instead of squares: 20-40 elo.

For international draughts, just divide elo points by 10.

Michel

World Draughts Forum

Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts