Breakthrough Draughts

Ed Gilbert · Post by **Ed Gilbert** » Fri Dec 29, 2017 12:29

My guess is that the patterns themselves can do the count. Maybe patterns with more man get higher values and it is regulated that way, albeit with some loss of accuracy.

Thanks, that sounds like a reasonable explanation. I had added a tempo eval and seen only a tiny improvement. Now I understand why.

-- Ed

TAILLE · Post by **TAILLE** » Sat Jan 06, 2018 22:32

MichelG wrote:
Ed Gilbert wrote:Hi Michel,

I never tried that experiment, but it is not intuitive to me that it would only be a minor disadvantage. Without an eval term for man difference, I would expect the engine might toss a piece just to gain some minor positional advantage. Do you understand why this doesn't seem to be a problem?

-- Ed
My guess is that the patterns themselves can do the count. Maybe patterns with more man get higher values and it is regulated that way, albeit with some loss of accuracy.

In one experiment, i trained my eval on 16m positions and got a score of 0.7 (out of 1.0) against some reference eval function. Removing the piececount lowered it to 0.67.

I also tried eval features for -1 piece, +1 piece, -2 pieces, +2 pieces, somewhat similar to Gérard plans on this. That added about 4-5 elo points (e.g. barely measureable after 50k games), but it adds memory and time requirements for the optimizer.

So as a very rough gideline (in breakthrough):
- each doubling of the amount of training positions adds about 40-60 points of elo.
- Fixed feature: piececount : about 30-50 elo points
- Fixed feature: specific piececount: about 4-5 elo points (compared to linear piececount)
- Fixed feature: tempo difference: so small it is not measureable after 50k games.
- Using an optimised set of patterns instead of squares: 20-40 elo.

For international draughts, just divide elo points by 10.

Michel

Hi Michel,

Concerning tempo difference do you also consider the patterns themselves can do the count or do you think that this feature is irrelevant?

MichelG · Post by **MichelG** » Sun Jan 07, 2018 14:59

TAILLE wrote:
Hi Michel,

Concerning tempo difference do you also consider the patterns themselves can do the count or do you think that this feature is irrelevant?

I think the question is whether you can estimate the tempo difference by using patterns alone. The answer is a definite yes here. In fact, you can create the weights by hand if you would want to.

I believe, for the fixed features to be of much use, it should not be possible to have a pattern predict their value.

There is 1 exception: if it the fixed feature is very important, like piece-difference, adding it makes it easier for the weights to converge on the right values, especially if you are using L2 regularization; all energy goes into 1 weight instead of thousands.

Here are some examples of features that are not predicable by patterns:
piece_difference * tempo_difference
tempo_difference > constant
tempo_difference cubed
distance_to_promotionLine_difference
etc

I tried some of these, but i have yet not found a particular useful fixed feature.

Michel

TAILLE · Post by **TAILLE** » Sun Jan 07, 2018 19:46

MichelG wrote:
TAILLE wrote:
Hi Michel,

Concerning tempo difference do you also consider the patterns themselves can do the count or do you think that this feature is irrelevant?
I think the question is whether you can estimate the tempo difference by using patterns alone. The answer is a definite yes here. In fact, you can create the weights by hand if you would want to.

I believe, for the fixed features to be of much use, it should not be possible to have a pattern predict their value.

There is 1 exception: if it the fixed feature is very important, like piece-difference, adding it makes it easier for the weights to converge on the right values, especially if you are using L2 regularization; all energy goes into 1 weight instead of thousands.

Here are some examples of features that are not predicable by patterns:
piece_difference * tempo_difference
tempo_difference > constant
tempo_difference cubed
distance_to_promotionLine_difference
etc

I tried some of these, but i have yet not found a particular useful fixed feature.

Michel

Hi Michel,

It seems I have a different view (but I have not your experience in reinforcement learning !)

I agree with you that the tempo_difference can be estimated by the patterns alone but does that mean that we should avoid to use such fixed feature?
My view is the following : because the various patterns could be highly correlated I surely need some regulation to help convergence. Now the point is the following : tempo_difference is a global feature with no major correlation with other features. I think in this case that by handling the tempo-difference with a separate feature I can handled this separate feature without regulation hoping to reach a better convergence with patterns handled with regulation.

Of course I have to run a lot of tests to confirm this approach!

MichelG · Post by **MichelG** » Fri Jan 12, 2018 20:32

TAILLE wrote:
Hi Michel,

It seems I have a different view (but I have not your experience in reinforcement learning !)

I agree with you that the tempo_difference can be estimated by the patterns alone but does that mean that we should avoid to use such fixed feature?
My view is the following : because the various patterns could be highly correlated I surely need some regulation to help convergence. Now the point is the following : tempo_difference is a global feature with no major correlation with other features. I think in this case that by handling the tempo-difference with a separate feature I can handled this separate feature without regulation hoping to reach a better convergence with patterns handled with regulation.

Of course I have to run a lot of tests to confirm this approach!

I agree it can potentially help with regularization.

But suppose you can express the tempo-difference of a position in patterns. (which can be done in an exact manner depending on how your patterns are organized)

Then you can add the value of tempo-difference-patterns to the rest of your patterns and discard your original tempo-weight estimate:

oldEval=tempoWeight*tempo + sum(1,N)[patternWeightEval(x)]
and
tempoWeight*tempo = sum(1,N)[patternWeightTempo(x)]
this means
oldEval=sum(1,N)[(patternWeightEval(x)+patternWeightTempo(x)]

Which saves you time in the eval function, since you don't need to compute the tempo anymore, and you can add these weights offline.

Michel

TAILLE · Post by **TAILLE** » Fri Jan 12, 2018 23:29

MichelG wrote:
TAILLE wrote:
Hi Michel,

It seems I have a different view (but I have not your experience in reinforcement learning !)

I agree with you that the tempo_difference can be estimated by the patterns alone but does that mean that we should avoid to use such fixed feature?
My view is the following : because the various patterns could be highly correlated I surely need some regulation to help convergence. Now the point is the following : tempo_difference is a global feature with no major correlation with other features. I think in this case that by handling the tempo-difference with a separate feature I can handled this separate feature without regulation hoping to reach a better convergence with patterns handled with regulation.

Of course I have to run a lot of tests to confirm this approach!
I agree it can potentially help with regularization.

But suppose you can express the tempo-difference of a position in patterns. (which can be done in an exact manner depending on how your patterns are organized)

Then you can add the value of tempo-difference-patterns to the rest of your patterns and discard your original tempo-weight estimate:

oldEval=tempoWeight*tempo + sum(1,N)[patternWeightEval(x)]
and
tempoWeight*tempo = sum(1,N)[patternWeightTempo(x)]
this means
oldEval=sum(1,N)[(patternWeightEval(x)+patternWeightTempo(x)]

Which saves you time in the eval function, since you don't need to compute the tempo anymore, and you can add these weights offline.

Michel

Hi Michel,

I understand but I do not see clearly how the tempo_difference can be estimated by the patterns.
Let's take a simple example. Suppose you are dealing only with classical game. With a separate feature and by dividing the tempo_difference in for example 20 possible values from -10 to +9 you might find that:
it is disadvantage to have a tempo_difference between -10 and -6
it is an advantage to have a tempo_difference between -5 and -1
it is disadvantage to have a tempo_difference between 0 and 4
it is an advantage to have a tempo_difference between 5 and 9
How could you code this in the pattern's values?

MichelG · Post by **MichelG** » Sat Jan 13, 2018 17:52

TAILLE wrote:
Hi Michel,

I understand but I do not see clearly how the tempo_difference can be estimated by the patterns.
Let's take a simple example. Suppose you are dealing only with classical game. With a separate feature and by dividing the tempo_difference in for example 20 possible values from -10 to +9 you might find that:
it is disadvantage to have a tempo_difference between -10 and -6
it is an advantage to have a tempo_difference between -5 and -1
it is disadvantage to have a tempo_difference between 0 and 4
it is an advantage to have a tempo_difference between 5 and 9
How could you code this in the pattern's values?

You can't...

Using patterns, you could only encode a linear function of the tempo, e.g. value=W*tempo

That is the both the strength and limitation of the entire approach; it only encodes linear functions. If you want non-linearity, you need to add features or structure manually. And that can get very complex fast. Maybe it is an advantage to have between +4 and +8 tempo when you are 1 man ahead, and have 2 pieces on row 6.

There is one way around it; use deep neural nets. Those can learn the non-linear relations as well. However the neural nets come with a big disadvantage; it takes a huge amount of computing power to evaluate them.

Google seems to be getting away with that in go and chess, but that might be more difficult for us simple programmers.

Michel

TAILLE · Post by **TAILLE** » Sat Jan 13, 2018 19:29

MichelG wrote:
TAILLE wrote:
Hi Michel,

I understand but I do not see clearly how the tempo_difference can be estimated by the patterns.
Let's take a simple example. Suppose you are dealing only with classical game. With a separate feature and by dividing the tempo_difference in for example 20 possible values from -10 to +9 you might find that:
it is disadvantage to have a tempo_difference between -10 and -6
it is an advantage to have a tempo_difference between -5 and -1
it is disadvantage to have a tempo_difference between 0 and 4
it is an advantage to have a tempo_difference between 5 and 9
How could you code this in the pattern's values?
You can't...

Using patterns, you could only encode a linear function of the tempo, e.g. value=W*tempo

That is the both the strength and limitation of the entire approach; it only encodes linear functions. If you want non-linearity, you need to add features or structure manually. And that can get very complex fast. Maybe it is an advantage to have between +4 and +8 tempo when you are 1 man ahead, and have 2 pieces on row 6.

There is one way around it; use deep neural nets. Those can learn the non-linear relations as well. However the neural nets come with a big disadvantage; it takes a huge amount of computing power to evaluate them.

Google seems to be getting away with that in go and chess, but that might be more difficult for us simple programmers.

Michel

Hi Michel,

I agree with all what you said Michel. In particular I think also that patterns can only encode linear functions.
In addition I do not want to handle explicitly non-linear function. For that reason I will continue to explore having several features for dealing with tempo_difference depending of the tempo_difference itself and depending of the type of game (typically classical or non-classical game).

TAILLE · Post by **TAILLE** » Sat Mar 03, 2018 22:40

Hi,

Progress are rather small but I think real. The problem with breakthrough draughts is the lack of references.

As an example how your programm analyses the following position:

White to play

After 41 minutes Damy finds 2 winning moves and 7 seven losing moves. Anyway it is for me a start for future improvements ...
I will let you know.

I you have yourself some basic positions on which you work I will be interesting to exchange with you.

World Draughts Forum

Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts

Re: Breakthrough Draughts