EndGame Databases

BertTuyt · Post by **BertTuyt** » Sat Jun 23, 2007 13:05

Rein, I will share all ideas, even weird ones. During my summer vacation I have more time to do programming and testing. If something interesting pops up I will communicate it via this forum.

Bert

Rein Halbersma · Post by **Rein Halbersma** » Sat Jun 23, 2007 20:27

Ed Gilbert wrote:Rein, I used the symmetry about the double corner diagonal when I wrote the indexing functions for 5k vs 5k in English checkers. If there are more black kings in the triangle {5,29,32} than in {1,4,28} then I flip the board bitmaps around the diagonal. You are right that this complicates the design of the indexing functions. This particular slice took 2 days to compute using the symmetry. It would have taken a lot longer without it because my computer only had 2gb ram and I needed nearly all of it to buffer the slice in memory using the symmetry. If I was doing it today I would not bother using the symmetry because it takes longer to write and debug the indexing code than the time saved building the all-kings slices.

-- Ed

If I count correctly, when you use one symmetry, you have 8,128,542,240 positions per side to move, right?

by using all symmetries, I count only 4,064,429,460 positions per side to move (which is slightly more than half your number).

Ed Gilbert · Post by **Ed Gilbert** » Sat Jun 23, 2007 20:58

Rein Halbersma wrote:If I count correctly, when you use one symmetry, you have 8,128,542,240 positions per side to move, right?

Correct. And because this slice is perfectly symmetrical, you do not need to compute separate data for both sides to move. In this slice I only have data for black. For white to move, reverse the board and lookup in the black to move database.

-- Ed

Rein Halbersma · Post by **Rein Halbersma** » Sat Jun 23, 2007 21:11

For those of you interested in having a computer do all the counting of the number of (possibly symmetry reduced) kings only endgames, first install the open source symbolic algebra program Maxima from http://maxima.sourceforge.net/ and then copy paste the following code to its command line (change d: 10 into d: 8 if you are interested in 8x8 checkers).

Code: Select all

d    : 10;
DA   : expand((e+b+w)^(d^2/2));
DS   : expand((e+b+w)^d * (e^2+b^2+w^2)^((d-2)*d/4));
DD   : expand((e^2+b^2+w^2)^(d^2/4));
DR   : expand((e^2+b^2+w^2)^(d^2/4));
DT   : expand(1/4 * (DA+DS+DD+DR));
DC   : expand((e^2+b^2+w^2)^(d/2) * (e^4+b^4+w^4)^((d-2)*d/8));
DSred: expand((DS-DC)/2);
DDred: expand((DD-DC)/2);
DRred: expand((DR-DC)/2);
DAred: expand(1/4 * (DA - 2*(DSred+DDred+DRred) - DC));
DTred: expand(DAred + DSred + DDred + DRred + DC);

The above code has several polynomials with as arguments empty squares (e), white kings (w) and black kings (b). (For the mathematically interested, what is being written here is the Cycle Index Polynomial of the symmetry group of the 10x10 checkers board. I learned this trick from the PhD thesis of Ralph Gasser, who solved the game Nine Men's Morris, see http://citeseer.ist.psu.edu/cache/paper ... essing.pdf).

Expanding the polynomial DT (or DTred which is a rewriting of DT) by just typing "DT;" gives a lot of terms, among which 635 * e^48 * b * w, which means that there are 635 inequivalent endgames on a 10x10 board with one king for each side. Simarly for all the other terms.

Typing "DA;" gives the total number of endgames without taking care of symmetries. It has e.g. the term 2450 * e^48 * b * w, which means that there are 2450 one king vs one king endgames on a 10x10 board.

Typing "DS;", "DD;" or "DR;" gives again a whole bunch of terms, each of which gives the number of endgames that are symmetric under flips along the single corner diagonal (DS, which has e.g. coefficient 90 for e^48*b*w which means there are 90 ways of placing a black and a white king on the long diagonal), the double corner diagonal (DD) or under 180 degree rotations (DR). Typing "DC;" gives the number of endgames which are symmetric under all three of the above symmetries simultaneously.

The polynomials DSred, DDred, DRred give the number of inequivalent ("canonical") positions that are symmetric under one of the three symmetries. DAred gives the number of endgames which are not symmetric under any of the three symmetries (590 for 1 king vs 1 king).

TAILLE · Post by **TAILLE** » Sun Jun 24, 2007 10:24

Rein Halbersma wrote:For those of you interested in having a computer do all the counting of the number of (possibly symmetry reduced) kings only endgames, first install the open source symbolic algebra program Maxima from http://maxima.sourceforge.net/ and then copy paste the following code to its command line (change d: 10 into d: 8 if you are interested in 8x8 checkers).
Code: Select all
d    : 10;
DA   : expand((e+b+w)^(d^2/2));
DS   : expand((e+b+w)^d * (e^2+b^2+w^2)^((d-2)*d/4));
DD   : expand((e^2+b^2+w^2)^(d^2/4));
DR   : expand((e^2+b^2+w^2)^(d^2/4));
DT   : expand(1/4 * (DA+DS+DD+DR));
DC   : expand((e^2+b^2+w^2)^(d/2) * (e^4+b^4+w^4)^((d-2)*d/8));
DSred: expand((DS-DC)/2);
DDred: expand((DD-DC)/2);
DRred: expand((DR-DC)/2);
DAred: expand(1/4 * (DA - 2*(DSred+DDred+DRred) - DC));
DTred: expand(DAred + DSred + DDred + DRred + DC);
The above code has several polynomials with as arguments empty squares (e), white kings (w) and black kings (b). (For the mathematically interested, what is being written here is the Cycle Index Polynomial of the symmetry group of the 10x10 checkers board. I learned this trick from the PhD thesis of Ralph Gasser, who solved the game Nine Men's Morris, see http://citeseer.ist.psu.edu/cache/paper ... essing.pdf).

Expanding the polynomial DT (or DTred which is a rewriting of DT) by just typing "DT;" gives a lot of terms, among which 635 * e^48 * b * w, which means that there are 635 inequivalent endgames on a 10x10 board with one king for each side. Simarly for all the other terms.

Typing "DA;" gives the total number of endgames without taking care of symmetries. It has e.g. the term 2450 * e^48 * b * w, which means that there are 2450 one king vs one king endgames on a 10x10 board.

Typing "DS;", "DD;" or "DR;" gives again a whole bunch of terms, each of which gives the number of endgames that are symmetric under flips along the single corner diagonal (DS, which has e.g. coefficient 90 for e^48*b*w which means there are 90 ways of placing a black and a white king on the long diagonal), the double corner diagonal (DD) or under 180 degree rotations (DR). Typing "DC;" gives the number of endgames which are symmetric under all three of the above symmetries simultaneously.

The polynomials DSred, DDred, DRred give the number of inequivalent ("canonical") positions that are symmetric under one of the three symmetries. DAred gives the number of endgames which are not symmetric under any of the three symmetries (590 for 1 king vs 1 king).

Hi,

All these formulas are not very easy to understand but I have simple question.
Do we need to take into account the 3 symmetries DS, DD and DR or is it sufficient to take into account only 2 symmetries?

As an example let’s take positions with 2 white kings vs 1 black king, and let’s take DS and DD symmetries.
For DS let’s suppose that we keep only positions where black king is in the region 1-5-46-6
For DD let’s suppose that we keep only positions where black king is in the triangle 1-5-45

This position is kept in the db

This position is not kept in the db due to DS symmetry

This position is not kept in the db due to DD symmetry

This position is not kept in the db due to DS or DD symmetry

Isn’t it sufficient to take into account only DS and DD ?

Gérard

Rein Halbersma · Post by **Rein Halbersma** » Sun Jun 24, 2007 11:51

TAILLE wrote:
Hi,

All these formulas are not very easy to understand but I have simple question.
Do we need to take into account the 3 symmetries DS, DD and DR or is it sufficient to take into account only 2 symmetries?

As an example let’s take positions with 2 white kings vs 1 black king, and let’s take DS and DD symmetries.
For DS let’s suppose that we keep only positions where black king is in the region 1-5-46-6
For DD let’s suppose that we keep only positions where black king is in the triangle 1-5-45

This position is kept in the db

This position is not kept in the db due to DS or DD symmetry

Actually, your last position (P4) is related to the first by a 180 degree rotation (DR symmetry). If you have the first position (P1) in the db, then only testing P1 = DR * P4 would let you drop P4.

Of course, you are right in the fact that DR = DS * DD = DD * DS (combination of two flips - in any order- gives a rotation). But it also works the other way around: a flip followed by a rotation gives the other flip: DS = DR * DD and DD = DR * DS.

So, if a position is symmetric under DS and DD, then it is symmetric under DR. If you use all 3 symmetries, then you would only keep positions with the first king in the triangle 1-5-23. But you have to be careful if the black king is on the long diagonal and even more careful if all kings are on the long diagonal.

With more kings it gets even more complicated. Take e.g. 2vs2 kings:

This position is not symmetric under DS or DD, but it is symmetric under DR. And a DS or DD flip on the above position give the same position below

TAILLE · Post by **TAILLE** » Sun Jun 24, 2007 12:15

Rein Halbersma wrote: For 2 vs 1, yes, but not in general. Take e.g. 2vs2 kings:

[img]http://fmjd.org/dias2/save/11826778258.png[/img]

This position is not symmetric under DS or DD, but it is symmetric under DR. And a DS or DD flip on the above position give the same position below

[img]http://fmjd.org/dias2/save/11826780477.png[/img]

What is your point with your two above diagrams ?
For Damy the first position cannot be in the db because there is simply no black king in the triangle 1-5-23.
For your information if the position of black kings is symmetrical regarding DS then I look at white kings to take into account DS symmetry.
Gérard

BertTuyt · Post by **BertTuyt** » Sun Jun 24, 2007 12:56

One of the things I thought about but never implemented is to store in a database only the positions which theoretically can occur during a game.
In the initial position 20man x 20man there is a symmetry (equal tempo when both players have made one or more moves), which disables many positions. You can place 20 white man and 20 black man on the board in endless variations, but a huge fraction can never be reached from a thoretical point of view (even more most likely from a practical point of view).
The problem is that it is hard , or very difficult to determine this actual number.

Also when you exchange pieces as the game progresses, the initial symmetry is broken. As a result for example all 1m x 1m positions are possible ( i tested this with a huge number of games with pseudo random moves).

But if all 5m x 5m are all possible i have no clue. If the actual number (compared with the theoretical maximum) is a factor of 100 less, and you store only the non-draw positions (in a format like Gerard), it could be feasible.

I once tried to estimate the number of 5m x 5m positions.
What I did is playing a huge number of pseudo-random games and i stored all 5m x 5m positions found.
The i played another series of random games and also stored all these positions.

My theory was that the estimation for practical 5m x 5m positons P could be derived from P = N * N / Overlap.
N is the number of games played. The overlap is the number of positions found in both series. So if you play 1M games and 100 positions are equal then the estimate for the total size is 10^10.

One can see that if there re only 100 positions in a set and you play 100 games then the prediction is right P = 100 * 100 / 100.

However when i increased the number of games the estimation did not seem to converge, most likely that the probability of positions encountered are not all the same.

Anyway maybe an interesting idea to test further.
You could use selfplay to generate the 5m x 5m positions and then based on partial information databases generate more endings.

To be continued .....

Bert

TAILLE · Post by **TAILLE** » Sun Jun 24, 2007 14:13

BertTuyt wrote:One of the things I thought about but never implemented is to store in a database only the positions which theoretically can occur during a game.
In the initial position 20man x 20man there is a symmetry (equal tempo when both players have made one or more moves), which disables many positions. You can place 20 white man and 20 black man on the board in endless variations, but a huge fraction can never be reached from a thoretical point of view (even more most likely from a practical point of view).
The problem is that it is hard , or very difficult to determine this actual number.

Also when you exchange pieces as the game progresses, the initial symmetry is broken. As a result for example all 1m x 1m positions are possible ( i tested this with a huge number of games with pseudo random moves).

But if all 5m x 5m are all possible i have no clue. If the actual number (compared with the theoretical maximum) is a factor of 100 less, and you store only the non-draw positions (in a format like Gerard), it could be feasible.

I once tried to estimate the number of 5m x 5m positions.
What I did is playing a huge number of pseudo-random games and i stored all 5m x 5m positions found.
The i played another series of random games and also stored all these positions.

My theory was that the estimation for practical 5m x 5m positons P could be derived from P = N * N / Overlap.
N is the number of games played. The overlap is the number of positions found in both series. So if you play 1M games and 100 positions are equal then the estimate for the total size is 10^10.

One can see that if there re only 100 positions in a set and you play 100 games then the prediction is right P = 100 * 100 / 100.

However when i increased the number of games the estimation did not seem to converge, most likely that the probability of positions encountered are not all the same.

Anyway maybe an interesting idea to test further.
You could use selfplay to generate the 5m x 5m positions and then based on partial information databases generate more endings.

To be continued .....

Bert

Hi Bert,
The question of whether or not a given position may be reach is very hard. The key point for me is to build a db as useful as possible. My db gives only partial information and I will continue in this way as Ed do.
There is another interesting idea not yet expressed. The db used to continue to generate the db itself may be different from the db used in a actual game (the "live db")
You can decide for example to generate the complete (not partial) 7 pieces endgame db but, for actual game you take almost no risk to eliminate from the db the positions like 4K vs 3K, 3K1M vs 3K, 2K2M vs 3K ... 4K vs 3M. Do you think also that it is very useful to keep in the "live db" the special winning positions at 3K vs 3K or even 3K vs 2K1M ?
The point is that these positions will probably never happened during a game so it may be a good idea to save memory to other positions !
It is an approach different from Ed one because the other part of the db is really complete !
Gérard

Ed Gilbert · Post by **Ed Gilbert** » Sun Jun 24, 2007 15:23

TAILLE wrote: There is another interesting idea not yet expressed. The db used to continue to generate the db itself may be different from the db used in a actual game (the "live db")

Gerard, this is definitely true with my builder. My indexing subdivides the database up into hundreds of thousands of files. These take anywhere from a few seconds to a few hours to build. When I am done building a database, I re-index it to a format which is more suitable for lookups from an engine search. The point of this is that by subdividing into many smaller files, the database can be built with less memory, and many of the subdivisions can be built in parallel.

TAILLE wrote:You can decide for example to generate the complete (not partial) 7 pieces endgame db but, for actual game you take almost no risk to eliminate from the db the positions like 4K vs 3K, 3K1M vs 3K, 2K2M vs 3K ... 4K vs 3M. Do you think also that it is very useful to keep in the "live db" the special winning positions at 3K vs 3K or even 3K vs 2K1M ?
The point is that these positions will probably never happened during a game so it may be a good idea to save memory to other positions !

For me these rarely used positions only take up disk space, so why eliminate them? If you should happen to hit these during a game, then they are there to be brought into cache and be used. If they are not hit during a game, they're never brought into cache, so there is no penalty for having them available on disk.

-- Ed

FeikeBoomstra · Post by **FeikeBoomstra** » Sun Jun 24, 2007 19:21

Can't you extend the idea of Gerard to the database generation. If I understood well the idea of retrograde generation, you generate from level n+1 -> level n.
Proposed algorithm: you take only the "rare" positions in level n+1. Generate all possible parents. That is your (reduced set of) candidates in level n. Now generate for all candidates all children and check whether they are solved in level >=n+1. If this is true for all possible children then the candidate is approved, otherwise discarted. So now you have only the 'rare" candidates in level n and you can continue with level n-1.

BertTuyt · Post by **BertTuyt** » Mon Jun 25, 2007 13:12

I have implemented in Damage a database Learn Option, which is basically never used.
The trick is to collect all endgame positions which are solved by tree-search and databases. I store them in a separate file.
I made such a file for 5mx5m positions.
The idea (but never tested this) is then to inject these positions in a partial information database and then generate other positions from there.
Maybe interesting to try it this summer vacation.
Greetings from China by the way!

Bert

Ed Gilbert · Post by **Ed Gilbert** » Mon Jun 25, 2007 13:45

TAILLE wrote:Hi Ed,
Maybe one approach to compare the solutions would be to compare the time for generating the 4K+1M vs 2K db. To buid this db in need effectively to do a lot of lookups in 5K vs 2F db and in also when reading the 6 pieces db in case of capture.
As I already mentionned, to buid the 4K+1M vs 2K db I need almost 2 weeks on one processor. What is your corresponding figure ?
Gérard

Hi Gerard,

The 4K+1M vs 2K slice took 24.7 hours to build. The build program continued running and completed the 4K+1M vs 1K+1M slice in 26.8 hours, and the 4K+1M vs 2M slice in 6.8 hours. It also has a good start on the 2K+2M vs 3K slice which was next on its build list. Here are the entries from the program's performance log:

88925.6s: db1402 (88924.3s), 212294 pos/sec
185318.4s: db1411 (96392.8s), 359717 pos/sec
209969.9s: db1420 (24650.7s), 644700 pos/sec

If I scale the 24.7 hours by the ratio of our computer clocks, that pushes it up to 1.37 days, which is an order of magnitude faster than your 14 days.

This seems to support what I wrote earlier regarding db sizes and lookup speeds. What is most important is how fast you can do a position lookup. It is likely that your build speed is hampered by having to do multiple database probes in order to get the WLD value of a single position.

-- Ed

TAILLE · Post by **TAILLE** » Mon Jun 25, 2007 15:25

Ed Gilbert wrote:
TAILLE wrote:Hi Ed,
Maybe one approach to compare the solutions would be to compare the time for generating the 4K+1M vs 2K db. To buid this db in need effectively to do a lot of lookups in 5K vs 2F db and in also when reading the 6 pieces db in case of capture.
As I already mentionned, to buid the 4K+1M vs 2K db I need almost 2 weeks on one processor. What is your corresponding figure ?
Gérard
Hi Gerard,

The 4K+1M vs 2K slice took 24.7 hours to build. The build program continued running and completed the 4K+1M vs 1K+1M slice in 26.8 hours, and the 4K+1M vs 2M slice in 6.8 hours. It also has a good start on the 2K+2M vs 3K slice which was next on its build list. Here are the entries from the program's performance log:

88925.6s: db1402 (88924.3s), 212294 pos/sec
185318.4s: db1411 (96392.8s), 359717 pos/sec
209969.9s: db1420 (24650.7s), 644700 pos/sec

If I scale the 24.7 hours by the ratio of our computer clocks, that pushes it up to 1.37 days, which is an order of magnitude faster than your 14 days.

This seems to support what I wrote earlier regarding db sizes and lookup speeds. What is most important is how fast you can do a position lookup. It is likely that your build speed is hampered by having to do multiple database probes in order to get the WLD value of a single position.

-- Ed

Hi Ed,
Thank you for this information. I have to recognize that you are right and I will change my mind to be able to generate the 7 pieces endgame db.
I am just building a very new generation algorithm with a lot of improvement to gain in efficiency.

Just a question concerning the generation process.
Let's take for example positions with 5 white kings against 2 black kings and let's take a position where it is white to move. When there are no capture I can see some modification of my movegenerator to avoid generating all successors when, by chance, I find a winning position in the first generated positions. That sound good because it costs nothing (the work is basically done in another order) and with 5 kings there are a lot of successors !
I did not manage to have the same approach when there is a capture because it is only at the end of my move generator process that I can resolve the multiple capture issue.
Had you the same approach ?
Gérard

Ed Gilbert · Post by **Ed Gilbert** » Mon Jun 25, 2007 22:37

TAILLE wrote:Just a question concerning the generation process.
Let's take for example positions with 5 white kings against 2 black kings and let's take a position where it is white to move. When there are no capture I can see some modification of my movegenerator to avoid generating all successors when, by chance, I find a winning position in the first generated positions. That sound good because it costs nothing (the work is basically done in another order) and with 5 kings there are a lot of successors !
I did not manage to have the same approach when there is a capture because it is only at the end of my move generator process that I can resolve the multiple capture issue.
Had you the same approach ?
Gérard

No, I don't do that. I generate the list of all the moves and then work through them sequentially in the db builder. If you can do it incrementally then this might save some time. My move generator is not especially fast. It is a straightforward conversion from my 8x8 program, and the bitmaps are neither rotated nor with extra 'ghost' gaps, so I have to do the even and odd rows separately.

-- Ed

World Draughts Forum

EndGame Databases

Benchmark results

Re: Benchmark results

Re: Benchmark results