EndGame Databases

Discussion about the development of draughts in the age of computers and the Internet.
Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Mon Jun 18, 2007 03:49

Hi Bert,

The endgame databases are indeed fun to play with, and they are quite useful for the type of late game analysis that we were doing in the classical positions thread. But my feeling is that they are of secondary importance during a tournament game, because I think most games are won or lost in the opening or midgame stages, before the endgame databases can have a big effect. Of course I am referring to 10x10 here, for in 8x8 checkers they have an enormous impact. I would be surprised if my program can compete with the best 10x10 programs like yours and Gerard's, even with these databases, because I understand very little about the strategy of this variation of checkers, having never played it, and I'm sure my position evaluation is lacking in some things. But it would be interesting to attend one of these tournaments and nice also to meet you and the other programmers, so I will think about doing this.

I'll try to answer your specific questions about the databases. What I have completed so far are full databases for 2 through 7 pieces with up to 5 pieces on a side (no 6x1, but all the rest). These are complete databases and I also have DTC dbs for these positions. For 8 pieces I have computed 'incomplete' dbs for all the 4x4 and 5x3 positions with up to 1 king on each side. I did not try to create any 8-piece DTC dbs because they are not needed for this subset of positions. For 9 pieces I have computed incomplete dbs for the 5x4 positions with all checkers (no kings).

You asked about sizes and time to compute. For 2 - 6 pieces, the size is 1.31gb. For 2 - 7 pieces it is 30.2gb. The 8-piece slices by themselves are 95.8gb. The 9-piece slice by itself is 21.2gb. If these sound large, my feeling is that it's simply disk space, and disks are cheap. This data is quite effective during a search on a PC with 2gb of RAM.

It took 5 or 6 months to compute these databases. Most of it was done using 4 PCs. The fastest of these is a 3-year old P4 with 3gb RAM, and the slowest is circa 2002 and using RDRAM (remember that stuff?). That one can only do about 60% of the work that the others do in the same time. The 9-piece all checkers slice took about 3 weeks. I am presently computing the 9-piece slices with a maximum of 1 king. This will take about 4 months on these machines, and then after that I will recompute the all checkers slice because I'll be able to resolve more of the positions.

I'm certain that these 4 machines can be replaced by a single core 2 duo, because I just recently bought one, and I have benchmarked the db building on it. I tested running 2 instances of the build program, and each instance runs twice as fast as the fastest of my 4 older machines. And that is running in 32-bit mode. The new machine is running XP64 and I can recompile the build program to run in 64-bit mode to get more speed. I haven't done this yet, but I did recompile Kingsrow and the search speed increased by 50%, so I expect to see a similar increase for the db builder. With the price of quad cores dropping in July, it makes sense to get one of these if you really want to make quick work of building databases.

See the link that Rein gave for Schaeffer's paper on building incomplete databases. It's an excellent paper and has enough detail so that you can exactly reproduce his state machine for resolving the values. If you have other questions then feel free to email me or post here and I'll try to answer them.

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Mon Jun 18, 2007 12:50

Hi Ed, Bert,

Of course in some situations a powerful endgame db may be very, very useful. It is particularly true when the program is facing a human opponent. Against another computer with "only" the 6-piece database it could obviously be helpful, but it is true that in the majority of games the decision will be made in the middle of the game. Taking the last French championship in Hamel as an example, all the games played by Damy were effectively decided in the middle of the game, in situations where there were still more than 20 pieces on the board.

Coming back to the endgame databases, I have begun to generate the 7-piece db. For the time being I have generated the following tables:
5 kings against 2 kings
4 kings and 1 man against 2 kings
4 kings against 3 kings
4 kings against 2 kings and 1 man
4 kings against 1 king and 2 men
3 kings and 1 man against 3 kings

The size of the 6-piece db is 165 Mb and the size of the above 7-piece tables is 180 Mb, all without any compression!

For your information: as I explained before, when there are fewer than 4 kings on the board I do not need to generate the DTC db. As a consequence, I have two different algorithms to generate the db, depending on whether or not I need to generate the DTC db at the same time.
That way I suspect I save some time, but I did not measure the corresponding gain.

Gérard

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Mon Jun 18, 2007 18:17

Gerard, how long did your calculations take, and on what hardware?
I want to compare it with the results of Ed.
And this will also guide me on what hardware I need to buy for the summer holiday; any advice is welcome (processor, operating system, memory and hard disk size).

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Tue Jun 19, 2007 01:01

TAILLE wrote:5 kings against 2 kings
4 kings and 1 man against 2 kings
4 kings against 3 kings
4 kings against 2 kings and 1 man
4 kings against 1 king and 2 men
3 kings and 1 man against 3 kings

The size of the 6-piece db is 165 Mb and the size of the above 7-piece tables is 180 Mb, all without any compression!
Gerard, when you say without any compression, I expect that you are still using runlength encoding. Is that correct? Also, do you have any estimates on the average number of database position lookups you have to do in order to learn the WLD value of a typical endgame position?

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Tue Jun 19, 2007 15:13

Ed Gilbert wrote:Gerard, when you say without any compression, I expect that you are still using runlength encoding. Is that correct? Also, do you have any estimates on the average number of database position lookups you have to do in order to learn the WLD value of a typical endgame position?
Sorry for my bad English. What is run-length encoding?
During the generation process itself I use a very simple format with 4 bits per position (plus 8 bits if I need DTC information). Then all the results (the relevant positions) are stored in the database in a special format (trade secret) to save a lot of memory, and I do not use any other format.
To answer your question, let me take two simple examples.
Positions with 5 white kings against 2 black kings: I store in the database only the positions where it is white to play and where white is unable to win. As a consequence, no lookup is needed if it is white to move, and of course a significant lookup effort if it is black to move.
Positions with 4 kings against 2 kings and a man: I store only the positions where the side to move has a losing position. No lookup is needed if we are in a losing position, and a significant lookup effort otherwise.
In the two cases above you can easily understand why my db is not very large.
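
To make the idea more concrete, here is a minimal sketch of how a WLD value can be derived when only the lost positions of the side to move are stored. This is illustrative only, not Damy's actual code; the Position type, the index function and the move generator are placeholders.

Code:

#include <cstdint>
#include <unordered_set>
#include <vector>

using Position = std::uint64_t;   // placeholder for a real board representation

// Placeholders assumed to be provided by the engine: a perfect index of a
// position within its material slice, and a generator of successor positions.
std::uint64_t slice_index(const Position& pos);
std::vector<Position> successors(const Position& pos);

enum class Value { Win, Draw, Loss };

// The slice stores only the indices of positions that are LOST for the side
// to move (a real db would use a compact table rather than a hash set).
struct LossOnlySlice {
    std::unordered_set<std::uint64_t> lost;

    bool is_loss(const Position& pos) const {
        return lost.count(slice_index(pos)) != 0;
    }
};

// Getting a full WLD value can take more than one probe: if the position
// itself is not stored it is "not lost", and it is a win exactly when some
// move leads to a successor that is lost for the opponent; otherwise a draw.
// (Captures lead into a smaller slice, which would be probed the same way.)
Value probe(const LossOnlySlice& db, const Position& pos) {
    if (db.is_loss(pos))
        return Value::Loss;
    for (const Position& succ : successors(pos))
        if (db.is_loss(succ))
            return Value::Win;
    return Value::Draw;
}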
Gérard

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Tue Jun 19, 2007 15:28

BertTuyt wrote:Gerard, how long did your calculations take, and on what hardware?
I want to compare it with the results of Ed.
And this will also guide me on what hardware I need to buy for the summer holiday; any advice is welcome (processor, operating system, memory and hard disk size).

Bert
Hi Bert,
Because I have not generated the complete 7-piece db, I cannot give you figures corresponding to Ed's.
Anyway, as an example, I need almost 2 weeks to generate the 4 kings against 2 kings and a man db, and it is the same figure for the 4 kings against 1 king and 2 men db.
Of course I know perfectly well that it will take far more time to generate the 2 kings and 2 men against 1 king and 2 men db!
My configuration is the following:
I have a laptop (the one I brought to Hamel for the French open championship); the processor is an Intel Centrino Duo and I have 2gb RAM.
For generation purposes I use only one processor because I have only one computer. I use the second processor for development purposes, to improve the Damy engine or the evaluation function.
Gérard

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Wed Jun 20, 2007 22:30

Gerard, your database sizes still puzzle me, although the limited size you need to store the information is remarkable (and, as you stated, it is not even based on compression).
In your example you stated that the 7-piece databases you already generated were 180 megabytes.
To really understand your claims, what is the actual size of the 5K vs 2K database, only 10 Mb maybe?

Bert

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Wed Jun 20, 2007 23:39

BertTuyt wrote:Gerard, your database sizes still puzzle me, although the limited size you need to store the information is remarkable (and, as you stated, it is not even based on compression).
In your example you stated that the 7-piece databases you already generated were 180 megabytes.
To really understand your claims, what is the actual size of the 5K vs 2K database, only 10 Mb maybe?

Bert
The size of my 5K vs 2K db is exactly 2,220,653 bytes. Supposing white has 5 kings (and black 2 kings), I stored in the db only the positions where it is white to move and where white cannot win. In addition, I did not store the positions where white cannot avoid black taking a piece with its next move. That way it is easy to imagine that only a few positions remain.
Gérard

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Wed Jun 20, 2007 23:55

BertTuyt wrote:Gerard, your database sizes still puzzle me, although the limited size you need to store the information is remarkable (and, as you stated, it is not even based on compression).
In your example you stated that the 7-piece databases you already generated were 180 megabytes.
To really understand your claims, what is the actual size of the 5K vs 2K database, only 10 Mb maybe?

Bert
The 5k vs 2k slice is 4,195,144,800 positions, counting both sides to move. If you store all the WLD info and don't do any compression other than bit packing, and pack 5 values into each byte, the size becomes 839mb. Gerard is not allowing for 3 values in each position, only 2 values, so straight bit packing of 8 values into a byte makes the size 524mb. Obviously Gerard is doing some form of compression. If you only store 1 bit per position in the 5k vs 2k slice, and that bit tells you if the position is a win or not a win, then you will end up with very long strings of consecutive positions that are the same bit value. Runlength encoding compresses this kind of data very well, and is also very quick to uncompress when you want to do lookups. The sizes of Gerard's slices are consistent with excluding some of the full WLD data and compressing with runlength encoding.
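
As a rough illustration of the principle only (this is not the actual Kingsrow format), run-length encoding and lookup of a win/not-win bit array could look like this:

Code:

#include <cstdint>
#include <vector>

// One run: 'value' repeated 'length' times. Real formats use variable-length
// run counts; a fixed 32-bit count keeps the sketch simple.
struct Run { std::uint8_t value; std::uint32_t length; };

// Compress a raw array of per-position values (e.g. 1 = win, 0 = not a win).
std::vector<Run> rle_encode(const std::vector<std::uint8_t>& raw) {
    std::vector<Run> runs;
    for (std::uint8_t v : raw) {
        if (!runs.empty() && runs.back().value == v)
            ++runs.back().length;
        else
            runs.push_back({v, 1});
    }
    return runs;
}

// Look up the value of a position by walking the runs. A real db keeps an
// index of block offsets so that only one small block has to be scanned.
std::uint8_t rle_lookup(const std::vector<Run>& runs, std::uint64_t index) {
    for (const Run& r : runs) {
        if (index < r.length)
            return r.value;
        index -= r.length;
    }
    return 0;   // out of range; a real implementation would signal an error
}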

Small sizes are nice if you're trying to save disk space, but primarily what matters is how fast you can do a lookup of a database position during a search. Gerard's scheme differs fundamentally from the way lookups work in Chinook, Kingsrow, and Cake, in that Gerard will often have to perform many lookups in his database to resolve the value of a single position, where these other programs only have to do one lookup. Offsetting the many lookups is the fact that his data takes less space, and therefore his cache hit rate is higher for a given cache size. In order to know which scheme is faster for lookups, you need to know how long it takes to do a lookup of a position in cache, of a position that is not in cache, what the cache hit percentage is, and how many lookups Gerard has to do on average to resolve a single position to a WLD value. You would have to benchmark this for some particular db size, cache size, etc., to see what the relative speeds are.

Certainly a lookup with a cache hit is measured in microseconds, or less, while a cache miss is measured in milliseconds.

You can also see which way is better if you take the two extremes in cache sizes. If your cache memory is large enough to cache the entire db (e.g. 8-piece db for 8x8 checkers and 4gb ram, or 6-piece db in 10x10 and 1.3gb ram), then obviously you don't want to do what Gerard is doing because his many lookups will be slower than the others' single lookup to get a WLD value for a position. At the other extreme, if your database is so large and your cache is so small that just about every lookup is a cache miss, then you don't want to do what Gerard is doing either. There may be some scenario in between these two extremes where Gerard's scheme is faster, but without doing some actual benchmarks I don't think it's possible for us to know.
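
To show the kind of arithmetic involved, here is a toy comparison. The probe counts, hit rates and costs below are invented for illustration and are not measurements of any program:

Code:

#include <cstdio>

// Expected time to resolve one WLD value, given the number of db probes it
// takes, the cache hit rate, and the cost of a hit vs. a miss.
double expected_us(double probes, double hit_rate, double hit_us, double miss_us) {
    return probes * (hit_rate * hit_us + (1.0 - hit_rate) * miss_us);
}

int main() {
    // Assumed costs: ~1 microsecond for a cached probe, ~5 milliseconds for a
    // disk miss. The compact db gets a better hit rate but needs more probes.
    double single = expected_us(1.0,  0.990, 1.0, 5000.0);
    double many   = expected_us(10.0, 0.999, 1.0, 5000.0);
    std::printf("1 probe,   99.0%% hits: %.1f us per WLD value\n", single);
    std::printf("10 probes, 99.9%% hits: %.1f us per WLD value\n", many);
    return 0;
}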

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Thu Jun 21, 2007 00:39

Hi Ed,
How do you obtain the figure 4,195,144,800 for the number of positions with 5K vs 2K? My count is 1,419,569,200. Don't you use the symmetries of the board?

No Ed, for the time being I do not use run-length encoding. Maybe my explanations were not very clear.
Let's take the trivial 1K vs 1K ending. The number of possible positions is about 15 * 50 = 750 (taking into account the symmetries of the board). Supposing it is white to move, there is only one losing position for white, i.e. WB:46 vs BK:05 (WB:05 vs BK:46 is the same position because of symmetry). It is clear that, without any compression, it is very easy to store only that one position in the db and avoid storing 750 bits!
Gérard

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Thu Jun 21, 2007 01:34

TAILLE wrote:How do you obtain the figure 4,195,144,800 for the number of positions with 5K vs 2K? My count is 1,419,569,200. Don't you use the symmetries of the board?
I used symmetry to compute the 5k vs 5k slice for English checkers, because that allowed me to fit it in 2gb RAM. Other than that, I don't bother with symmetry. It only applies to slices that have all kings, and this represents only a small percentage of the total positions that have to be computed. For me it is simpler not to trouble with it, because the gains are small.
TAILLE wrote:No Ed, for the time being I do not use run-length encoding. Maybe my explanations were not very clear.
Let's take the trivial 1K vs 1K ending. The number of possible positions is about 15 * 50 = 750 (taking into account the symmetries of the board). Supposing it is white to move, there is only one losing position for white, i.e. WB:46 vs BK:05 (WB:05 vs BK:46 is the same position because of symmetry). It is clear that, without any compression, it is very easy to store only that one position in the db and avoid storing 750 bits!
Gérard
I guess it is a question of how you define compression. My definition is that whenever you make a data set smaller by removing redundant information, that is compression. Obviously there is more than one way to reduce the size of the data set to the sizes that you have, and it doesn't matter exactly how you did it. What is more interesting to me is how many lookups you have to do on average to resolve a single position to a WLD value. I can see that you may not have to do very many for lopsided positions that are mostly wins, but for more even positions I expect it to be higher. Also I don't know if your technique can work when the data is incomplete, i.e. when a position can have 6 possible values instead of only 3. Have you thought about this?
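
Roughly, the six values look something like the sketch below (the names are just shorthand of my own; the exact definitions and the rules for combining them during a build are spelled out in Schaeffer's paper):

Code:

// Possible values for a position in an incomplete (partially computed) db.
// With complete data only the first three occur; with incomplete data a
// position may be only partially resolved.
enum class DbValue {
    Win,
    Loss,
    Draw,
    DrawOrWin,    // at least a draw for the side to move, possibly a win
    DrawOrLoss,   // at most a draw for the side to move, possibly a loss
    Unknown
};

// Switching the point of view to the other side flips the bounds.
DbValue negate_value(DbValue v) {
    switch (v) {
        case DbValue::Win:        return DbValue::Loss;
        case DbValue::Loss:       return DbValue::Win;
        case DbValue::DrawOrWin:  return DbValue::DrawOrLoss;
        case DbValue::DrawOrLoss: return DbValue::DrawOrWin;
        default:                  return v;   // Draw and Unknown are unchanged
    }
}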

-- Ed

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Post by Rein Halbersma » Thu Jun 21, 2007 10:35

TAILLE wrote: How do you obtain the figure 4,195,144,800 for the number of positions with 5K vs 2K? My count is 1,419,569,200. Don't you use the symmetries of the board?
How do you arrive at your number of positions 1,419,569,200? When I use all symmetries I get 1,048,889,280 positions (both sides to move).

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Thu Jun 21, 2007 12:41

Rein Halbersma wrote:
TAILLE wrote: How do you obtain the figure 4,195,144,800 for the number of positions with 5K vs 2K? My count is 1,419,569,200. Don't you use the symmetries of the board?
How do you arrive at your number of positions 1,419,569,200? When I use all symmetries I get 1,048,889,280 positions (both sides to move).
Hi Rein,
You are certainly correct. I did not use all the symmetries because I had to find some compromise to avoid too much calculation when finding the index of a position.
The 1,419,569,200 number is obtained with the following calculation:
1) Taking into account symmetries, there are 335 possibilities for placing the 2 black kings on the board.
2) Then, without taking any symmetry into account, and without excluding impossible placements, you can count 50*49*48*47*46/5/4/3/2 = 2,118,760 combinations for placing the 5 white kings on the board.
Taking into account that either side can have the move, the total number is then 335 * 2,118,760 * 2 = 1,419,569,200.
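The arithmetic above is easy to check with a few lines of code (just a verification of the counting, nothing more):

Code:

#include <cstdint>
#include <cstdio>

// n choose k in 64-bit arithmetic; the intermediate values stay exact here.
std::uint64_t choose(std::uint64_t n, std::uint64_t k) {
    std::uint64_t r = 1;
    for (std::uint64_t i = 1; i <= k; ++i)
        r = r * (n - k + i) / i;
    return r;
}

int main() {
    std::uint64_t white = choose(50, 5);              // 2,118,760 placements of the 5 white kings
    std::uint64_t mine  = 335 * white * 2;            // with the 335 symmetry-reduced black-king placements
    std::uint64_t full  = choose(50, 5) * choose(45, 2) * 2;  // no symmetry, no shared squares
    std::printf("C(50,5)            = %llu\n", (unsigned long long)white);
    std::printf("335 * C(50,5) * 2  = %llu\n", (unsigned long long)mine);   // 1,419,569,200
    std::printf("C(50,5)*C(45,2)*2  = %llu\n", (unsigned long long)full);   // 4,195,144,800, matching Ed's figure
    return 0;
}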
What is your approach to the calculation?
Gérard

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Thu Jun 21, 2007 12:50

Hi Ed,
Maybe one approach to compare the solutions would be to compare the time for generating the 4K+1M vs 2K db. To build this db I effectively need to do a lot of lookups in the 5K vs 2K db, and also in the 6-piece db in case of capture.
As I already mentioned, to build the 4K+1M vs 2K db I need almost 2 weeks on one processor. What is your corresponding figure?
Gérard

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Thu Jun 21, 2007 13:50

TAILLE wrote: Hi Ed,
Maybe one approach to compare the solutions would be to compare the time for generating the 4K+1M vs 2K db. To build this db I effectively need to do a lot of lookups in the 5K vs 2K db, and also in the 6-piece db in case of capture.
As I already mentioned, to build the 4K+1M vs 2K db I need almost 2 weeks on one processor. What is your corresponding figure?
Gérard
Hi Gerard,

I can use my workstation at work to run this slice over the weekend and give you an exact answer next week. It has been a while since I ran this so I don't remember how long it took. But my prediction is that if I start this on Friday evening when I leave the office, it will be finished on Monday.

What is the clock speed of your processor? I will run this on a single thread of a 2.67GHz core 2 duo running 32-bit XP.

-- Ed
