EndGame Databases

Discussion about the development of draughts in the era of computers and the Internet.
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Post by Rein Halbersma » Sat Jun 09, 2007 10:40

Ed Gilbert wrote:Hi Bert,

Where did you hear the news about Schaeffer finishing his solutions? I have not seen it yet.

I have been building endgame databases for 10x10 draughts since last year. I have completed all the 2- through 7-piece positions with up to 5 pieces on a side, and I have also built some partial 8-piece and 9-piece subsets. By partial, I mean that I start building slices that are mostly men without having the more king-heavy slices that some of the positions depend on. By skipping the king-heavy slices you can build the most useful ones in much less time. Of course there are some positions that cannot be resolved this way, but it turns out that a very high percentage can be. I have finished all the 4x4 and 5x3 positions with up to one king on each side, and in a day or two I will be done with the 5x4 all-checkers slice. I'm not at home at the moment, but if I remember the sizes correctly, the 2 - 7 piece databases are about 30gb, the 8-piece slices are 85gb, and the 9-piece slice is about 11gb. These were all built using 4 rather slow machines that I built in 2004. They could be replaced by one fast dual-core machine.

-- Ed (author of Kingsrow)
Hi Ed, nice to hear that the checkers community is interested in competing in the 10x10 arena! I guess now that checkers is almost solved, `moving up a weight class' (no disrespect, there is always 12x12 Canadian draughts!) is the logical next thing.

Will you also make the 7 & 8 piece databases available to the public, like you do with the 10 piece checkers databases?

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Sat Jun 09, 2007 14:09

Rein, distributing large databases has been a lot more work for me than I anticipated. I would like to reduce the time I spend doing this, so I do not intend to make them available.

> I guess now that checkers is almost solved that `moving up a weight
> class' (no disrespect, there is always 12x12 canadian draughts!) is the
> logical next thing.

That Schaeffer has (almost) solved checkers will have little if any effect on human play or on checkers programs. One reason is that he has solved only the GAYP game, but not the 3-move game. His proof of the GAYP game involves solving only ~20 of the 3-move ballots. But there is not much remaining for 8x8 checkers programmers to compete at. The best programs now play nearly unbeatable 3-move checkers using only an 8-piece database. A big reason for this is the large opening databases that guide the programs through the first 20 - 30 plies of each ballot, at which point the programs can see a 'database draw' with a search. In 10x10 checkers a typical game takes many more moves to reach this stage, and it's unlikely that opening books can ever be made large enough to take it that far. There is a big region in the middle of the game where the opening book has been left behind and endgame databases are no help. For this reason I think that endgame databases are less important to 10x10 programs than they are in 8x8 checkers, as most games are probably won or lost in this middlegame region.

BTW, yesterday when I was trying to recall the sizes of the various database slices, my memory was correct about 2 - 7 pieces (30gb), but the 8-piece slices are 95gb and the 9-piece all checkers slice, which finished last night, is 22gb.

-- Ed

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Post by Rein Halbersma » Sat Jun 09, 2007 14:33

Ed Gilbert wrote:Rein, distributing large databases has been a lot more work for me than I anticipated. I would like to reduce the time I spend doing this, so I do not intend to make them available.
I guess distributing is a lot of effort indeed. But could you publish some stats (win/loss/draw numbers, longest #plies to win etc.) for the 7 and 8 piece endgames?

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Sat Jun 09, 2007 17:06

Rein Halbersma wrote:I guess distributing is a lot of effort indeed. But could you publish some stats (win/loss/draw numbers, longest #plies to win etc.) for the 7 and 8 piece endgames?
Longest-plies-to-win data requires a separate type of computation, which takes longer and uses much more disk space than WLD. But I do have the 'moves to conversion' data for up to 7 pieces. (There are no long MTC positions for my slices above 7 pieces because they are almost all men, and every man move is a conversion move.)

Here is an interesting position:

[FEN "W:WK7,K11:BK3,K6,8,13,K36."]

It is the longest moves-to-conversion position in 10x10 checkers for positions with up to 7 pieces. If each side plays optimally, there are 110 king moves that must be played before a man move or a capture can occur. Checkers programs that have only a WLD database usually have difficulty winning games from these kinds of positions against an opponent that plays optimally. What usually happens is that in many of the positions along the path to the end of the game, there are several king moves the winning side can make that maintain the win, but only some of them make progress towards a conversion, while others regress further away from it. The actual conversion is far beyond the program's search horizon, so it has trouble selecting the correct move. The game can end up as an endless series of king moves because the winning side cannot find the ones that lead it to the next conversion.
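
To make that concrete: with MTC values the winning side can simply prefer, among the winning moves, the one with the smallest conversion distance. A minimal sketch in C++ (the database interface here is hypothetical, just to illustrate the idea):

Code: Select all

#include <vector>
#include <limits>

enum Value { LOSS, DRAW, WIN };

struct Position { /* board representation elided */ };

// Hypothetical database interface, for illustration only.
Value db_wld(const Position&);            // win/loss/draw for the side to move
int   db_mtc(const Position&);            // moves to the next conversion
std::vector<Position> successors(const Position&);

// In a won position, prefer a move that provably makes progress:
// the opponent is lost in the successor, and its MTC is minimal.
Position best_winning_move(const Position& pos) {
    Position best{};
    int best_mtc = std::numeric_limits<int>::max();
    for (const Position& succ : successors(pos)) {
        if (db_wld(succ) != LOSS)         // keep only the winning moves
            continue;
        int mtc = db_mtc(succ);
        if (mtc < best_mtc) {             // with WLD alone these all look equal
            best_mtc = mtc;
            best = succ;
        }
    }
    return best;
}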

Rein, I will have to run a script to gather WLD counts from the databases, I don't yet have that data in a nice convenient format.

-- Ed

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Post by Rein Halbersma » Sat Jun 09, 2007 17:41

Ed Gilbert wrote: Longest-plies-to-win data requires a separate type of computation, which takes longer and uses much more disk space than WLD. But I do have the 'moves to conversion' data for up to 7 pieces. (There are no long MTC positions for my slices above 7 pieces because they are almost all men, and every man move is a conversion move.)

Rein, I will have to run a script to gather WLD counts from the databases, I don't yet have that data in a nice convenient format.

-- Ed
Ed, the output of such a script would be nice to have. I understand the differences between WLD, DTC, and DTW. But in constructing the 8-piece endgames, isn't the number of iterations equal to the maximum number of moves to conversion? This information should be retrievable from log files, right?

You probably already checked your data against Michel Grimminck's: http://www.xs4all.nl/~mdgsoft/draughts/stats/index.html
but if you like I can also scan some of Stef Keetman's 1998 articles on the 7-piece endgames.

Rein

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Sat Jun 09, 2007 17:55

Rein, I would be interested in a scan of these articles, as I don't have them.
It's a pity that Stef no longer competes with his program Truus.
It was by far the standard in the nineties.
I think it would still play among the top 3.
However, to become champion in the years to come, one nowadays has to optimize for large databases (> 6 pieces), parallel processing (multi-core), and 64-bit Windows.

A pity that Gérard Taille and/or Michel Grimminck have not reacted so far; I am sure they also have some experience with 7 pieces, and we would definitely value their contribution.

Bert

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Post by Rein Halbersma » Sat Jun 09, 2007 18:37

BertTuyt wrote:Rein, I would be interested in a scan of these articles, as I don't have them.
I also have Keetman's master's thesis and his paper with Victor Allis on tactical pattern learning lying around somewhere. This was the most important competitive edge his program Truus had over its competitors. Michel Grimminck's program Dragon also uses this (and a stripped-down version of very easy tactical patterns is in Chinook).

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Sat Jun 09, 2007 18:50

Rein, I'm interested in all computer draughts related material.
Hope you find a way to distribute it.

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Sat Jun 09, 2007 18:56

Rein Halbersma wrote: Ed, the output of such a script would be nice to have. I understand the differences between WLD / DTC and DTW. But in constructing the 8-piece endgames, isn't the number of iterations equal to the maximum number of moves to conversion? This information should be retrievable from log-files, right?
The word 'conversion' means slightly different things in different contexts. Some database builders use it to mean a conversion into another subdivision of a database. So if you build them the way Schaeffer does, it means the move of a man on the leading checker rank. And of course if you're indexing subdivisions by some other scheme then it means something else. When I use 'conversion', I mean any irreversible move, and thus any man move or capture is a conversion move. I structure my builds so that the MTC data I store is exactly that. So when I build incomplete databases I do not store the MTC data or even log the number of iterations to solve, because the numbers are always very small (less than 10). This is because I'm building positions that are either all men or nearly all men on each side. Consider the case of 3 men + 1 king vs 4 men: every other ply is a conversion move because there are no kings to move on one side.
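
In code form that definition is tiny; a sketch with hypothetical move fields:

Code: Select all

// 'Conversion' in this sense: any irreversible move. A man can never
// retreat, so every man move is irreversible, and so is every capture.
struct Move {
    bool is_capture;
    bool moves_king;   // false when the moved piece is a man
};

inline bool is_conversion(const Move& m) {
    return m.is_capture || !m.moves_king;
}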

I do have a database of all the MTC data for 2 - 7 pieces, and a file of the highest MTC positions in every database slice. I can email you the file if you're interested.
Rein Halbersma wrote: You probably already checked your data against Michel Grimminck's: http://www.xs4all.nl/~mdgsoft/draughts/stats/index.html
but if you like I can also scan some of Stef Keetman's 1998 articles on the 7 piece endgames.
I used a builder that I think I found on Harm Jetten's site to confirm my own builds for 2 - 6 pieces, but beyond that I have no independent verification of the counts. I run a self-verification pass on each subdivision as it is completed, to make sure that each position is consistent with its successors. This adds a lot of time to the build, but it finds most errors that can occur.
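
For readers who have not built databases: a self-verification pass amounts to a negamax consistency check over every stored position. A minimal sketch, with a hypothetical lookup interface and ignoring details such as capture exclusions:

Code: Select all

#include <vector>

enum Value { LOSS, DRAW, WIN };            // ordered worst to best

struct Position { /* elided */ };

// Hypothetical interface, for illustration only.
Value db_lookup(const Position&);
std::vector<Position> successors(const Position&);

// The stored value of a position must equal the best of its
// successors' stored values, seen from the mover's side.
bool verify(const Position& pos) {
    Value best = LOSS;                     // no moves -> side to move loses
    for (const Position& succ : successors(pos)) {
        Value opp = db_lookup(succ);       // value for the opponent
        Value mine = (opp == WIN) ? LOSS : (opp == LOSS) ? WIN : DRAW;
        if (mine > best) best = mine;
    }
    return db_lookup(pos) == best;
}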

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

partial information db

Post by TAILLE » Sun Jun 10, 2007 09:33

Hello,
For your information, I made very extensive use of partial-information databases in building Damy's endgame database. But the idea was not to save computing time during the generation process; it was to save memory, in order to have the database completely in RAM.
Before discussing this very interesting subject further, can you give me the size of your endgame database (I mean the database with only the results of the positions, not the actual winning lines) for the positions with only 2 - 6 pieces?
For Damy the size is 165Mb
Gérard

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Sun Jun 10, 2007 11:34

I use the 2 - 6 piece databases in Damage, based on the work of Michel Grimminck and Harm Jetten.

Number of Databases:
- 2 pieces : 4
- 3 pieces : 12
- 4 pieces : 25
- 5 pieces : 44
- 6 pieces : 70
Total: 155 databases, 1.49 GB

Gérard, I'm very interested to learn how to generate endgame databases based on partial information.

I have now started on the 7-piece database and have generated the 5Kx2K and 2Kx5K slices. But when I want to build other ones I run into memory issues, for two reasons: the total memory when all sub-databases are loaded is larger than my memory space, and even some single initial databases (not yet compressed) grow beyond the memory space of Windows XP.

Moving to Windows XP 64-bit (or a Vista alternative) only partly solves the issue. When database access is via a memory-mapped file on hard disk, I assume the access time will become so slow that the database calculation time will explode due to the I/O bottleneck.

I know that Michel has generated 7-piece databases based on slices (so a database is characterized by, for example, a piece on the most advanced rank). This solves some of the issues, but the program becomes really complicated, and the number of databases also becomes very large.

Maybe I need to buy a multi-core machine (although we don't yet have parallel algorithms for this) with 8 GB and 64-bit Windows, but then I will face the same problems when moving to 8 pieces.

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: partial information db

Post by Ed Gilbert » Sun Jun 10, 2007 13:39

TAILLE wrote:Before discussing this very interesting subject further, can you give me the size of your endgame database (I mean the database with only the results of the positions, not the actual winning lines) for the positions with only 2 - 6 pieces?
For Damy the size is 165Mb
The WLD db for those positions is 1.3gb. The db does not contain values for capture positions, but it does include the non-side-to-move capture threats that we normally exclude in 8x8 checkers databases. I experimented with excluding these also, which cut the size almost in half, but it results in many more db lookups during a search, so it did not seem like a good tradeoff. I also experimented with compressing the db using Huffman codes. This reduced the size to about 0.6 of the original, but the lookups were a lot slower due to the variable-length codes straddling byte boundaries.
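
Because capture positions are not stored, a probe has to resolve them by expanding the (forced) captures down to quiet positions. Roughly, with a hypothetical interface:

Code: Select all

#include <vector>
#include <algorithm>

enum Value { LOSS, DRAW, WIN };            // ordered worst to best

struct Position { /* elided */ };

// Hypothetical interface, for illustration only.
bool has_capture(const Position&);
std::vector<Position> capture_successors(const Position&);
Value db_lookup_quiet(const Position&);    // only quiet positions are stored

// Captures are forced in draughts, so this search is very shallow.
Value resolve(const Position& pos) {
    if (!has_capture(pos))
        return db_lookup_quiet(pos);
    Value best = LOSS;
    for (const Position& succ : capture_successors(pos)) {
        Value opp = resolve(succ);         // value for the opponent
        Value mine = (opp == WIN) ? LOSS : (opp == LOSS) ? WIN : DRAW;
        best = std::max(best, mine);
    }
    return best;
}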

What information are you excluding to make the sizes smaller?

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Sun Jun 10, 2007 14:44

Hello Ed and Bert,

I made a lot of work on the storage issue for endgame databases.

When I began to generate endgame databases, I used the following very common approach: for each position I simply calculate an index in order to address a table where you can find the result of the position (L, D, W).
This approach is very simple, but as far as storage capacity is concerned it is far from being optimal.
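
For readers unfamiliar with that scheme: the index is typically a combinatorial rank of the piece placement, addressing a packed array of 2-bit results. A minimal sketch for identical pieces on a 50-square board (illustrative only, not Damy's actual layout):

Code: Select all

#include <vector>
#include <cstdint>

// Binomial coefficient, small arguments only.
uint64_t choose(int n, int k) {
    if (k < 0 || k > n) return 0;
    uint64_t r = 1;
    for (int i = 1; i <= k; ++i) r = r * (n - i + 1) / i;
    return r;
}

// Colex rank of an ascending list of squares (0..49): maps each
// placement of m identical pieces to a dense index 0..C(50,m)-1.
uint64_t rank_squares(const std::vector<int>& sq) {
    uint64_t idx = 0;
    for (int i = 0; i < (int)sq.size(); ++i)
        idx += choose(sq[i], i + 1);
    return idx;
}

// The table itself: 2 bits per position (L=0, D=1, W=2).
struct WldTable {
    std::vector<uint8_t> bytes;
    explicit WldTable(uint64_t n) : bytes((n + 3) / 4, 0) {}
    int get(uint64_t i) const {
        return (bytes[i / 4] >> (2 * (i % 4))) & 3;
    }
    void set(uint64_t i, int v) {
        int sh = int(2 * (i % 4));
        bytes[i / 4] = (uint8_t)((bytes[i / 4] & ~(3 << sh)) | (v << sh));
    }
};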

I tried then a totally different approach.

Let’s take as an example the database corresponding to 1 king + 3 men (white side) against 2 kings (black side). Common sense tells us that almost all these positions are draws. How did I exploit this information?
If it is white to move, I store in the database only the positions that are lost for white (this takes about 260 000 bytes).
Similarly, if it is black to move, I store in the database only the positions that are lost for black (this takes about 90 000 bytes).
Of course you have to find a judicious format to store a position in the database, and you also have to find an efficient mechanism to know quickly whether a position is in the database or not (trade secret!).

How does Damy use this database?
If it is white to move, Damy first asks the database for the result of the position. If the position is stored in the database, then the result is “Lost”; otherwise the result is “Draw or Win”.
In case the result is “Draw or Win”, Damy generates all successors of the position and again asks the database for the results of the generated positions. You can see immediately that you then have all the relevant information to determine the correct result for the initial position.
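
Schematically the probe looks like this, with a plain sorted array standing in for the real (secret) storage format and a hypothetical move generator:

Code: Select all

#include <vector>
#include <cstdint>
#include <algorithm>

enum Value { LOSS, DRAW, WIN };

using Key = uint64_t;                      // encoded position
std::vector<Key> successors(Key);          // hypothetical, for illustration

// Only the rare lost positions are stored.
struct LossSet {
    std::vector<Key> keys;                 // kept sorted
    bool contains(Key k) const {
        return std::binary_search(keys.begin(), keys.end(), k);
    }
};

// losses_stm: positions lost for the side to move;
// losses_opp: the same set for the other side, probed via successors.
Value probe(Key pos, const LossSet& losses_stm, const LossSet& losses_opp) {
    if (losses_stm.contains(pos))
        return LOSS;
    for (Key succ : successors(pos))       // a win iff some move leaves
        if (losses_opp.contains(succ))     // the opponent in a lost position
            return WIN;
    return DRAW;
}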

As a second example, let’s take the positions with 5 kings against 2 kings. Almost all of these positions are a win for white, aren’t they? So you can exploit this information, and the size of Damy’s database for this type of position is only about 2 200 000 bytes.

With this approach you save a lot of memory. Of course it is more CPU-consuming, but not necessarily more time-consuming, because you avoid I/O to the disk! In addition, when you consider that the number of processors will probably increase more rapidly than I/O speed, it seems a very good approach for the future.

Gérard

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Sun Jun 10, 2007 15:46

Hi Gérard, that is an interesting way to reduce the size of the database. It's not clear to me that it is faster, because resolving a single position requires a number of lookups. This is especially true in the 4x4 db, where the large majority of the positions are draws.

It is a misconception that you need to store the entire db in RAM for lookups during a search. Caching is very effective in minimizing the I/O, because the 'locality of reference' to positions during a search is very high. My experience with the 10-piece db for English checkers, which is 215gb of data, is that a 1gb cache buffer is sufficient to give good performance during a search without incurring excessive I/O. The cache buffer I use is organized in 4k blocks that are exchanged using an LRU replacement strategy. Martin Fierz uses 1k cache blocks in his program, and Schaeffer uses 2k blocks.
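
A minimal sketch of such a cache; the block size, capacity, and disk-read stand-in are arbitrary here:

Code: Select all

#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

// Fixed-size db blocks cached in RAM with least-recently-used
// replacement; read_block_from_disk is a stand-in for real I/O.
class BlockCache {
    static constexpr size_t kBlockSize = 4096;
    size_t capacity_;                          // max blocks held in RAM
    std::list<uint64_t> lru_;                  // front = most recently used
    struct Entry {
        std::vector<uint8_t> data;
        std::list<uint64_t>::iterator pos;     // position in lru_
    };
    std::unordered_map<uint64_t, Entry> map_;  // block number -> entry

    std::vector<uint8_t> read_block_from_disk(uint64_t blk);  // elided

public:
    explicit BlockCache(size_t bytes) : capacity_(bytes / kBlockSize) {}

    const std::vector<uint8_t>& get(uint64_t blk) {
        auto it = map_.find(blk);
        if (it != map_.end()) {                // hit: move block to front
            lru_.splice(lru_.begin(), lru_, it->second.pos);
            return it->second.data;
        }
        if (map_.size() >= capacity_) {        // miss: evict the LRU block
            map_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(blk);
        Entry& e = map_[blk];
        e.data = read_block_from_disk(blk);
        e.pos = lru_.begin();
        return e.data;
    }
};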

-- Ed

FeikeBoomstra
Posts: 306
Joined: Mon Dec 19, 2005 16:48
Location: Emmen

Post by FeikeBoomstra » Sun Jun 10, 2007 15:55

Very interesting Gerard,

I thought about this concept myself, but I had the idea that, due to the big address space, it would consume too much space.

One question: why don't you look up the black-to-move positions in the mirrored position with white to move?
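
In code the suggestion amounts to something like this (helpers hypothetical):

Code: Select all

enum Value { LOSS, DRAW, WIN };

struct Position { bool white_to_move; /* board elided */ };

// Hypothetical helpers, for illustration only.
Position color_flip(const Position&);     // swap colors and mirror the board
Value    db_probe_wtm(const Position&);   // db stores white-to-move positions only

// Draughts is color-symmetric, so one side's tables can serve both:
Value probe(const Position& pos) {
    return pos.white_to_move ? db_probe_wtm(pos)
                             : db_probe_wtm(color_flip(pos));
}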

Kind regards,
Feike Boomstra
