Damy now has the 7-piece endgame database

Discussion about the development of draughts in the age of computers and the Internet.
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Damy now has the 7-piece endgame database

Post by TAILLE »

Hi,

For your information, I have now finished generating the 7-piece endgame database. Strictly speaking that is not quite true, because I did not generate the 6-pieces-against-1-piece database, but I have no intention of generating that rather useless database!

Concerning the storage of the database, my figures are the following:
6 pieces and fewer: 200 MB
7 pieces and fewer: 4.4 GB

What are the next steps?
1) To write a special algorithm to handle 7-piece db draw positions in case my opponent has only the 6-piece db
2) To generate a partial 8-piece db

I noted that Kingsrow and Dragon already have this 7-piece endgame db. What about Damage and the other programs?

For Ed: I cannot reach you by mail (see the post in this forum's mailbox)

Gérard
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert »

Hi Gerard,

Congratulations on finishing the 7pc db! I read your email and I'm pleased to hear that all of our win/loss/draw counts agree perfectly. I think we can be confident that our data is correct.
TAILLE wrote:2) To generate a partial 8 pieces db
I have some statistics on my 8 and 9 piece partial databases somewhere. I will post them later if I can find them.

-- Ed
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

8 and 9 piece statistics

Post by Ed Gilbert »

Here are some statistics on the partial 8pc and 9pc databases I built for 10x10 draughts. By 'partial', I mean that the win/loss/draw information is only partial, because the subsets that are mostly kings were not built, and so for the remaining subsets that were built there are some positions that cannot be resolved, and some that can only be partially resolved. In the statistics pasted below, positions that cannot be resolved at all are given as U (unknown), and partially resolved positions are given as either LD (loss or draw) or DW (draw or win). All 6 possible values are stored in the database. Partially resolved positions are still useful during a search. They are used to limit the heuristic values.
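The way partially resolved values "limit the heuristic values" can be sketched as follows. This is an illustrative Python reconstruction, not Kingsrow's actual code; all names and score conventions are invented. Each of the 6 stored values maps to a lower/upper bound on the true game value, and a heuristic score is clamped into that window.

```python
# Hypothetical sketch: how partial WLD values (W, D, L, LD, DW, U) can
# bound a heuristic score during search. Names are illustrative.

LOSS, DRAW, WIN = -1, 0, 1

# Each db value maps to (lower, upper) bounds on the true game value.
BOUNDS = {
    "W":  (WIN, WIN),
    "D":  (DRAW, DRAW),
    "L":  (LOSS, LOSS),
    "LD": (LOSS, DRAW),   # loss or draw: value is at most a draw
    "DW": (DRAW, WIN),    # draw or win: value is at least a draw
    "U":  (LOSS, WIN),    # unknown: no constraint
}

def clamp_heuristic(db_value, heuristic_score, draw_score=0, win_score=10000):
    """Clamp a heuristic evaluation to the window allowed by the db value."""
    lo, hi = BOUNDS[db_value]
    score_of = {LOSS: -win_score, DRAW: draw_score, WIN: win_score}
    return max(score_of[lo], min(score_of[hi], heuristic_score))
```

So a DW position can never be scored below a draw, and an LD position never above one, while U leaves the heuristic untouched.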

In the statistics below, the notation such as 8-3140.txt means the database subset that has 8 pieces, 3 black men, 1 black king, 4 white men, and 0 white kings. The statistics include both 'black to move' and 'white to move' values. You can see that I have generated database subsets for 8 and 9 pieces with at most 1 king on each side. For 9 pieces, the subsets contain at most 1 total king, either black or white, but not both (because these subsets are already so large). What I find interesting is the very high percentage of positions that can be resolved in these subsets. For 8 pieces typically less than 1 percent of the positions are 'unknown'. For 9 pieces, the most useful subset of 5 men vs. 4 men has only 2 percent unknown, and this is built using an 8-piece dependency database that is itself already incomplete.

counts_8-3131.txt
W 9.3%, D 64.4%, L 0.3%, LD 3.9%, DW 21.2%, U 0.8%
Exact 74.0%, Partial 25.2%, Unknown 0.8%

counts_8-3140.txt
W 23.7%, D 41.2%, L 20.2%, LD 6.5%, DW 8.2%, U 0.1%
Exact 85.1%, Partial 14.8%, Unknown 0.1%

counts_8-4040.txt
W 24.1%, D 57.1%, L 10.1%, LD 3.0%, DW 5.6%, U 0.0%
Exact 91.4%, Partial 8.6%, Unknown 0.0%

counts_8-4121.txt
W 32.4%, D 31.4%, L 12.0%, LD 6.5%, DW 15.4%, U 2.3%
Exact 75.8%, Partial 21.9%, Unknown 2.3%

counts_8-4130.txt
W 39.5%, D 8.1%, L 45.3%, LD 4.5%, DW 2.4%, U 0.1%
Exact 92.9%, Partial 6.9%, Unknown 0.1%

counts_8-5021.txt
W 19.7%, D 59.2%, L 4.9%, LD 3.7%, DW 11.8%, U 0.8%
Exact 83.7%, Partial 15.5%, Unknown 0.8%

counts_8-5030.txt
W 38.8%, D 24.9%, L 28.7%, LD 3.5%, DW 4.1%, U 0.1%
Exact 92.4%, Partial 7.5%, Unknown 0.1%

counts_9-4140.txt
W 30.3%, D 8.6%, L 26.7%, LD 7.1%, DW 13.5%, U 13.8%
Exact 65.7%, Partial 20.5%, Unknown 13.8%

counts_9-5031.txt
W 17.7%, D 32.0%, L 8.6%, LD 8.1%, DW 26.6%, U 6.9%
Exact 58.4%, Partial 34.7%, Unknown 6.9%

counts_9-5040.txt
W 28.9%, D 30.5%, L 14.6%, LD 9.3%, DW 14.7%, U 2.0%
Exact 74.0%, Partial 24.0%, Unknown 2.0%

-- Ed
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: 8 and 9 piece statistics

Post by TAILLE »

Ed Gilbert wrote:...What I find interesting is the very high percentage of positions that can be resolved in these subsets. For 8 pieces typically less than 1 percent of the positions are 'unknown'. For 9 pieces, the most useful subset of 5 men vs. 4 men has only 2 percent unknown, and this is built using an 8-piece dependency database that is itself already incomplete.
-- Ed
The number of resolved positions is for me the most relevant figure. A partial result or an unknown result of course has to be stored in the database, but I consider a partial result almost identical to an unknown result, for at least two reasons. Firstly, for a human it is obvious (in more than 99% of cases) who has the advantage. In the 8-4121 slice it is obvious that the problem is to know whether or not black can win, and a partial result seems essentially identical to an unknown result. Secondly, when you encounter a partial result while exploring the tree in a real game, you might of course stop the analysis of that variation and use the partial result, but most of the time you will have to continue because the partial result is unsatisfactory.
Anyway, a figure between 75% and 90% for resolved positions appears to be a very interesting result, because I think the unresolved positions (partial or unknown) correspond to positions that appear less frequently in practice.

For the moment I have to finalise (on paper) the analysis of the algorithm.
It is quite simple when kings are on one side only, but not so simple when repetitions can occur.

Gérard
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: 8 and 9 piece statistics

Post by Ed Gilbert »

Hi Gerard,
TAILLE wrote:A partial result or an unknown result of course has to be stored in the database, but I consider a partial result almost identical to an unknown result, for at least two reasons. Firstly, for a human it is obvious (in more than 99% of cases) who has the advantage. In the 8-4121 slice it is obvious that the problem is to know whether or not black can win, and a partial result seems essentially identical to an unknown result.
I agree that for the 8-4121 slice the partial results are probably not very useful. They are more useful for slices such as 8-4040 and 9-5031. In any case it is absolutely necessary to keep track of these partial results during the building phase, so having them available during engine searches does not really cost anything, except that the data could be compressed a bit better if they were excluded.
TAILLE wrote:Concerning the storage of the database, my figures are the following:
6 pieces and fewer: 200 MB
7 pieces and fewer: 4.4 GB
I understand that you store only loss values to achieve these compression results. I imagine that you probably do not store the loss values for capture positions -- is this correct? It seems that on average it must be necessary to do a large number of db probes in order to lookup a position's value. For example, suppose you go to lookup the value of a position which is a draw. You probe its value and find that it is not a loss, so you generate a movelist. The position has 10 successors, and none of them are losses so you have to visit each one. 2 of the successors are capture positions, so to find their values you have to generate movelists for those and visit each of their ~10 successors. Perhaps some of those successors are also captures... It seems like you could easily end up probing 30 or 40 different positions to get the value of the original position. Is this an accurate example of how it works?
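The probing procedure described above can be sketched as a toy Python reconstruction, under the assumption that only losses for the side to move are stored and that capture positions are never stored. The database and move generator here are stand-ins, not real draughts code, and cycles (repetitions with kings) are ignored in this toy version.

```python
# Toy sketch of probing a losses-only database, as in Ed's example:
# a position's value is derived by (possibly recursive) successor visits.

def value(pos, loss_db, successors, is_capture):
    """Return 'W', 'L' or 'D' for the side to move."""
    if not is_capture(pos):
        if pos in loss_db:
            return "L"                      # stored loss
        # Db says: not a loss. It is a win iff some successor is a loss.
        for s in successors(pos):
            if value(s, loss_db, successors, is_capture) == "L":
                return "W"
        return "D"
    # Capture positions are not stored at all: derive from successors.
    vals = [value(s, loss_db, successors, is_capture) for s in successors(pos)]
    if "L" in vals:
        return "W"
    if "D" in vals:
        return "D"
    return "L"
```

Each non-loss probe fans out into successor probes, and each capture successor fans out again, which is exactly how a single lookup can grow to 30 or 40 probes.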

-- Ed
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: 8 and 9 piece statistics

Post by TAILLE »

Ed Gilbert wrote: I agree that for the 8-4121 slice the partial results are probably not very useful. They are more useful for slices such as 8-4040 and 9-5031. In any case it is absolutely necessary to keep track of these partial results during the building phase, so having them available during engine searches does not really cost anything, except that the data could be compressed a bit better if they were excluded.
-- Ed
Yes, I agree, and it is indeed my intention to differentiate partial and unknown results. My remark concerns only the usefulness of the database: only the percentage of exact results is significant for me.
Ed Gilbert wrote:I understand that you store only loss values to achieve these compression results. I imagine that you probably do not store the loss values for capture positions -- is this correct? It seems that on average it must be necessary to do a large number of db probes in order to lookup a position's value. For example, suppose you go to lookup the value of a position which is a draw. You probe its value and find that it is not a loss, so you generate a movelist. The position has 10 successors, and none of them are losses so you have to visit each one. 2 of the successors are capture positions, so to find their values you have to generate movelists for those and visit each of their ~10 successors. Perhaps some of those successors are also captures... It seems like you could easily end up probing 30 or 40 different positions to get the value of the original position. Is this an accurate example of how it works?
-- Ed
Basically you are correct; my tests showed me that using a larger db is not so good because of the disk I/O, but I also know perfectly well that you do not agree on this point. Maybe my cache handling is not very efficient!
I have to add a point here: in fact, depending on the type of position, I did not necessarily choose to store only the loss positions.
With 4 kings against 3 kings I indeed stored only the black and white loss positions, and this seems very efficient.
With 5 white kings against 2 black kings I chose to store only the white loss and white draw positions with white to play, with 2 advantages: firstly I reduce the number of positions to store in the database, and secondly, in order to calculate the result of a position I prefer to generate the black moves (only 2 kings) rather than the white moves (5 kings means a lot of possible moves). Of course, depending on the slice or subslice, the program calculates the most efficient storage format.

Gérard
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: 8 and 9 piece statistics

Post by Ed Gilbert »

Hi Gerard,
TAILLE wrote:Basically you are correct; my tests showed me that using a larger db is not so good because of the disk I/O, but I also know perfectly well that you do not agree on this point. Maybe my cache handling is not very efficient!
I can explain how I implement LRU caching. Maybe you will find it useful. You take a large chunk of memory and divide it up into 4k blocks. Actually each block is a structure that has a 4k buffer and also some other fields to implement a circular, doubly linked list, so there are pointers to the previous and next blocks in the list. There is also a pointer somewhere to the head of this linked list. This head points to the 'most recently used' list node.

When you do a database lookup, and you access a value that is stored in one of the 4kb cache blocks, you unlink the node containing that cache block and move it to the head of the linked list. This becomes the 'most recently used' node (because you just recently looked up a value in it). The head->previous node is the next most recently used node, and head->next is the least recently used (LRU) node. If you need to read a new 4k block into the cache from disk, you read it into the least recently used node (head->next), and then move that node to the head.
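The two paragraphs above can be sketched directly in Python. This is a minimal reconstruction of the scheme as described (fixed pool of blocks in a circular doubly linked list, head = most recently used, head.next = least recently used); the block contents and disk reads are simulated, and all field names are mine, not Kingsrow's.

```python
# Sketch of an LRU cache built from a fixed pool of nodes kept in a
# circular doubly linked list, per Ed's description.

class CacheBlock:
    def __init__(self):
        self.key = None              # which db block this buffer holds
        self.buffer = None           # stand-in for the 4096-byte buffer
        self.prev = self.next = self

class LRUCache:
    def __init__(self, nblocks):
        # Build a circular doubly linked list of nblocks nodes.
        self.head = CacheBlock()             # most recently used
        for _ in range(nblocks - 1):
            node = CacheBlock()
            node.next = self.head.next
            node.prev = self.head
            self.head.next.prev = node
            self.head.next = node
        self.index = {}                      # key -> node, O(1) hit test

    def _make_mru(self, node):
        if node is self.head:
            return
        # Unlink the node, relink it next to the head, make it the head.
        node.prev.next = node.next
        node.next.prev = node.prev
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node
        self.head = node                     # node is now the MRU

    def get(self, key, read_block_from_disk):
        """Return the cached block for key, loading it on a miss."""
        node = self.index.get(key)
        if node is None:
            node = self.head.next            # least recently used block
            if node.key is not None:
                del self.index[node.key]     # evict its old contents
            node.key = key
            node.buffer = read_block_from_disk(key)
            self.index[key] = node
        self._make_mru(node)
        return node.buffer
```

Because the list is circular, "moving a node to the head" on a miss is almost free: the freshly filled LRU node simply becomes the new head.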

Another aspect to this is that I do not always read a 4k block from disk if the position I am looking up is not already in a cache block. Whenever the endgame database is queried for a WLD value, there is a priority associated with the position. If the position has a high priority, then the position's data will be retrieved from disk and loaded into a cache block if it is not already in the cache. If the position has a low priority, then the cache is consulted, and if the value is in cache it is returned, but if the value is not in cache then the lookup() call returns a NOT_IN_CACHE token value. The heuristics for the priority function are something that I have adjusted and tweaked over time, but you can easily imagine that this is a function of search parameters and the importance of the position to the search. The priority function and the threshold for high vs. low priority are adjusted so that the search does not slow to a crawl from excessive disk I/O for the endgame database. In practice this works quite well, and with e.g. a 2gb cache buffer you can use a 200gb database without slowing the search excessively and still have only a moderately small percentage of lookup() calls return NOT_IN_CACHE. The reason this works is something that computer scientists call 'locality of reference'. This refers to the fact that when you load a 4k cache block with data for a new position, you are also loading the values for perhaps a million or so other positions that are nearby, and since your search needed this one position it will very likely need to lookup many more of these other 1 million positions in the same cache block.
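The priority gating above can be reduced to a small sketch: a high-priority probe may pay the disk I/O, a low-priority probe only consults the cache and otherwise returns the NOT_IN_CACHE token Ed mentions. The cache and disk reader here are stand-ins, and the heuristics that compute the priority are deliberately left out.

```python
# Sketch of a priority-gated db probe returning a NOT_IN_CACHE token.

NOT_IN_CACHE = object()   # sentinel token, as in Ed's description

def db_lookup(pos, cache, read_from_disk, high_priority):
    if pos in cache:
        return cache[pos]             # cache hit: always answered
    if not high_priority:
        return NOT_IN_CACHE           # low priority: never touch disk
    value = read_from_disk(pos)       # high priority: pay the disk I/O
    cache[pos] = value
    return value
```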
TAILLE wrote:I have to add a point here: in fact, depending on the type of position, I did not necessarily choose to store only the loss positions.
With 4 kings against 3 kings I indeed stored only the black and white loss positions, and this seems very efficient.
With 5 white kings against 2 black kings I chose to store only the white loss and white draw positions with white to play, with 2 advantages: firstly I reduce the number of positions to store in the database, and secondly, in order to calculate the result of a position I prefer to generate the black moves (only 2 kings) rather than the white moves (5 kings means a lot of possible moves). Of course, depending on the slice or subslice, the program calculates the most efficient storage format.
I would like to understand how this works, because I think this kind of compression could work well for data that is not used very frequently. For example, positions with 2 kings or more on one side are not very often needed during a game, so it might be worthwhile to use a compression scheme for these slices that has slower lookups but is more efficient in using disk and cache space. When you say that for positions with 5wk and 2bk you store only the white loss and white draw positions (to reduce the number of positions to store), I am confused. The WLD statistics for this slice show 0 white losses, 489226234 wins, and 59194628 draws (these values excluding positions with captures). So wouldn't it be most efficient to store only the white losses, since there are none of these and this should take essentially no space at all? Or is this a case where you think it is more important to be able to return the position's value without needing to visit all the successor positions, because of the high branching factor that happens with 5 kings?

Also, you wrote that you store only the loss and draw values, which means that you can infer the win values, because you know the other 2. Does this mean that you store the losses and draws in separate files, or separate storage areas? Do you store them separately to get better compression?

-- Ed
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: 8 and 9 piece statistics

Post by TAILLE »

Ed Gilbert wrote: Whenever the endgame database is queried for a WLD value, there is a priority associated with the position.
-- Ed
Just a small question : do you use a "static" priority known only by the endgame db or do you use a "dynamic" priority given to the db in the query ?
Ed Gilbert wrote: When you say that for positions with 5wk and 2bk you store only the white loss and white draw positions (to reduce the number of positions to store), I am confused. The WLD statistics for this slice show 0 white losses, 489226234 wins, and 59194628 draws (these values excluding positions with captures). So wouldn't it be most efficient to store only the white losses, since there are none of these and this should take essentially no space at all?
-- Ed
It seems you have missed the point.
With 5wk and 2bk, Damy takes into account only the following 4 numbers (I never store a winning position in the db):
White loss = 0
White draw = 59194628
Black loss = 376519010
Black draw = 80452496
and Damy compares the 3 following values:
White loss + White draw, Black loss + Black draw, White loss + Black loss
In this case the smallest number corresponds to White loss + White draw, and Damy stores in the database only the corresponding 59194628 positions. That means you will not find in the database any position with black to play.
Ed Gilbert wrote: Also, you wrote that you store only the loss and draw values, which means that you can infer the win values, because you know the other 2. Does this mean that you store the losses and draws in separate files, or separate storage areas? Do you store them separately to get better compression?
-- Ed
Yes Ed, I always build 2 files to store the results of a (sub)slice in the db. These files correspond to the two numbers Damy has chosen, as I explained above. As a consequence you can see that I do not store a result but only a list of positions. You can then understand that the storage format can easily use 50-bit masks to achieve better compression.

For the partial dbs (8-9 pieces) I concluded that I will need 5 files to store all the results of a (sub)slice.
Typically, you can verify that it is possible to store only the wL, wLD, wLW, bL and bLD positions (w means white to move and b means black to move).
For example, if it is white to move:
A wL, wLD or wLW position is found immediately in the db. In the other cases I generate all the successors. If one successor is in the bL file then the original position is a wW position. Otherwise, if one successor is in the bLD file then the original position is wDW. Otherwise the original position is a wD position.
It is of course very interesting to note that it is not necessary to store a draw position!
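The white-to-move lookup procedure above can be sketched in Python. The five sets stand in for Gérard's five files, the move generator is a stand-in, and the returned strings simply follow his notation.

```python
# Sketch of the 5-file partial-db lookup (white to move), per the
# procedure described above. Sets stand in for the stored files.

def lookup_white_to_move(pos, wL, wLD, wLW, bL, bLD, black_successors):
    """Resolve a white-to-move position from the five stored sets."""
    if pos in wL:
        return "wL"
    if pos in wLD:
        return "wLD"
    if pos in wLW:
        return "wLW"
    # Not stored: derive the value from the black-to-move successors.
    succs = black_successors(pos)
    if any(s in bL for s in succs):
        return "wW"    # some reply loses for black -> white wins
    if any(s in bLD for s in succs):
        return "wDW"   # best proven reply is loss-or-draw for black
    return "wD"        # draws need never be stored
```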

Gérard
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: 8 and 9 piece statistics

Post by Ed Gilbert »

Hi Gerard,
TAILLE wrote:Just a small question : do you use a "static" priority known only by the endgame db or do you use a "dynamic" priority given to the db in the query ?
The priority is computed dynamically by heuristics in the search. What gets passed to the endgame db lookup is a boolean indicating that the position is either low or high priority.
TAILLE wrote:and Damy compares the 3 following values:
White loss + White draw, Black loss + Black draw, White loss + Black loss
In this case the smallest number corresponds to White loss + White draw, and Damy stores in the database only the corresponding 59194628 positions. That means you will not find in the database any position with black to play.
Ok, that is very clear now. In all cases you have enough information to get a position's value either directly from a lookup of the position, or from lookups of its immediate successors (unless some of the successors are captures, then this must be done recursively).
TAILLE wrote:Yes Ed, I always build 2 files to store the results of a (sub)slice in the db. These files correspond to the two numbers Damy has chosen, as I explained above. As a consequence you can see that I do not store a result but only a list of positions. You can then understand that the storage format can easily use 50-bit masks to achieve better compression.
That is very interesting, because it is much different than the scheme that I use. I would be interested to learn more about your storage scheme and how you use 50-bit masks, if you are willing to share it. I will describe how I store WLD values. A slice or db subdivision has N positions, which can be described using a set of indices 0..N-1. To look up a position's value, the position is converted to an index, and then the job is to find the value in the db corresponding to that index.

Values in the db consist of a series of byte tokens. Each byte represents the WLD values of some string of consecutive index numbers. The encoding of the WLD values into a byte is as follows: byte values 0 through 80 represent the WLD values of 4 consecutive indices. For example, if you give a win the value 0, loss 1, and draw 2, and you want to encode the values of 4 consecutive positions P1..P4 with values V1..V4, then the encoding of that database byte is V1 + 3*V2 + 9*V3 + 27*V4. Of course 4 positions per byte is not very good compression, and those values between 0 and 80 are only used when consecutive positions have different WLD values. When a string of more than 4 consecutive positions have the same WLD value, these positions are encoded using the remaining values between 81 and 255. 81 through 138 are used for wins, 139 through 196 for losses, and 197 through 255 for draws. Within each range, the values encode runlengths ranging from 5 up to about 10,000. E.g. 81 means 5 wins, 82 is 6 wins, 138 means 10,000 wins, 139 means 5 losses, etc.

To facilitate fast random access to the WLD value of any index, an index file is also written which contains the index number at each 1024-byte boundary in the data file. To lookup a position's value, you calculate the position's index, then do a binary search in the index file to find which 1024-byte block contains that index. That database block is either in cache or has to be loaded into cache from disk. When the block is in cache, read each byte sequentially from the beginning of the block until the byte with the target index is reached, and then the byte token gives the WLD value.
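The binary search into the index file reduces to a few lines; this sketch assumes the index table is an in-memory sorted list of the first position index of each 1024-byte data block, and omits the sequential scan inside the block.

```python
# Sketch: find which 1024-byte data block contains a target position index.
import bisect

def find_block(index_table, target_index):
    """Block number whose starting index is the largest one <= target."""
    # bisect_right gives the insertion point; the block just before it
    # starts at or before target_index.
    return bisect.bisect_right(index_table, target_index) - 1
```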
TAILLE wrote:For the partial dbs (8-9 pieces) I concluded that I will need 5 files to store all the results of a (sub)slice.
Typically, you can verify that it is possible to store only the wL, wLD, wLW, bL and bLD positions (w means white to move and b means black to move).
For example, if it is white to move:
A wL, wLD or wLW position is found immediately in the db. In the other cases I generate all the successors. If one successor is in the bL file then the original position is a wW position. Otherwise, if one successor is in the bLD file then the original position is wDW. Otherwise the original position is a wD position.
It is of course very interesting to note that it is not necessary to store a draw position!
For incomplete dbs with 6 values, I use a variation on the encoding scheme described above. Values 0 through 35 are used to encode the values of 2 consecutive positions using the calculation V1 + 6*V2. Values 36 through 255 are used for various runlengths of each of the 6 possible database values.
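The 6-value variant can be sketched the same way (the run tokens 36..255 are omitted here); the value ordering is an assumption for illustration.

```python
# Sketch of the 6-value pair encoding for incomplete dbs:
# bytes 0..35 pack two values base-6.

W, D, L, LD, DW, U = range(6)   # illustrative ordering

def encode_pair(v1, v2):
    """Pack two consecutive 6-valued entries into one byte, 0..35."""
    return v1 + 6 * v2

def decode_pair(byte):
    return (byte % 6, byte // 6)
```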

-- Ed
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

WLD storage scheme

Post by TAILLE »

Hello Ed,

Just to verify my understanding of your storage scheme:
You effectively compute an index for each position but, at this stage, you do not exclude positions with captures (these positions still have a valid index). In the storage process, however, you give these positions a "false" value in order to improve the compression. Typically I imagine you take the value of the most convenient adjacent position/index. Because you never query the db with the index of a capture position, it does no harm. Is that correct?
Is it also correct to say that each file on the disk has a size smaller than the size of a block (4k)?

Damy's storage scheme is very different because I do not store values but only a list of positions. The positions are stored within a tree. Take for example a position with only one king (a very interesting case for the 9-piece database!). The principle is to use the first man to build the first level of the tree (which means that the root has at most 45 successors), then to use the second man to build the second level of the tree (at most 44 successors if the first 2 men are the same colour), and so on up to the 8th man. We are now at the leaves of the tree, and we have to describe which positions of the remaining piece (a king in my example) belong in this db. Here I use several formats depending on the number of positions concerned: one byte for only 1 position, 2 bytes for 2 or 3 positions, 3 bytes for 4 positions, 4 bytes for 5 positions, 5 bytes for 6 or 7 positions, and 6 bytes for 8 or more positions. In this last case it is simply a 48-bit mask (in fact I do not really need a 50-bit mask, because as soon as I have already put 2 or more pieces on the board, fewer than 48 squares remain available for the last piece).
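The leaf format just described can be sketched as follows; the helper names are mine, and the concrete encodings for the 1-5 byte forms are left out since only the size table and the mask form are specified above.

```python
# Sketch of the trie-leaf size table and the 48-bit mask fallback
# described for Damy's position-list storage.

def leaf_size_bytes(n_positions):
    """Bytes used for a leaf holding n_positions, per the scheme above."""
    if n_positions == 1:
        return 1
    if n_positions <= 3:
        return 2
    if n_positions == 4:
        return 3
    if n_positions == 5:
        return 4
    if n_positions <= 7:
        return 5
    return 6               # 48-bit mask covers any larger set

def encode_leaf_mask(squares):
    """48-bit mask form: one bit per available square (0..47)."""
    mask = 0
    for sq in squares:
        mask |= 1 << sq
    return mask.to_bytes(6, "little")
```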

Gérard
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: WLD storage scheme

Post by Ed Gilbert »

TAILLE wrote:Just to verify my understanding of your storage scheme:
You effectively compute an index for each position but, at this stage, you do not exclude positions with captures (these positions still have a valid index). In the storage process, however, you give these positions a "false" value in order to improve the compression. Typically I imagine you take the value of the most convenient adjacent position/index. Because you never query the db with the index of a capture position, it does no harm. Is that correct?
Yes, that is exactly how it works. The values for captures are allowed to float to whatever value maximizes the lengths of runs of consecutive values, thus minimizing the size of the database.
TAILLE wrote:Is it also correct to say that each file on the disk has a size smaller than the size of a block (4k)?
No, because I do not store every cache block as a disk file. All of the data for a slice (which is the term I use for any particular combination of black men, black kings, white men, and white kings) is stored in one data file, and that file includes data for both black and white to move. In addition, for the small databases of 2, 3, and 4 pieces, all the data for the same number of pieces is stored in a single file. It is simpler to have a small number of database files to deal with, and it means only a small number of file handles have to be kept open. Random access to any particular 4k block is accomplished with fseek().
TAILLE wrote:Damy's storage scheme is very different because I do not store values but only a list of positions. The positions are stored within a tree. Take for example a position with only one king (a very interesting case for the 9-piece database!). The principle is to use the first man to build the first level of the tree (which means that the root has at most 45 successors), then to use the second man to build the second level of the tree (at most 44 successors if the first 2 men are the same colour), and so on up to the 8th man.
Ok, I see. That particular data structure has a unique name, it is known as a 'trie', the name derived from the middle syllable of the word 'retrieval'. It is usually described in textbooks giving the example of storing words from a dictionary, and is useful when many of the words begin with the same sequence of letters. Of course this property is also true of many sets of draughts positions.

One difficulty with this representation is that it is a very difficult structure to use with dynamic caching. I can understand now why you are very concerned with fitting the entire database in RAM.

Do you also use this data structure for the other format of your data, the one you use when you are building the databases?

-- Ed
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: WLD storage scheme

Post by TAILLE »

Ed Gilbert wrote:It is simpler to have a small number of database files to deal with, and it means only a small number of file handles have to be kept open. Random access to any particular 4k block is accomplished with fseek().
-- Ed
Very clear now.
Ed Gilbert wrote: Do you also use this data structure for the other format of your data, the one you use when you are building the databases?
-- Ed
Yes. If for a small db I use a wL and a bL file, that means the subslice has a lot of draw positions, and I complete the db with the wW file (white to play, Win result) and the bW file.
If for a small db I use a wL and a wD file, that means the subslice corresponds to positions advantageous for white, and I complete the db with the bD and bW files.
If for a small db I use a bL and a bD file, that means the subslice corresponds to positions advantageous for black, and I complete the db with the wD and wW files.

You are right Ed, caching is clearly less efficient with my approach. With the exchanges we have made and the experience acquired, I will try some format changes before beginning the 8-piece generation.
In particular, I optimised the search for a given position in the db, but that is not the right optimisation. It is far better to optimise the case where the position is not in the db!
I also have to analyse whether some mixture of our approaches can help.

Gérard
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: WLD storage scheme

Post by TAILLE »

Hi Ed,

Just a small point I forgot to mention. Suppose that the main program recognises that white has the advantage. As a consequence, the test value in the MTD procedure is a positive value. If now, during the exploration process, I reach a 7-piece position with black to move, you can see that I have all the information needed by reading only the bL file: if the position is in the file, everything is OK; if not, I do not need to generate all the possible moves, because it is sufficient to know that the position is not a losing one!
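The shortcut can be stated as a tiny sketch. The function name and score conventions are illustrative assumptions: when the MTD test value already favours white, a black-to-move probe needs only the membership test in the bL file, because either black loses (exact win for white) or black secures at least a draw, which already fails low against any positive test value.

```python
# Sketch of the bL-membership shortcut at a black-to-move position,
# usable when the MTD test value already favours white.

def bound_for_white(pos, bL, draw_score=0, win_score=10000):
    """(kind, score) bound for white, from the bL file alone."""
    if pos in bL:
        return ("exact", win_score)        # black to move loses
    return ("upper_bound", draw_score)     # black draws at least
```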

Gérard
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: WLD storage scheme

Post by Ed Gilbert »

TAILLE wrote:Just a small point I forgot to mention. Suppose that the main program recognises that white has the advantage. As a consequence, the test value in the MTD procedure is a positive value. If now, during the exploration process, I reach a 7-piece position with black to move, you can see that I have all the information needed by reading only the bL file: if the position is in the file, everything is OK; if not, I do not need to generate all the possible moves, because it is sufficient to know that the position is not a losing one!
Yes, I see. Your endgame lookup can return DRAW_OR_WIN, which can be used to limit the heuristic value. I do this with the partial databases also, but the difference is that I don't need to compute a heuristic value first and pass it to the endgame lookup.

I ran an experiment today to see how much smaller the 6pc db will be if I store only losses, or only the full values for one color. Since only 2 values need to be stored, LOSS or NOT_LOSS, I modified the compression encoding to pack 7 non-consecutive values into the first 128 values in a byte, and used the remaining 128 values for various runlengths. The 2-6pcs db was reduced from 1.3gb to 300mb. However, I would not consider doing this for positions with 1 king or less on a side. The database size for the combination of using the reduced information db for positions with more than 1 king on either side plus the full information for positions with 1 king or less was about 600 mb.
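The two-value packing in this experiment can be sketched as follows; the bit ordering is my assumption, and the runlength tokens for the upper 128 values are omitted.

```python
# Sketch of the LOSS/NOT_LOSS experiment: 7 binary values fit in the
# first 128 byte values (2**7 = 128), leaving 128 tokens for runs.

def encode7(bits):
    """Pack 7 LOSS(1)/NOT_LOSS(0) flags into one byte in 0..127."""
    assert len(bits) == 7
    byte = 0
    for i, b in enumerate(bits):
        byte |= b << i
    return byte

def decode7(byte):
    """Inverse of encode7."""
    return [(byte >> i) & 1 for i in range(7)]
```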

There are some aspects to this reduced information format that I am not really comfortable with. There is the large number of lookups needed when you have to visit all successors and one or more successors is a capture. There is also the possibility that you have to visit all the successors and one or more successors convert into a different slice due to king promotion. It may be that the parent position was black-to-move, and the successor position is in a different slice that has no information for white-to-move, so all of its successors have to be visited as well.

I also don't know how this would work with the high-priority/low-priority lookup attribute. If a typical lookup requires 20 or 30 position probes, then the probability that one of those 20 or 30 positions is not in cache is much greater. For a low-priority lookup you could end up probing most of those 20 or 30 dependency positions and then hit one that is not in cache, and end up returning NOT_IN_CACHE after doing all that work. So right now this does not look like a very good tradeoff for reducing the total db size by a factor of about 2.

A consequence of modifying the compression code today is that I found my function which tries to maximize the runlengths was not handling an important case in an optimal way. I fixed this and got a 10% improvement in the sizes of some quick tests on the 5pc database, so I will re-compress everything and hopefully gain a 10% size reduction. I suspect the effect will be even greater for incomplete dbs because the runlengths are shorter and the runs more frequent. It's odd that I never noticed this before. I've been using this compression code for several years. I used it to compress the 10-piece databases for English and Italian checkers, so it is likely I could make those 10% smaller also.

-- Ed
BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt »

Gerard, congratulations.
This is still something on my list, but the time needed to finish this exercise does not encourage me to start.
Anyway, I received from Michael Grimminck the sources to generate these databases, and I have now modified them for Vista64; basically they are ready to do the job.
Most likely I will start during the holiday (mid July), but I doubt whether I need to buy a separate PC for these calculations.

How long (based on your experience) do you think it will take on my Quad Q6600 (2.4 GHz) with only 1 core active (I'm not sure whether it is possible and/or easy to rewrite the program to use all 4 cores)?

Looking at the prices of memory, it is easy to install 8 GB (16 GB in 4 GB modules is, I guess, not so widely available). Do you think it would be beneficial to install 8 GB to drastically reduce disk I/O during this process, as I think this could have a positive effect on the time needed?

In the meantime, I'm working to increase the search depth of my engine, after I found out that KingsRow outsearches Damage in the late endgame most of the time, often with dramatic consequences for me.
I'm now focussing on pruning and testing the MPC algorithm. In some cases I was able to increase the speed (time needed to reach a specific depth) by 500%, but I'm afraid I'm throwing the baby out with the bathwater.

Therefore I'm modifying the engine so that it can play a match against itself, with different parameter settings for white and black (going through all openings for white and black, as we did during the KingsRow - Damage match).

When I read your conversation with Ed, I'm really motivated to also start on DBs, so I hope I can contribute to the discussion and generate new ideas (going where no one has gone before).

Bert