Damy now has the 7-piece endgame database

Discussion about development of draughts in the time of computer and Internet.
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Sun Jun 01, 2008 15:17

BertTuyt wrote: How long (based on your experience) do you think it will take on my Quad Q6600 (2.4 GHz), with only 1 core activated (I'm not sure if it is possible and/or easy to re-write the program to use all 4 cores).
Bert
I used a Core 2 Duo 2 GHz with 2 GB and only one core activated for the generation process (I used the other core for development work). In addition I am only in a 32-bit environment. I suppose the algorithm has a great impact on the time; I put a lot of effort into mine, but of course I am not sure it is the best one! For your information, it took me 2.5 days to generate the 6-piece db and 81 days to generate the 7-piece db.

My opinion is that you do not need to re-write your program to use all 4 cores, because you may simply run 2 (or more) instances of your program, the first instance generating the 4x3 db and the second the 5x2 db. I do not see any gain in having a true multicore generation process. It would be a lot of effort for almost no gain.
BertTuyt wrote: Looking at the prices of memory it is easy to install 8 GB (16 GB in 4 GB modules is, I guess, not so widely available). Do you think it would be beneficial to install 8 GB, to drastically reduce disk I/O during this process, as I think this could have a positive effect on the time needed.
Bert
It could be a good idea for running 2 or more instances of your program. Otherwise I think 2 GB is sufficient.
BertTuyt wrote: I'm now focusing on pruning and testing the MPC algorithm.
Bert
Can you give me a good reference article on this algorithm?
BertTuyt wrote: When I read your conversation with Ed I was really motivated to also start on DBs, so I hope I can contribute to the discussion and generate new ideas (going where no one has gone before).
Bert
Let me give you another strong motivation to generate this 7 pieces db.

Take the following position :
Image
White to play

The above position is a draw. Now that I have the 7-piece db I am working on a special algorithm to try to trap a program that has only the 6-piece db.
The idea is the following :
Let's consider first the "natural" move 1.02-16.
After this move there are 11 black responses; according to the 7-piece db, 2 are correct (1...05-46 and 1...05-37) and the 9 other moves are wrong. A program with only the 6-piece db will not know that 1...05-46 and 1...05-37 are correct, but (and here is the interesting point) it will easily demonstrate that the 9 other moves are losing moves. It is very easy to recognise this situation and conclude that 1.02-16? is a bad move, because a program with only the 6-piece db will always play one of the 2 correct moves.

Now consider the move 1.02-07. Black has 11 possible answers, with 6 correct moves (05-10, 05-14, 05-19, 05-28, 05-37 and 05-46); the 5 other moves are wrong. But now a program with only the 6-piece db will easily demonstrate that 4 of those moves are wrong, yet it will not be able to demonstrate that the move 1...25-30? is losing. Damy can recognise this situation and conclude that a program with only the 6-piece db will hesitate between 05-10, 05-14, 05-19, 05-28, 05-37, 05-46 and ... 25-30. As a consequence black might choose the wrong move and lose the game.

Conclusion: 1.02-16? forces the opponent to choose a good move, but 1.02-07 sets a terrible trap! Damy has to play 1.02-07!
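Gérard's trap idea can be sketched in a few lines. This is a hypothetical illustration, not Damy's actual code: each black reply is tagged with whether it really loses (per the 7-piece db) and whether a searcher limited to the 6-piece db could prove that loss; the reply data mirrors the two example moves above.

```python
# Hedged sketch of the "trap" move-selection idea described in the post.
# (loses_per_7pc_db, provable_with_6pc_db) for each black reply.

def trap_count(replies):
    """Count replies that actually lose but that an opponent with only
    the 6-piece db cannot prove are losing (i.e. potential traps)."""
    return sum(1 for loses, provable in replies if loses and not provable)

def pick_trappiest(candidates):
    """Among white's drawing moves, prefer the one leaving the opponent
    the most unprovable losing replies."""
    return max(candidates, key=lambda mv: trap_count(candidates[mv]))

candidates = {
    # 1.02-16: 2 correct replies, 9 losers all provable with the 6pc db
    "02-16": [(False, False)] * 2 + [(True, True)] * 9,
    # 1.02-07: 6 correct replies, 4 provable losers, and 25-30? unprovable
    "02-07": [(False, False)] * 6 + [(True, True)] * 4 + [(True, False)],
}

print(pick_trappiest(candidates))  # → 02-07
```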

Second illustration with almost the same position :
Image
White to move.

Let's take as an example a move of the white king on the 04-22 diagonal:
1.27-13? is rather a bad move because most programs will keep the black king on the long diagonal, which is a good choice.
1.27-09? is also a bad move because the 7-piece db tells us that the answer 1...25-30! allows a draw (=> no trap).
1.27-04? is also a bad move because any program with the 6-piece db will immediately see that 1...25-30? is not correct due to 2.47-41 (=> no trap).
1.27-18? is also bad because of the immediate 1...36-41 (=> no trap).
1.27-22!! is the good move. Now a program with only the 6-piece db will have enormous difficulty demonstrating that 1...25-30 is a bad move.

Will Damage be able to avoid this 25-30? move in my 2 examples ?

If not, I think we will already be motivated to build the 7-piece db!

In addition I found some interesting improvements of this very simple algorithm in order to handle 7-piece draw positions when playing a game against a program that has only the 6-piece db.

Now I guess you are very motivated to build this 7-piece db! Aren't you?

Gérard

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

8-9 pieces partial db

Post by TAILLE » Sun Jun 01, 2008 16:29

Hi Ed.,

How have you decided which slices are interesting to generate with a partial db approach?
I noted you chose to build 8-3131, 8-3140, 8-4040, 8-4121, 8-4130, 8-5021, 8-5030, 9-4130, 9-5021, 9-5030.
If my calculation is correct these slices correspond to 2 075 681 729 120 positions. The question is the following: are they the most interesting positions?

My view is the following :
The 8-3131 slice will certainly generate a very great number of draw positions. I guess a lot of practical winning positions are positions where we first have to promote a new king, and these will not be resolved because the 8-2231 slice has not been generated. The 8-3140 slice seems more interesting because it allows finding all the wins obtained by the capture of an opponent man. Unfortunately you will not discover the majority of wins obtained when reaching 8-3131, because the winning positions of that slice have not really been discovered! Of course I know perfectly well that knowing a lot of draw positions is of great help in stopping the exploration of a branch as soon as possible, but I am not very happy about missing the winning positions.

Why not generate the following slices: 8-0431, 8-1331, 8-2231, 8-3131, 8-0440, 8-1340, 8-2240, 8-3140, 8-4040?
For these slices the total number of positions is 1 662 008 769 200, and it seems to me as feasible as your choice.
The most important point to note is that, with this approach, I will be able to continue later to generate other 8-piece slices (5x3) and some 9-piece slices with a guaranteed gain in efficiency. With your approach, if you later generate 8-0431, 8-1331, 8-2231, 8-0440, 8-1340 and 8-2240, you cannot escape regenerating the 8-3131 slice and the following ones, which is very harmful.
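As a sanity check on totals like these, the number of placements in a slice can be computed with binomial coefficients. A rough sketch under the usual 10x10 conventions (black men on squares 1-45, white men on 6-50, kings anywhere, one side to move); `slice_size` is an illustrative helper, and depending on indexing conventions the result may not match the figures quoted above exactly.

```python
from math import comb

def slice_size(nbm, nbk, nwm, nwk):
    """Piece placements for one side to move.  Squares 1-5 are exclusive
    to black men, 46-50 to white men, and 6-45 are shared, so we split on
    how many black men sit on the exclusive squares 1-5."""
    total = 0
    for k in range(min(nbm, 5) + 1):
        black = comb(5, k) * comb(40, nbm - k)   # black men on 1-5 and 6-45
        white = comb(45 - (nbm - k), nwm)        # white men avoid occupied shared squares
        kings = comb(50 - nbm - nwm, nbk) * comb(50 - nbm - nwm - nbk, nwk)
        total += black * white * kings
    return total

print(slice_size(0, 1, 0, 1))  # two lone kings: 50 * 49 = 2450
```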

For the moment I have to continue analysing my generation strategy.

My generation algorithm for building partial dbs is now quite clear in my mind, and I now have to code and test it. Having in mind to generate the 8-0431 slice first, it seems interesting to begin by testing the generation of the 6-0411 slice (supposing I do not have the 6-piece db) and then the 7-0421 slice (supposing I do not have the 7-piece db).
In case you want to verify your own algorithm by comparing our results, I intend to keep the counts of L, D, W, LD, LW and DW for white and black, and for capture and non-capture positions.
Are you able to generate these counters for the 6-0411 slice?

Gérard

Ed Gilbert
Posts: 854
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Sun Jun 01, 2008 19:55

Hi Bert,
BertTuyt wrote: How long (based on your experience) do you think it will take on my Quad Q6600 (2.4 GHz), with only 1 core activated (I'm not sure if it is possible and/or easy to re-write the program to use all 4 cores).
This is of course very dependent on your algorithm, but there is an easy way to see how long it will take. My db build program knows how many total positions there are in the set of slices it is configured to build, and how many positions are in the set of slices it has already finished building, so each time it finishes building some minor chunk of the db it prints the percentage of the total that has been completed. With these statistics being printed, you only need to run the build program for a day or so to see what rate you are progressing at, and then you can very easily extrapolate how long it will take to finish.
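The extrapolation Ed describes is simple linear arithmetic; a toy illustration with hypothetical numbers:

```python
def hours_remaining(elapsed_hours, fraction_done):
    """Linear extrapolation from the build program's progress percentage."""
    return elapsed_hours / fraction_done - elapsed_hours

# after one day of building, the program reports 1% complete
print(hours_remaining(24, 0.01))  # → 2376.0, i.e. about 99 more days
```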
BertTuyt wrote: Looking at the prices of memory it is easy to install 8 GB (16 GB in 4 GB modules is, I guess, not so widely available). Do you think it would be beneficial to install 8 GB, to drastically reduce disk I/O during this process, as I think this could have a positive effect on the time needed.
In my case I built the 10x10 draughts dbs using computers with 2 or 3gb of ram, and that was more than sufficient. More than 2gb would not have made a difference in the speed. However those were single-core computers. As you said, ram is very inexpensive right now and I would max it out at 8gb, which is what I have in my quad-core. Get ECC ram if your motherboard supports it. The price difference is small and for building databases you want that reliability.

As for 4gb memory sticks, unless your PC's motherboard is a server type it almost certainly will not accept memory sticks larger than 2gb each.
BertTuyt wrote: In the meantime I'm working to increase the search depth of my engine, after I found out that KingsRow in the late endgame outsearches Damage most of the time, with often dramatic consequences for me.
I'm now focusing on pruning and testing the MPC algorithm. In some cases I was able to increase the speed (time needed to reach a specific depth) by 500%, but I'm afraid I'm throwing the baby out with the bath water.

Therefore I'm modifying the engine so that it can play a match against itself, with different parameter settings for white and black (and going through all openings for white and black, as we did during the KingsRow - Damage match).
Good, I'm glad you are making some progress! I have also been making improvements to kingsrow and I am interested in running another round of engine match games when you are ready.

-- Ed

Ed Gilbert
Posts: 854
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Sun Jun 01, 2008 20:23

TAILLE wrote: My opinion is that you do not need to re-write your program to use all the 4 cores because you may simply run 2 (or more) instances of your program, the first instance generating the 4x3db and the second the 5x2db. I do not see any gain to have a true multicore generating process. It will be a lot of effort with almost no gain.
Yes, you can run 2 instances of the program and have them work on 4x3 and 5x2 independently. But I am going to disagree with Gerard on this point. It is not necessarily a lot of work to make your program work in a distributed environment, and the gains are quite important. If all you can do is run separate instances for 4x3 and 5x2, you will only be using at most half of the computing bandwidth you have available in your one machine. When you start building really large databases such as 8 and 9 pieces, your build times will be measured in years unless you can attack the problem with a number of cores in parallel. In my distributed environment the communication between parallel threads is quite minimal. Satellite threads get the info for a new subdivision to be built and then go off and build it independently. Newly built subdivisions are added to a master host so they are available for all other machines or threads to use. It was not a great deal of additional work to add this capability once the basic build program was working correctly in stand-alone mode.
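The master/satellite scheme Ed describes can be sketched with a shared work queue. This is a minimal toy, assuming the listed slices are independent (a real build must respect the dependency order between slices):

```python
import queue
import threading

def build_in_parallel(slices, n_workers=4):
    """Hand each satellite thread a subdivision to build; completed
    subdivisions are reported back to a shared 'master' list."""
    work = queue.Queue()
    for s in slices:
        work.put(s)
    done, lock = [], threading.Lock()

    def satellite():
        while True:
            try:
                s = work.get_nowait()
            except queue.Empty:
                return                       # no work left
            result = f"built {s}"            # placeholder for the retrograde pass
            with lock:                        # report back to the master
                done.append(result)

    threads = [threading.Thread(target=satellite) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

print(sorted(build_in_parallel(["8-5030", "8-5021", "8-4130"])))
```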
TAILLE wrote: Let me give you another strong motivation to generate this 7 pieces db.
I tried your two example positions using kingsrow with a 6pc database, and in both cases it finds the correct move to draw after searching for 1 second or less. But of course this could just be luck, as kingsrow certainly does not give an eval score of db draw on these positions, and in general your strategy might help it win some games. I guess you have to look at how much overhead the algorithm adds to the search. In kingsrow I do not try to optimize the case of db draw -- I just cut off all searching of that subtree and return the db draw score immediately to the parent node.

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Sun Jun 01, 2008 20:36

Ed Gilbert wrote: I tried your two example positions using kingsrow with a 6pc database, and in both cases it finds the correct move to draw after searching for 1 second or less. But of course this could just be luck, as kingsrow certainly does not give an eval score of db draw on these positions, and in general your strategy might help it win some games. I guess you have to look at how much overhead the algorithm adds to the search. In kingsrow I do not try to optimize the case of db draw -- I just cut off all searching of that subtree and return the db draw score immediately to the parent node.
-- Ed
What do you mean by overhead? As soon as a 7-piece position is effectively reached in a real game, any program with the 7-piece db can answer correctly in no time. If some time remains on the clock, there is no harm in spending it calculating some traps.
Gérard

Ed Gilbert
Posts: 854
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: 8-9 pieces partial db

Post by Ed Gilbert » Sun Jun 01, 2008 21:25

TAILLE wrote:How have you decided which slices are interesting to generate with a partial db approach ?
I noted you choose to build 8-3131, 8-3140, 8-4040, 8-4121, 8-4130, 8-5021, 8-5030, 9-4130, 9-5021, 9-5030.
If my calculation is correct these slices correspond to 2 075 681 729 120 positions. The question is the following : are they the most interesting positions ?
I used a very simple rule. For 8 pieces, I generated all the 4x4 and 5x3 positions with 1 king or less on a side. I chose these slices because I have observed that very few computer draughts games progress to the point where a side has more than 1 king. I avoided slices where there are kings on both sides and more than 1 king on at least one side because these tend to be much larger than the slices I built.
TAILLE wrote:Why not generating the following slices : 8-0431, 8-1331, 8-2231, 8-3131, 8-0440, 8-1340, 8-2240, 8-3140, 8-4040.
For these slices the total number of positions is 1 662 008 769 200 and it seems to me as faisable as your choice.
The most important point to note is that, with this approach, I will be able to continue later to generate other 8-pieces slices (5x3) and some 9 pieces slices with a guaranted gain in the efficiency. With your approach, if you generate now the 8-0431, 8-1331, 8-2231, 8-0440, 8-1340, 8-2240 you cannot escape to regenerate the 8-3131 slice and the following one which is very harmful.
IMO 8-0431, 8-1331, 8-0440, and 8-1340 are not needed, as you will rarely encounter these in a game. You probably want to build these because you think they are needed to help resolve positions in slices with fewer kings? I don't think you need them. If you look at the statistics of resolved positions that I posted, you'll see that the percentage is quite high without using any 8-piece slices with more than 1 king on a side. You might get some benefit from 8-2231 and 8-2240, but anything with more than 2 kings is, I think, wasted effort. It will have minimal impact on what you can resolve in the most important slices with fewer kings. What I have noticed is that when I'm building a subdivision at the very fringe where I have no data for the next conversion move (i.e. 1 king and the men on the most advanced rank), the percentage of resolved positions is low, but if I am just several ply removed from a conversion into another slice the percentage rises quickly. You get the greatest benefit from having information for just a very few successor plies, and that's why I think building slices with 3 or 4 kings on a side is wasted effort.

As for the 9-piece slices, I did not start out with the intention to build anything other than 9-5040. After building 9-5040, I decided that it would be worthwhile to try to increase the percentage of resolved positions, as the percentage of exactly resolved was only about 50% that first time. So I built 9-5031 and 9-4140 and then re-built 9-5040 using the more complete 9-piece dependency info, and that got the stats up to the figures I posted.
TAILLE wrote:My generation algorithm for building partial db is now quite clear in my mind and I have know to code and test it. Having in my mind to generate first the 8-0431 slice it seems to me interesting to begin by testing the generation of the 6-0411 slice (supposing that I do not have the 6 pieces db) and then the 7-0421 slice (supposing that I do not have the 7 pieces db).
In case you want to verify your own algorithm by comparing our results I intend to handle the number of L, D, W, LD, LW, W for white and black and for capture and non-capture.
Are you able to generate these counters for the 6-0411 slice ?
I tested my partial builds algorithm by doing something similar, building smaller slices for which I already had complete information, and then verifying that the partial db was consistent with the complete info db. Unfortunately I no longer have any of the smaller partial info dbs that I built for this testing. Also, depending on your algorithm, your counts may not agree exactly with mine. I remember when I first built some partial dbs I used an algorithm that was correct but was not able to completely resolve some draws by repetition. When I refined the algorithm, I was able to resolve about 1% more draws. In both cases my partial db was correct, but the counts were different.

-- Ed
Last edited by Ed Gilbert on Sun Jun 01, 2008 21:59, edited 2 times in total.

Ed Gilbert
Posts: 854
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Sun Jun 01, 2008 21:32

TAILLE wrote: What do you mean by overhead? As soon as a 7-piece position is effectively reached in a real game, any program with the 7-piece db can answer correctly in no time. If some time remains on the clock, there is no harm in spending it calculating some traps.
Gérard
Ok, I misunderstood and thought that you applied this algorithm at any node within a normal search. From your subsequent comments I guess you only use this when the root position is 7 pieces. Wouldn't it be better to use it within the normal search, so that when your root position is still 10, 12, or 14 pieces you will guide it towards one of these draw positions with the traps for your opponent?

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: 8-9 pieces partial db

Post by TAILLE » Sun Jun 01, 2008 22:42

Ed Gilbert wrote:IMO 8-0431, 8-1331, 8-0440, and 8-1340 are not needed, as you will rarely encounter these in a game. You probably want to build these because you think they are needed to help resolve positions in slices with fewer kings?
It is not exactly my point. Suppose that you generate 8-4040 and 8-3140 without having generated 8-3131. I guess you will resolve a lot of draw positions and you will also find a lot of interesting win positions. What then will be the benefit of generating 8-3131 afterwards? My view (but I could be completely wrong) is that 8-3131 will essentially resolve new draw positions and will bring only very few wins, because the large majority of wins are obtained by the promotion of another man. In other words, you improve the number of resolved positions significantly, but only by increasing the number of draws.
Suppose that only 1% of 8-3131 positions are winning positions. Is it really interesting to resolve 75% of the draw positions but only a very small percentage (far less than 75%) of the winning positions?
For the moment it is not quite clear in my mind.
Ed Gilbert wrote: When I refined the algorithm, I was able to resolve about 1% more draws. In both cases my partial db was correct, but the counts were different.
That is exactly my point. With a very poor and stupid algorithm you may resolve all the win and loss positions that can effectively be resolved and mark all other positions as LW. If you do not use all the information efficiently you will miss the resolution of D positions and also of LD or DW positions. The db remains consistent, but I would say that the algorithm is not correct.

Gérard

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

db on disk

Post by TAILLE » Tue Jun 10, 2008 10:50

Hi Ed,
Now that I have the 7-piece endgame db I have a lot of material with which to test various new formats of this db in order to optimise its use. For the time being I am testing a format in which I subdivide my tree into small subtrees so that only a block of about 4 KB needs to be read from disk. Supposing the whole db has a size of 24 GB, that means about 6M blocks.
My question is the following: one now has the choice between building a small number of huge files (e.g. 24 files of 1 GB) or a great number of small files.
Do you have any experience here? How many files do you open simultaneously?
Gérard

Ed Gilbert
Posts: 854
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: db on disk

Post by Ed Gilbert » Tue Jun 10, 2008 14:20

Hi Gerard,
TAILLE wrote: My question is the following: one now has the choice between building a small number of huge files (e.g. 24 files of 1 GB) or a great number of small files.
Do you have any experience here? How many files do you open simultaneously?
I use one data file for each different combination of nbm, nbk, nwm, nwk. So for example 8-3131 is one file, 8-3140 is another, etc. That means that I have slightly more than 100 data files. I keep file handles open for each data file. To look up a position, I first get the counts of nbm, nbk, nwm, and nwk, and use those to directly index into a table that gives me the open file handle for that database. There is also an index file for each data file. The index file gives the index of the position that is stored at the first byte of each 1k block in the data file. I convert the position to an index, then do a binary search on the data in the index file to find which 1k data block has the position. I then do a linear search from the beginning of the 1k block until the position's index is reached, then decode the WLD value.
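Ed's two-level lookup (binary search over the index file, then a linear scan inside one block) can be sketched like this; the block contents here are made-up toy data, with each "1k block" standing in as a sorted list of (position index, WLD value) pairs:

```python
import bisect

# Toy stand-in for one data file: each block holds sorted (index, value)
# pairs, and the "index file" records the first index stored in each block.
blocks = [
    [(0, "D"), (5, "W"), (9, "L")],
    [(12, "W"), (20, "D")],
    [(31, "L"), (40, "W")],
]
block_starts = [b[0][0] for b in blocks]   # the index file: [0, 12, 31]

def lookup(pos_index):
    """Binary-search the index file for the right block, then scan it."""
    i = bisect.bisect_right(block_starts, pos_index) - 1
    for idx, val in blocks[i]:             # linear scan within the block
        if idx == pos_index:
            return val
    return None                            # position not stored

print(lookup(20))  # → D
```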

I am not clear on how your trie structure can compress to more than 1 position per byte. Can you explain how that works. You said:
Damy's storage scheme is very different because I do not store values but only a list of positions. The positions are stored within a tree. Take for example a position with only one king (a very interesting case for the 9-piece database!). The principle is to use the first man to build the first level of the tree (which means that the root has at most 45 successors), then to use the second man to build the second level of the tree (44 successors at most if the first 2 men are the same colour), etc., up to the 8th man. We are now at the leaves of the tree and we have to describe which positions of the remaining piece (a king in my example) belong to this db. Here I use several formats depending on the number of positions concerned: one byte with only 1 position, 2 bytes with 2 or 3 positions, 3 bytes with 4 positions, 4 bytes with 5 positions, 5 bytes with 6 or 7 positions and 6 bytes with 8 or more positions. In this last case it is simply a 48-bit mask (in fact I do not really need a 50-bit mask because as soon as 2 or more pieces are already on the board, fewer than 48 squares remain available for the last piece).
At the interior nodes of your tree you need some kind of pointers to the successor nodes. These can be either displacements in bytes or else pointers to memory locations. In either case they would need to be 32 bits wide for the larger slices. So for example to store a position that has 4 pieces, starting at the root, you would have a few bytes for the root node data describing the possible positions of the first piece, and then for each of those you would need pointers to the nodes describing the second piece placements, etc. up to the 4th piece. At the 4th piece you have "one byte with only 1 position, 2 bytes with 2 or 3 positions, 3 bytes with 4 positions ...". I understand that this representation allows quick access to see if a position is present or not, because there are only 4 comparisons, one for each piece. But how does this result in compressing (on average) many positions per byte?

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: db on disk

Post by TAILLE » Wed Jun 11, 2008 00:15

Hi Ed,
Ed Gilbert wrote: I use one data file for each different combination of nbm, nbk, nwm, nwk. So for example 8-3131 is one file, 8-3140 is another, etc. That means that I have slightly more than 100 data files. I keep file handles open for each data file. To look up a position, I first get the counts of nbm, nbk, nwm, and nwk, and use those to directly index into a table that gives me the open file handle for that database. There is also an index file for each data file. The index file gives the index of the position that is stored at the first byte of each 1k block in the data file. I convert the position to an index, then do a binary search on the data in the index file to find which 1k data block has the position. I then do a linear search from the beginning of the 1k block until the position's index is reached, then decode the WLD value.
That confirms the new approach I have in mind. For the 7-piece db I intend to build 78 files: 39 with black to play and 39 with white to play. Why 39 files? One file for the 6-piece (and smaller) db, and one file for each of the 38 7-piece slices.
Ed Gilbert wrote: At the interior nodes of your tree you need some kind of pointers to the successor nodes. These can be either displacements in bytes or else pointers to memory locations. In either case they would need to be 32 bits wide for the larger slices. So for example to store a position that has 4 pieces, starting at the root, you would have a few bytes for the root node data describing the possible positions of the first piece, and then for each of those you would need pointers to the nodes describing the second piece placements, etc. up to the 4th piece.
If I understand correctly you seem to have some trouble with the coding of my intermediate nodes.
Let's take your example of the 8-3131 slice (though I have not generated it).
The intermediate nodes describe the positions of all the men, and I then have one leaf per black king position. In each leaf I code all the relevant positions of the remaining white king.

In each intermediate node you will find 2 fields: one describing the position of the piece (50 possible values) and one giving the size of the subtree. This size field may take 1 to 4 bytes depending on the size of the subtree. An intermediate node is then made of 2 to 5 bytes.
Now the question is the following: are these figures a real constraint on the size of the db?
Let's take your own figures. The 7-piece db is made of about 1 000 000 000 000 positions (half with white to play and half with black to play). The size of your db is about 30 GB. As a consequence, thanks to your compression, you need on average 0.24 bits per position.
Now take one of my intermediate nodes. Basically it is an entry to all the positions you can build with the black king and the white king, which represents more than 2000 positions. If it takes for example 2 bytes (the most common value for the low part of the tree), those 16 bits are an entry for 2000 positions and the cost is then 0.008 bits per position. Even if you take 5 bytes instead of 2, the cost remains very low.
As a consequence you only have to optimise the size of the leaves.
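The square-plus-subtree-size layout Gérard describes can be sketched as follows. This is a guess at the structure from the description, using Python lists instead of packed bytes: each interior record stores its square and the number of records in its subtree, so a lookup skips whole subtrees of non-matching siblings without needing pointers.

```python
def serialize(trie):
    """Flatten {square: subtrie_or_set} into records: interior nodes carry
    (square, subtree_size); leaves carry the last piece's squares."""
    out = []
    for sq in sorted(trie):
        child = trie[sq]
        if isinstance(child, set):                   # leaf level
            out.append(("leaf", sq, sorted(child)))
        else:
            sub = serialize(child)
            out.append(("node", sq, len(sub)))       # size lets us skip it
            out.extend(sub)
    return out

def contains(records, pieces, lo=0, hi=None, depth=0):
    """Walk sibling records in [lo, hi), skipping non-matching subtrees
    by their stored size; descend on a match."""
    hi = len(records) if hi is None else hi
    i = lo
    while i < hi:
        kind, sq, payload = records[i]
        if kind == "leaf":
            if sq == pieces[depth]:
                return pieces[depth + 1] in payload
            i += 1
        elif sq == pieces[depth]:
            return contains(records, pieces, i + 1, i + 1 + payload, depth + 1)
        else:
            i += 1 + payload                         # skip the whole subtree
    return False

# toy db: first-piece square -> second-piece square -> last-piece squares
trie = {10: {22: {30, 35}}, 12: {5: {40}, 22: {30}}}
records = serialize(trie)
print(contains(records, (12, 22, 30)))  # → True
print(contains(records, (12, 5, 41)))   # → False
```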

Gérard

Ed Gilbert
Posts: 854
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: db on disk

Post by Ed Gilbert » Wed Jun 11, 2008 13:39

Hi Gerard,
TAILLE wrote:... and one giving the size of the subtree.
That is the piece of information that was missing. Now the structure is clear.

I was assuming that you had direct pointers or offsets to each successor node, but that is not the case. You only know the location of the first successor and the size of its subtree, so for example to visit the 20th successor you need to visit each of the 19 successor nodes preceding it. This is repeated for each piece of the target position, until you have visited all the pieces (except the last one) and reached a leaf node.

I agree that the interior nodes are space efficient this way. But what about the leaf nodes? You described them as:
one byte with only 1 position, 2 bytes with 2 or 3 positions, 3 bytes with 4 positions, 4 bytes with 5 positions, 5 bytes if 6 or 7 positions and 6 bytes if 8 or more positions.
Take the example of a leaf node containing 45 positions. By your description the size of this leaf node is 6 bytes. It seems that the maximum compression that can be reached with this format is approximately 8 positions per byte. Is that correct?

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: db on disk

Post by TAILLE » Wed Jun 11, 2008 15:20

Hi Ed,
Ed Gilbert wrote:Take the example of a leaf node containing 45 positions. By your description the size of this leaf node is 6 bytes. It seems that the maximum compression that can be reached with this format is approximately 8 positions per byte. Is that correct?
I understand what you mean, but clearly we do not have the same definition of the word "compression".
Let's take the last intermediate node of the slice 8-3131. The intermediate node is the entry for the 44*43 = 1892 positions that can be built with the 2 remaining kings. At the maximum I can have 44 leaves (corresponding to the positions of the black king), each leaf being an entry for 43 positions (the positions of the remaining white king).
I have 3 fields in a leaf:
1) The identity of the square corresponding to this leaf (here the square of the black king)
2) Information concerning the format of the leaf
3) The description of the relevant positions of the white king (1 to 6 bytes)
Fields 1 and 2 are concatenated in one byte (50 values for the identity of the square and 5 values for the format information).
That way the size of a leaf can be 0 or 2-7 bytes.
In the example chosen a leaf corresponds to 43 positions. The compression is then:
0 bytes => 0/43 = 0.00 bits/position
2 bytes => 16/43 = 0.37 bits/position
...
7 bytes => 56/43 = 1.30 bits/position

When comparing our two compression algorithms: you take advantage of adjacent positions with the same result, and I take advantage of non-existent leaves. That is really the crucial point.
For my "large" (i.e. only one lookup in the db is needed if there is no capture) 7-piece db I have 23.5 GB => 0.19 bits/position.
I don't have statistics on my tree, but I imagine that the average compression of my existing (I mean non-empty) leaves is something like 0.80 bits/position. That would mean that 75% of my leaves are empty and 25% have a compression of 0.80, for a total result of 0.20 bits/position.
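The bits-per-position arithmetic in this exchange can be restated in a few lines (the 75%/25% empty-leaf split is Gérard's own estimate, not a measured figure):

```python
def bits_per_position(nbytes, npositions):
    """Cost of an nbytes-long leaf that covers npositions positions."""
    return 8 * nbytes / npositions

print(round(bits_per_position(2, 43), 2))   # 2-byte leaf  → 0.37
print(round(bits_per_position(7, 43), 2))   # 7-byte leaf  → 1.3
print(round(23.5e9 * 8 / 1e12, 2))          # 23.5 GB / 1e12 positions → 0.19
print(0.75 * 0.0 + 0.25 * 0.80)             # empty/non-empty leaf mix → 0.2
```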

Does this help you understand my approach?

Gérard

Ed Gilbert
Posts: 854
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Wed Jun 11, 2008 23:30

Hi Gerard,

Thank you for the detailed description of your format. I was not trying to be critical of the compression, only checking that I understand it correctly.

There is another format that I have tried which gives approximately 20 GB for 2-7 pieces. It is based on Huffman codes, and the idea came from Jonathan Schaeffer. It involves using ~10,000 variable-length codes to represent various lengths of consecutive WLD values. The length of the codes varies with their frequency of occurrence. The most frequent run length of 1 has a 2-bit code, and the longest runs have 16-bit codes. The difficulty with this approach is that the variable-length codes require a lot of shift, AND, and OR twiddling to do the linear search for the target position, and that makes it slower than what I'm using now.
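The scheme Ed describes (Huffman codes over runs of identical WLD values) can be illustrated on toy data. This is not kingsrow's actual code table, just the principle that frequent run symbols get short codes:

```python
import heapq
from collections import Counter

def run_lengths(values):
    """RLE: collapse consecutive equal WLD values into (value, length) runs."""
    runs, prev, n = [], values[0], 1
    for v in values[1:]:
        if v == prev:
            n += 1
        else:
            runs.append((prev, n))
            prev, n = v, 1
    runs.append((prev, n))
    return runs

def huffman_code_lengths(freq):
    """Huffman code length per symbol; frequent symbols get shorter codes."""
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)                     # unique tiebreaker for the heap
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, tick, merged))
        tick += 1
    return heap[0][2]

wld = "DDDDWDDDDDDLDDDD"
runs = run_lengths(wld)                  # [('D',4),('W',1),('D',6),('L',1),('D',4)]
lengths = huffman_code_lengths(Counter(runs))
print(runs)
```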

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Wed Jul 09, 2008 10:32

Hi,
For your information I have just finished reformatting my whole endgame database. The size of the 2-7 piece endgame database is now 20.6 GB.
Gérard
