Damage does not have (yet) the 7 Piece Endgame Databases

Discussion about development of draughts in the time of computer and Internet.
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by Rein Halbersma » Wed Nov 11, 2009 07:46

BertTuyt wrote:Rein, thanks for this overview.
I liked reading the history of the DBs.
What I still want to understand:

February 1999-September 1999: Stef Keetman completes 5vs2 dbs (all kings, max 2 pieces for majority side).

June 2000: Stef Keetman announces that 4vs3 computations are under way. Magazine "Dammen" by Ton Sijbrands in which his series of articles were published stops shortly thereafter.


You wrote 5vs2 DBs (all kings, max 2 pieces for majority side). What exactly do you mean here? Which DBs were actually generated?

Is there evidence in the articles that the 4vs3 DBs were (at least partly) generated?

Last but not least, based on your overview and insight into the history (and articles), do you think Stef generated all 7p DBs (maybe with the exception of the 6vs1), or "only" a subset?

Bert
Keetman wrote about 5K vs 2K, 4K1M vs 2K and 3K2M vs 2K. At the end of the 5th article he writes that since the 3K2M vs 2K is almost completely a draw (except in some special circumstances), he didn't think doing the full 5v2 dbs would be interesting. He then writes that he had started the 4v3 computations a while ago but that it would take several months before he had anything to report. This was in June 2000.

The 4v3 endgames, however, only become interesting with many men on the board, so he would have to do all the king-heavy sub-dbs first. It seems reasonable to assume that he did complete the 4v3 computation (it must have been feasible or he wouldn't have started it), but it's hard to tell without any publications. Did you have any email exchange with him about these topics at the time?

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by Rein Halbersma » Wed Nov 11, 2009 08:09

BertTuyt wrote:Rein,

another (strange) question.
Suppose you plot all the DBs (4, 5, 6, 7, 8 pieces) against the corresponding years and assume that this trend will continue.

What is then the expectation for the 9p and 10p on "more or less" normal PCs (as I believe that supercomputers already have that capability)?

Ed, maybe it would be interesting if you could comment on what PC speed (or number of cores) you would need for the full 9p or full 10p, together with memory and HD size, and approximately how long it would take.

Bert
There are some interesting trends here (i.e. Moore's Law). In 1993, Bert van Oortmarssen used an almost top-of-the-line 50 MHz machine with 16 MB RAM (it was the year the first Pentium was introduced, and those started at 60-66 MHz). He needed 12 MB of disk space, which at the time was a very small fraction of available space (I think drives of around 100-200 MB were available). I'm sure that with the proper slicing tricks the 16 MB of RAM could have been used to generate the full 5-pc dbs already in 1993.

If you look at Ed's resources, he has 1,000 times the memory and 10,000 times the hard disk space, but the 8-pc dbs have 15,000 times more positions than the 5-pc dbs. Of course, Ed also uses much more sophisticated slicing and compression techniques compared to 1993. The relative rates at which RAM and disk sizes evolve change the equation of what can be done over time. Probably RAM will keep lagging disk space more and more, so that slicing and compression become ever more important.

If you define a "more or less normal" computation as one taking less than a year on hardware costing less than $2k, then at this point the full 9 and 10 pc dbs are way out of range. Even the full 5v4 db is almost 5 times as big as all the 8 pc dbs. The incomplete-information dbs are another story, of course. Ed already has the 5v4 dbs with max. 1 king each. I think it is quite feasible to also generate the incomplete 5 men vs 5 men 10-pc db using today's hardware. The question is how much it will increase playing strength once you take the caching bottleneck into account.

A rough rule of thumb: if you take the 8-pc dbs of 2009 and compare that to the 5-pc dbs of 1993, the 6-pc dbs of 1998, and the full 7-pc dbs that could have been done around 2003, then that translates into a new step every 5-6 years. So the full 9-pc dbs in 2014 and the full 10-pc dbs before 2020? Who's willing to give odds?

BTW, Keetman has a similar rule of thumb and estimates that at least the full 20-pc dbs and possibly the full 25-pc dbs are required to solve (the initial position in) draughts. That translates into another 60-85 years.

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by Ed Gilbert » Sat Nov 14, 2009 15:07

Ed, maybe it would be interesting if you could comment on what PC speed (or number of cores) you would need for the full 9p or full 10p, together with memory and HD size, and approximately how long it would take.
The 5x4 subset is 202e12 positions, which is 12x as many as the 4x4 + 5x3 set. It took 5.56 cpu-years to build and verify the 8pc db, so you would expect that it takes 67 cpu-years to do the 5x4 db. I think it could be done in under a year using about 10 of the 8-core machines like the one I used. I'm not sure where the parallelism starts to lose efficiency. I think it should be good for at least 50 cores, but beyond that I don't know. I would also expect the compressed size of the 5x4 to be about 4.4TB. This is all manageable, but I don't have a computer that has enough ram to effectively use a full 5x4 db. As it is I rarely try to use the 4m1k vs. 4m or 5m vs. 3m1k subsets because they're so big.
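
A back-of-the-envelope version of this estimate, using only the figures quoted above (the roughly 12x position ratio and the 5.56 cpu-years for the 8pc db) and assuming build effort scales linearly with position count; a sketch, not Ed's actual tooling:

[code]
// Rough scaling check for the 5x4 estimate. The constants are the figures
// quoted in the post above; nothing here is measured.
#include <cstdio>

int main() {
    const double positions_5x4   = 202e12;  // positions in the 5x4 subset
    const double positions_ratio = 12.0;    // 5x4 is ~12x the 4x4 + 5x3 set
    const double cpu_years_8pc   = 5.56;    // build + verify time for the 8pc db

    const double cpu_years_5x4 = cpu_years_8pc * positions_ratio;  // ~67 cpu-years
    std::printf("5x4: %.0fe12 positions, ~%.0f cpu-years expected\n",
                positions_5x4 / 1e12, cpu_years_5x4);
    return 0;
}
[/code]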

-- Ed

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by BertTuyt » Thu Dec 24, 2009 10:59

Ed, I'm now implementing the compression algorithm for the raw 7p database data.
Before I start running the computer for days, I want to check what typical compression ratio one can expect.

The raw data is around 300 GByte (I did not count exactly).
In your case, what is the compressed size? (I'm sure you have communicated this before, but I couldn't find the source.)

For example: to what size can the 5K2K database be reduced?

Another question:
I currently use fixed blocks of 4K positions, so the compressed block size (in bytes) varies. In the DB I therefore also store the index offset for each of these blocks.
Another method could be to use a fixed block size of 4 KByte, with a variable number of positions per block.
So far I have not given much thought to which method is best.

What are your ideas and approach?

With kind regards, and a happy Xmas to you and your family.

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by Ed Gilbert » Thu Dec 24, 2009 15:11

Hi Bert,

I don't have the files here at the office, but the total size for 2 - 7 pieces is 24 GB. IMO it is better to store in your index files the position at the start of each fixed-size block rather than the offset of every N positions. The reason is that you will most likely be caching the db in fixed-size blocks, not in blocks of a fixed number of positions. Suppose you do it the other way, as in the Grimminck db, where you store the offset into the db of every N positions. When you go to look up the WLD value of a position, you search your index data and find the closest positions above and below your target position for which you know the db offset. Call these positions p0 and p1. Consider the case where p0's offset is the value 4080 and p1's is 5030. You know your target position is somewhere between these two offsets into the db. But you are caching the data in 4096-byte blocks, and you don't know whether the target is in the block of offsets 0 - 4095 or the block that starts at offset 4096. Things are much easier if you store positions in your index files instead of byte offsets.
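
A minimal sketch of the lookup scheme Ed recommends, assuming an in-memory array that records, for each fixed-size cache block, the index of the first position encoded in that block; the names and the 4096-byte block size are illustrative, not Ed's actual code:

[code]
// "Store the first position of each fixed-size block" indexing.
// block_start must be sorted and block_start[0] == 0.
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr std::size_t CACHE_BLOCK_BYTES = 4096;  // fixed cache block size

struct DbIndex {
    std::vector<std::uint64_t> block_start;  // first position index in each block

    // Return the number of the cache block that contains pos_index.
    std::size_t block_of(std::uint64_t pos_index) const {
        // First block whose starting position lies beyond the target...
        auto it = std::upper_bound(block_start.begin(), block_start.end(), pos_index);
        // ...so the target lives in the block just before it. No ambiguity
        // about 4096-byte boundaries, unlike the byte-offset scheme.
        return static_cast<std::size_t>(it - block_start.begin()) - 1;
    }
};

// Usage: read block_of(idx) * CACHE_BLOCK_BYTES from disk (or find it in the
// cache), then decode from the start of that block until position idx is reached.
[/code]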

Another thing to consider is how much space will be taken up by your indices. There are approximately 1e12 positions in the 2 - 7 piece databases. If you store the offset of every 4096th position, you will be storing (1e12 / 4096) = 244e6 offsets. If you can manage to use only 32-bit numbers for these offsets, that consumes nearly 1 GB of space, and 2 GB if you have to use 64-bit numbers. This is no problem on disk, but you want to have this info in RAM so you can do lookups quickly, and that is a lot of overhead. And when you build the 8-piece db, which is 16.5e12 positions, this overhead would be 16 GB assuming 32-bit numbers. That is obviously too much, so you have to either cache this indexing information dynamically, or save fewer indices by storing them at greater intervals.
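
The overhead figures follow directly from those counts; a tiny worked version, with the position counts taken from the post:

[code]
// Index overhead for the byte-offset scheme: one entry per 4096 positions.
#include <cstdio>

int main() {
    const double positions_2_7 = 1.0e12;   // ~positions in the 2-7 piece dbs
    const double positions_8   = 16.5e12;  // positions in the 8-piece db
    const double interval      = 4096.0;   // positions per stored offset

    const double entries_2_7 = positions_2_7 / interval;  // ~244e6 entries
    std::printf("2-7pc: %.0f million entries, %.2f GB (32-bit), %.2f GB (64-bit)\n",
                entries_2_7 / 1e6, entries_2_7 * 4 / 1e9, entries_2_7 * 8 / 1e9);
    std::printf("8pc:   %.1f GB (32-bit)\n", positions_8 / interval * 4 / 1e9);
    return 0;
}
[/code]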

-- Ed

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by TAILLE » Thu Dec 24, 2009 16:02

Hi Bert,

The 8-piece db is indeed very large, and compression seems mandatory with current technology.
For the 7-piece db it is another story: my intention is to regenerate this db in the future and avoid compressing it. Using about 400 GB on disk is not a problem. For the RAM it is a little different. I intend to use 6 or 8 GB of memory in small blocks, each block containing a fixed number of positions stored in an uncompressed format (I will certainly use 3 bits per position). The block entries in memory will be handled like a hash table, in order to avoid any index table.
I do not know if it is better than a compression scheme but, at least, it is very simple!
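
A sketch of the kind of cache Gérard describes, assuming fixed-size uncompressed blocks placed in RAM by a simple hash (block number modulo table size) so that no index table is needed; all names and sizes are illustrative, and one byte per position is used here for clarity (Gérard intends 3 bits per position, which only changes the extraction):

[code]
// Illustrative hash-table-style block cache for an uncompressed WLD db.
#include <cstdint>
#include <vector>

constexpr std::uint64_t POSITIONS_PER_BLOCK = 4096;      // fixed count per block
constexpr std::size_t   NUM_CACHE_SLOTS     = 1u << 21;  // sized to fill a few GB of RAM

struct CacheSlot {
    std::uint64_t block_id = ~0ULL;   // which db block currently occupies this slot
    std::vector<std::uint8_t> data;   // uncompressed WLD values, 1 byte per position here
};

static std::vector<CacheSlot> cache(NUM_CACHE_SLOTS);

// Stub disk read; real code would seek into the uncompressed db file.
static void read_block_from_disk(std::uint64_t /*block_id*/, std::vector<std::uint8_t>& out) {
    out.assign(POSITIONS_PER_BLOCK, 0);
}

std::uint8_t probe_wld(std::uint64_t pos_index) {
    const std::uint64_t block_id = pos_index / POSITIONS_PER_BLOCK;
    CacheSlot& slot = cache[block_id % NUM_CACHE_SLOTS];  // "hash" = modulo, like a direct-mapped cache
    if (slot.block_id != block_id) {                      // miss: overwrite whatever was there
        read_block_from_disk(block_id, slot.data);
        slot.block_id = block_id;
    }
    return slot.data[pos_index % POSITIONS_PER_BLOCK];
}
[/code]
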
Gérard

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by BertTuyt » Thu Dec 24, 2009 16:41

Ed/Gerard, do you have any thoughts regarding using an SSD (Solid State Drive) for DB access?
To my knowledge they are extremely fast, especially for random reads.

I think these days a 160 GByte SSD can be bought for below 500 Euro.
So next year an uncompressed 7p DB could be stored on a single SSD.

Bert

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by BertTuyt » Thu Dec 24, 2009 16:51

Ed/Gerard, based on the two previous mails, I wonder if anyone has done any research comparing compressed and non-compressed databases.

It is true (at least for 7p DBs) that available HDs are sufficient to store the raw data (so basically there is no need to omit the capture positions if one does not compress).

If one assumes that the DB is around 300 GByte, it is also true that the DB is too big to be stored entirely in memory.

But I don't know which is fastest in the end: a non-compressed DB with capture positions, or a compressed DB without capture positions (captures omitted to optimize compression).

Also, in view of my previous post, does the equation change (dramatically) when we use an SSD for reading?

Bert

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by TAILLE » Thu Dec 24, 2009 17:36

BertTuyt wrote:Ed/Gerard, based on the two previous mails, I wonder if anyone has done any research comparing compressed and non-compressed databases.

It is true (at least for 7p DBs) that available HDs are sufficient to store the raw data (so basically there is no need to omit the capture positions if one does not compress).

If one assumes that the DB is around 300 GByte, it is also true that the DB is too big to be stored entirely in memory.

But I don't know which is fastest in the end: a non-compressed DB with capture positions, or a compressed DB without capture positions (captures omitted to optimize compression).

Also, in view of my previous post, does the equation change (dramatically) when we use an SSD for reading?

Bert

Hi Bert,

Clearly I am not able to answer your question, because I have not yet tested the uncompressed scheme.
I do like the uncompressed scheme, which is very simple and far more efficient as far as looking up a single position is concerned. The uncompressed scheme is also particularly efficient for handling capture positions.
Where is the advantage of a compressed scheme? When a position is not in memory and you need to read a new block from disk, you are able to store in memory a lot of other positions which may be of interest during the tree exploration.
With an uncompressed scheme you also store some other positions in memory, but not as many, and that is the point.
It is very difficult to know whether this advantage compensates for the efficiency of the uncompressed scheme. My feeling is that the answer to this question depends strongly on your index calculation. I will test a new index scheme in order to put the other positions of interest as near as possible to the considered position. With this improvement I suspect that the uncompressed scheme will be far better, but I cannot be sure.

PS: if you have, let's say, 32 GB of memory, the best approach is to put the complete compressed 7-piece db in memory!
Gérard

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by Ed Gilbert » Thu Dec 24, 2009 20:07

Ed/Gerard, do you have any thoughts regarding using an SSD (Solid State Drive) for DB access?
To my knowledge they are extremely fast, especially for random reads.
I haven't tried them, but I expect they will help, especially when the ratio of (cache RAM / total db size) is small. You should also get a similar benefit by using a USB flash drive.

-- Ed

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by Ed Gilbert » Sat Dec 26, 2009 15:29

In your case, what is the compressed size? (I'm sure you have communicated this before, but I couldn't find the source.)

For example: to what size can the 5K2K database be reduced?
For me it is 124 MB.

A few years ago I wrote a lookup driver for the Grimminck 6-piece db so I could verify my db against it. I solved the problem of not knowing whether a position is at the tail end of a 4k block or the beginning of the next one by making the block size 206 bytes larger than 4k. That index file gives the byte offset of every 1k positions. After you find the cache block containing the closest 1k-multiple position that is less than or equal to your target, the worst case is when this position is the very last byte of a cache block, the index of your target is 1023 greater than this closest 1k multiple, and all of those 1023 in-between positions are encoded using the 5-positions-per-byte format.

If you want to use the same idea and your index file has offsets for every 4096 positions, then you need to add 824 extra bytes to a cache block.
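
A sketch of that trick under the same assumptions (4096-byte cache blocks, one stored byte offset per 4096 positions); the 824-byte slack is Ed's figure, everything else is illustrative:

[code]
// Read each nominal 4096-byte cache block with 824 bytes of slack, so that a
// lookup whose indexed anchor sits near the end of the block can decode all
// the way to its target without needing the next cache block.
#include <cstdint>
#include <fstream>
#include <vector>

constexpr std::size_t CACHE_BLOCK_BYTES = 4096;  // nominal cache block size
constexpr std::size_t SLACK_BYTES       = 824;   // Ed's figure for a 4096-position index interval

std::vector<std::uint8_t> load_cache_block(std::ifstream& db, std::uint64_t block_number) {
    std::vector<std::uint8_t> buf(CACHE_BLOCK_BYTES + SLACK_BYTES);
    db.seekg(static_cast<std::streamoff>(block_number * CACHE_BLOCK_BYTES));
    db.read(reinterpret_cast<char*>(buf.data()),
            static_cast<std::streamsize>(buf.size()));
    return buf;
}
// Successive blocks deliberately overlap by SLACK_BYTES; the last block of the
// file may read fewer bytes, which is harmless for this purpose.
[/code]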

-- Ed

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by BertTuyt » Tue Jan 12, 2010 22:17

I finished the compression of the non-compressed databases.
In total the DB size is now 36.8 GByte.
These DBs contain all the 2p, 3p, 4p, 5p and 6p, and from the 7p the 5-2, 4-3, 3-4 and 2-5 sub-DBs (231 files in total).

I think Ed did a better compression job; my DB also contains some additional info that is not strictly needed:
- A DB header with general info: size and exact WDL counts (in total 8 __int64 parameters).
- An index-address block with 8-byte index offsets for every block of 4K indices.
- And last but not least the DB data itself, where I think the compression could be more efficient.

I will now focus on debugging/testing the DB handler.

(And the first issues indicate that Ed was maybe right: 4 KByte-sized blocks may be better than 4K-position blocks.)

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by Ed Gilbert » Wed Jan 13, 2010 13:23

My indexing data is in separate files from the WLD data. It's different from yours in that I use only 32-bit indices, and there are indices for every 1 KB of db data, although I do not store all of this data in memory.

How did you solve the problem of determining which 4k data block a position is in? Did you add some extra bytes to the end when you read a block from disk?

-- Ed

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by BertTuyt » Wed Jan 13, 2010 21:33

Ed,

I will (most likely) start with an intermediate solution "to test the water".
My approach (not sure if there is any flaw in my reasoning) is to load as many DB blocks (each containing 4K compressed positions) as fit into a 4 KByte memory block.
I know this is not optimal memory use, but at least it works (I guess).

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Damage does not have (yet) the 7 Piece Endgame Databases

Post by Ed Gilbert » Thu Jan 14, 2010 02:31

Bert, that should work OK. Load as many complete 4K-position blocks as will completely fit in a 4k cache block, and the next block of positions can start at the beginning of the next cache block. That is slightly less wasteful than what I did, as you will probably have on average less than 100 bytes of unused memory at the end of each cache block.

If I assume that 2 GB of your 36.8 GB size is indexing data, then your compression ratio is 27.4 positions per byte, and so the average size for 4096 positions is 149 bytes.
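
Taking Ed's 27.4 positions per byte as given, the packing arithmetic works out as follows (a sketch; the real average will move around because block sizes vary):

[code]
// Packing whole 4K-position compressed blocks into 4 KB cache blocks,
// using the average compression ratio quoted above.
#include <cstdio>

int main() {
    const double positions_per_byte = 27.4;                      // Ed's estimate for Bert's db
    const double avg_block_bytes = 4096.0 / positions_per_byte;  // ~149 bytes per 4K-position block

    const int    whole_blocks = static_cast<int>(4096.0 / avg_block_bytes);  // ~27 per cache block
    const double unused       = 4096.0 - whole_blocks * avg_block_bytes;     // ~60 bytes left over

    std::printf("avg block %.0f bytes, %d blocks per 4 KB cache block, ~%.0f bytes unused\n",
                avg_block_bytes, whole_blocks, unused);
    return 0;
}
[/code]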

-- Ed
