EndGame Databases

Discussion about the development of draughts in the era of computers and the Internet.
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Post by Rein Halbersma » Wed Jun 25, 2008 21:44

Rein Halbersma wrote: After reading the contributions of Bert and Gérard more carefully, I see that ghost fields are even more clever than I thought. They serve a dual purpose: they give uniform shifts for left- and right-moving pieces AND they remove the need for edge detection.

The reason: they appear on *both* edges of the board like this:

Code: Select all

{
      00, 01, 02, 03, 04,
    05, 06, 07, 08, 09, [10]
[10]  11, 12, 13, 14, 15,
    16, 17, 18, 19, 20, [21]
[21]  22, 23, 24, 25, 26,
    27, 28, 29, 30, 31, [32]
[32]  33, 34, 35, 36, 37,
    38, 39, 40, 41, 42, [43]
[43]  44, 45, 46, 47, 48,
    49, 50, 51, 52, 53
}  
Just a question for historical reasons: who invented this, and when? Or did you invent this independently? And in which forum was this first discussed? I couldn't find anything with Google.

Rein
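For concreteness, here is a minimal sketch of how the ghost squares give those uniform shifts without edge tests, using the bit layout above (the direction convention and the function names are illustrative, not taken from any particular program):

Code: Select all

#include <cstdint>

// Ghost bits of the layout above: bits 10, 21, 32 and 43 of a 54-bit board.
const uint64_t GHOSTS =
    (1ULL << 10) | (1ULL << 21) | (1ULL << 32) | (1ULL << 43);

// Squares that are neither occupied nor ghosts.
uint64_t empty_squares(uint64_t black, uint64_t white)
{
    return ~(black | white | GHOSTS) & ((1ULL << 54) - 1);
}

// Non-capturing man moves towards the higher bit numbers (direction chosen
// for illustration only). Thanks to the ghost columns, the two forward
// diagonals are always shifts by 5 and 6 on every rank, and because ghost
// squares are never empty, moves that would wrap around an edge are
// filtered out automatically.
uint64_t forward_man_moves(uint64_t men, uint64_t empty)
{
    return ((men << 5) | (men << 6)) & empty;
}
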
For those who haven't got them yet, here are the seminal papers by the great IBM scientist Arthur L. Samuel that founded the field of checkers programming (from 1959 and 1967, respectively!)

http://www.research.ibm.com/journal/rd/ ... d0303B.pdf
http://www.research.ibm.com/journal/rd/ ... d1106C.pdf

Look at figure A-1 in the appendix of the first paper: here is his 35-bit bitboard layout with 3 ghost squares! For years, IBM had a line of 36-bit machines that allowed Samuel to implement his clever bitboard scheme.

Rein
Attachment: Picture1.jpg

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Mon Nov 10, 2008 03:08

Just a thought, but I don't have a clear answer.
How do programmers think about the benefits of SSDs (solid-state drives) for storing endgame databases in the very near future?

I guess that with the current developments a 7-man database would be feasible; I don't know if an 8-piece database is already possible.
But does someone have an idea (also based on the projected developments in access speed) whether this would be beneficial for a program?

Or is the endgame caching table so efficient that it does not make a major difference?

Bert

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Sat May 02, 2009 20:41

7-piece database generation is now running on 2 machines (but still with 1 instance on every machine).

So a question for all (but especially Ed/Gerard): is there an issue when I run multiple instances of the DBGenerator?

Also, in some previous threads it was stated that in a parallel-processing environment (so multiple threads for the search) there could be issues?

In the past, with the 6-piece databases, I stored all DBs in memory, and therefore I did not have any problems with parallel processes.

I'm not sure whether this is still the case with memory-mapped files?

Hope someone can give a clue?

Bert

64_bit_checkers_engine
Posts: 62
Joined: Mon Apr 20, 2009 01:10
Contact:

Post by 64_bit_checkers_engine » Sat May 02, 2009 21:05

Rein Halbersma wrote: Look at figure A-1 in the appendix of the first paper: here is his 35-bit bitboard layout with 3 ghost squares! For years, IBM had a line of 36-bit machines that allowed Samuel to implement his clever bitboard scheme.
The Samuel layout works great for checker moves, but not for jumps. For example, jumping from 3 to 13 is a delta of 10. So testing for another offset of 10, from 13 to 23, needs to be handled; otherwise you could transport the checker from square 13 to 23.

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Post by Rein Halbersma » Sun May 03, 2009 01:20

64_bit_checkers_engine wrote:
Rein Halbersma wrote: Look at figure A-1 in the appendix of the first paper: here is his 35-bit bitboard layout with 3 ghost squares! For years, IBM had a line of 36-bit machines that allowed Samuel to implement his clever bitboard scheme.
The Samuel layout works great for checker moves, but not for jumps. For example, jumping from 3 to 13 is a delta of 10. So testing for another offset of 10, from 13 to 23, needs to be handled; otherwise you could transport the checker from square 13 to 23.
No, that's why square 18 is missing! You keep a constant bitboard of "ghost" squares that are neither empty nor occupied. That way, you are guaranteed that no captures from 3 to 23 are ever possible. The ghost board is only needed whenever you update the bitboard of unoccupied squares, e.g. like this:

Code: Select all

not_occupied = ~(pieces[BLACK] ^ pieces[WHITE] ^ Board::GHOSTS);
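
The same trick covers captures. An illustrative sketch, using the 10x10 ghost layout from earlier in this thread with shift amounts 5 and 6 (so not Samuel's 35-bit scheme, and the names are made up):

Code: Select all

#include <cstdint>

// A jump is two shifts in the same direction: over an enemy piece onto an
// empty square. Ghost squares are never in 'enemy' nor in 'empty', so a jump
// that would wrap around the board edge always runs into a ghost and is
// rejected without any explicit edge test.
uint64_t forward_jump_targets(uint64_t men, uint64_t enemy, uint64_t empty)
{
    uint64_t over5 = (men << 5) & enemy;              // enemy one diagonal step away
    uint64_t over6 = (men << 6) & enemy;
    return ((over5 << 5) | (over6 << 6)) & empty;     // landing square must be empty
}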

64_bit_checkers_engine
Posts: 62
Joined: Mon Apr 20, 2009 01:10
Contact:

Post by 64_bit_checkers_engine » Sun May 03, 2009 06:03

Mine is set up like this:

Code: Select all

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X    40    X##########X    48    X##########X    56    X##########X    64    X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X    31    X##########X    39    X##########X    47    X##########X    55    X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X    30    X##########X    38    X##########X    46    X##########X    54    X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X    21    X##########X    29    X##########X    37    X##########X    45    X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X    20    X##########X    28    X##########X    36    X##########X    44    X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X    11    X##########X    19    X##########X    27    X##########X    35    X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X    10    X##########X    18    X##########X    26    X##########X    34    X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
X----------X##########X          X##########X          X##########X          X##########X          X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X    01    X##########X    09    X##########X    17    X##########X    25    X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
X----------X          X##########X          X##########X          X##########X          X##########X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
X----------X----------X----------X----------X----------X----------X----------X----------X----------X----------X
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Every square has a "failsafe" for off-the-board.

I posted my source code; it's about 19 lines of code for both the recursive jump routine and the move generation (once the #define statements were in place).
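
The "failsafe" idea in its simplest form looks something like this (a toy sketch, not the poster's actual 64-bit layout or #define scheme; it assumes the padding is deep enough that from + 2*dir never leaves the array):

Code: Select all

#include <cstdint>

// Padded-board illustration: off-board cells hold the OFF sentinel, so a jump
// test simply reads the sentinel and fails -- no explicit bounds check needed.
enum Cell : std::int8_t { EMPTY, OWN, ENEMY, OFF };

bool can_jump(const Cell* board, int from, int dir)
{
    return board[from + dir] == ENEMY        // adjacent enemy piece
        && board[from + 2 * dir] == EMPTY;   // landing cell on the board and empty
}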

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Sun Jul 19, 2009 15:59

As I already mentioned in a different post, I'm working on a parallel implementation of my database generator program.

Before going to a multithreaded implementation, I'm working on making the single-thread version as fast as possible.
For this purpose I measure the time needed to calculate the 2 databases 4K - 2K and 2K - 4K (the sub-databases are already available, so it is only the timing for these 2).

Also, I don't use mirroring and/or symmetry-reduction principles (although for kings-only databases this is possible), as they don't work with men.

So far, my current timing for these 2 databases is 2367.7 seconds on a Q6600 (again, a single-thread implementation).
This yields > 200 Kpositions/second.
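
As a rough cross-check of that figure (a back-of-the-envelope sketch; it assumes each of the two databases is counted as C(50,4)*C(46,2) king placements, with no symmetry reduction):

Code: Select all

#include <cstdint>
#include <cstdio>

// binomial coefficient C(n, k); n is small, so a simple loop is fine
uint64_t choose(uint64_t n, uint64_t k)
{
    uint64_t r = 1;
    for (uint64_t i = 1; i <= k; ++i)
        r = r * (n - k + i) / i;
    return r;
}

int main()
{
    // 4 kings vs 2 kings on 50 squares, counted for both databases
    uint64_t per_db = choose(50, 4) * choose(46, 2);   // 238,360,500
    uint64_t total  = 2 * per_db;                      // 476,721,000
    std::printf("%llu positions -> %.0f positions/s at 2367.7 s\n",
                (unsigned long long) total, total / 2367.7);
}

That works out to roughly 201 Kpositions/second over 476,721,000 positions, consistent with the figure above.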

Basically I want to increase this to > 300 Kpositions/second before going multi-threaded.

However, I don't have any reference yet for whether the "current" speed is "reasonable", nor a benchmark to compare against other database generation programs.

So I hope Ed/Gerard and/or others can provide me with their respective timings.

Bert

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Sun Jul 19, 2009 19:32

Hi Bert,
BertTuyt wrote: As I already mentioned in a different post, I'm working on a parallel implementation of my database generator program.

Before going to a multithreaded implementation, I'm working on making the single-thread version as fast as possible.
For this purpose I measure the time needed to calculate the 2 databases 4K - 2K and 2K - 4K (the sub-databases are already available, so it is only the timing for these 2).

Also, I don't use mirroring and/or symmetry-reduction principles (although for kings-only databases this is possible), as they don't work with men.

So far, my current timing for these 2 databases is 2367.7 seconds on a Q6600 (again, a single-thread implementation).
This yields > 200 Kpositions/second.

Basically I want to increase this to > 300 Kpositions/second before going multi-threaded.

However, I don't have any reference yet for whether the "current" speed is "reasonable", nor a benchmark to compare against other database generation programs.

So I hope Ed/Gerard and/or others can provide me with their respective timings.

Bert
For Damy it took 14 minutes to generate the 4K vs 2K db (I mean 4K vs 2K + 2K vs 4K) on a Core 2 Duo CPU, with a single thread, but with my "older" move generator (not optimised at all) and using symmetry considerations. In fact that represents about 200K positions/s. I did not test my generation process with my new move generator, but seeing that for the perft function my new move generator gains a factor greater than 3, I am quite sure that you can easily reach 300K positions/s.
Gérard

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Sun Jul 19, 2009 20:57

Gerard,

I assume you use 2 mirror principles, which reduces the position count by a factor of 4.
So your 14 minutes scales rather well (for the time being) against my 2367.7 seconds, which is around 40 minutes...

So based on your feedback I guess my target should be 400 Kpos/sec, to be really challenged :)

Bert

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Sun Jul 19, 2009 21:03

BertTuyt wrote:Gerard,

I assume you use 2 mirror principles, which reduces the position count by a factor of 4.
So your 14 minutes scales rather well (for the time being) against my 2367.7 seconds, which is around 40 minutes...

So based on your feedback I guess my target should be 400 Kpos/sec, to be really challenged :)

Bert
Yes Bert, I did indeed use 2 mirror principles, gaining a factor of 3.65.

A target of 400 Kpos/s seems reasonable to me.
Gérard

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Sun Jul 19, 2009 21:05

Gerard,

Anyway, I think I will also try to incorporate the mirror principles, so we can create a Perft part II, this time for endgame databases.

So who will be the first amongst our peers to cross the 10-minute barrier for the 4K - 2K and 2K - 4K databases (for single-thread applications)?

Keep you all posted :)

Bert

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Sun Jul 19, 2009 21:22

BertTuyt wrote:Gerard,

Anyway, I think I will also try to incorporate the mirror principles, so we can create a Perft part II, this time for endgame databases.

So who will be the first amongst our peers to cross the 10-minute barrier for the 4K - 2K and 2K - 4K databases (for single-thread applications)?

Keep you all posted

Bert
I am not sure it is a good idea to take the mirror principle into account, because it is a significant programming effort for a very small gain. For those who do not want to make this effort, I suggest, for this new Perft function on endgame db generation, taking the 3K+1M - 2K db and trying to cross the 1h30' barrier.
Gérard

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert » Mon Jul 20, 2009 00:31

Bert, my build speed will be slower than yours in general, and particularly for all-kings slices, because I compress the databases as I build, and my compression process discards capture positions. Whenever I look up the value of a successor position that is a capture, I have to do a search which recursively visits all successors until non-capture leaf nodes are reached. Since the all-kings slices are nearly 90% captures, this search is a large overhead.
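
Schematically, that recursive resolution might look as follows (a sketch with hypothetical names; Position, has_capture, capture_successors and db_read stand in for whatever the engine actually provides):

Code: Select all

#include <vector>

// Hypothetical engine interface, declared here only to sketch the idea.
struct Position { unsigned long long black, white, kings; bool black_to_move; };
bool has_capture(const Position&);                         // must the side to move capture?
std::vector<Position> capture_successors(const Position&);
int db_read(const Position&);                              // value of a stored (non-capture) position

// Win/draw/loss from the side to move's point of view.
enum { LOSS = -1, DRAW = 0, WIN = 1 };

// Capture positions are not stored in the compressed db, so resolve them by
// recursing through the forced capture moves until non-capture leaves are reached.
int db_lookup(const Position& pos)
{
    if (!has_capture(pos))
        return db_read(pos);

    int best = LOSS;
    for (const Position& succ : capture_successors(pos)) {
        int value = -db_lookup(succ);    // negamax: a win for the opponent is a loss for us
        if (value > best)
            best = value;
    }
    return best;
}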

You can build the smaller databases without compressing as you build, but for large dbs you need to compress. You could possibly trade off some size for speed by keeping capture positions in the compressed db. In this case the compressed db would be several times larger, but it might be manageable.

My build speed for the 5kings vs. 3kings slice was slightly over 100kpos/sec. For 4k vs. 2k it is probably a little faster, but not 2x faster. To compare apples to apples you should see what your build speed is while building compressed dbs where you do not store the captures.

-- Ed

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt » Mon Jul 20, 2009 10:39

Ed,
you are right, I use uncompressed databases during build-up.
And although I have 12 gigabytes of memory, this does not seem to be sufficient to get through all 7-piece slices.

So I have to find a solution for this.
And certainly for the 8-piece databases it is way off scale.

Basically I only need the captures in the first phase of the DB generation. In that phase I load, as needed, all databases with fewer pieces (because of the captures).

As I don't load compressed files, I don't do a search; I can immediately retrieve the right database value.
The final databases that I use for actual game play are compressed.

Bert

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE » Mon Jul 20, 2009 12:23

BertTuyt wrote:Ed,
you are right, I use uncompressed databases during build-up.
And although I have 12 gigabytes of memory, this does not seem to be sufficient to get through all 7-piece slices.

So I have to find a solution for this.
And certainly for the 8-piece databases it is way off scale.

Basically I only need the captures in the first phase of the DB generation. In that phase I load, as needed, all databases with fewer pieces (because of the captures).

As I don't load compressed files, I don't do a search; I can immediately retrieve the right database value.
The final databases that I use for actual game play are compressed.

Bert
For the time being I do not see your problem. During the db generation I also load, as needed, all databases with fewer pieces and, like Ed, I use a compressed format without the capture positions.
Anyway, I managed to generate the 7-piece db with only 4 GB of memory, but by using symmetry considerations.
The most difficult slice (as far as memory is concerned) is the 4K-3K db: the number of positions is 9,027,760,000, and after symmetry considerations this figure becomes 2,622,656,400.
Even without symmetry considerations I do not see the problem. With one bit per position you need only 1,128,470,000 bytes (about 1.1 GB) of memory. If you do not want to use an algorithm which needs only one bit per position, you can still afford to use 2 or 4 bits per position without any memory problem.
For Damy I used 4 bits per position together with symmetry considerations; as a consequence I needed 1,311,328,200 bytes in memory.
For the 4K-4K db the number of positions is 106,076,180,000. By using only a very simple symmetry the number of positions becomes 53,038,090,000, and by using an algorithm with only one bit per position you need only 6,629,761,250 bytes in memory.
By using a somewhat more sophisticated symmetry you can even generate this db with less than 4 GB of memory.
Gérard
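
For reference, the memory arithmetic above can be reproduced with a few lines (the position counts are the ones quoted in the post; db_bytes is an illustrative helper, not code from any of the engines discussed):

Code: Select all

#include <cstdint>
#include <cstdio>

// bytes needed for a flat array of 'positions' entries at 'bits' bits each
uint64_t db_bytes(uint64_t positions, unsigned bits)
{
    return (positions * bits + 7) / 8;   // round up to whole bytes
}

int main()
{
    std::printf("4K-3K, 1 bit/pos        : %llu bytes\n",
                (unsigned long long) db_bytes(9027760000ULL, 1));    // 1,128,470,000
    std::printf("4K-3K (sym), 4 bits/pos : %llu bytes\n",
                (unsigned long long) db_bytes(2622656400ULL, 4));    // 1,311,328,200
    std::printf("4K-4K (sym), 1 bit/pos  : %llu bytes\n",
                (unsigned long long) db_bytes(53038090000ULL, 1));   // 6,629,761,250
}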
