unique positions for 8x8 checkers

Discussion about development of draughts in the time of computer and Internet.
ibid
Posts: 5
Joined: Tue Sep 25, 2012 22:59
Real name: Paul Byrne

unique positions for 8x8 checkers

Post by ibid » Thu Sep 27, 2012 03:18

A counterpart statistic to perft N (the number of lines of a given length N) is uniq N,
the number of unique positions reached after N moves. This corresponds to
http://oeis.org/A133047, where Jonathan Schaeffer provides numbers for the first
15 ply. I thought I'd add a few more:

Code: Select all

Depth         Uniq Ratio      UniqUniq     %
0                1   ---             1  100.00%
1                7  7.00             7  100.00%
2               49  7.00            49  100.00%
3              216  4.41           130   60.19%
4              805  3.73           323   40.12%
5            2,733  3.40           735   26.89%
6            9,105  3.33         1,964   21.57%
7           28,123  3.09         4,848   17.24%
8           85,340  3.03        12,142   14.23%
9          255,800  3.00        29,184   11.41%
10         768,155  3.00        69,071    8.99%
11       2,265,062  2.95       158,608    7.00%
12       6,588,759  2.91       360,744    5.48%
13      18,667,410  2.83       773,970    4.15%
14      51,448,497  2.76     1,609,395    3.13%
15     137,051,636  2.66     3,166,829    2.31%
16     353,575,584  2.58     6,084,139    1.72%
17     878,804,335  2.49    11,129,325    1.27%
18   2,113,497,469  2.40    19,829,515    0.94%
19   4,908,984,819  2.32    33,830,205    0.69%
20  11,049,271,004  2.25    56,132,767    0.51%
21  24,062,242,901  2.18    89,061,917    0.37%
22  50,742,319,631  2.11   136,358,464    0.27%

Can anyone verify any of the numbers beyond 15 ply? The column labelled "UniqUniq"
counts the positions that can only be reached in one way in that number of moves. Uniq 16
and below could be computed in memory with 8 GB; the higher ply were done in parts using
temporary files. Uniq 22 used 550 GB of temporary space; the final set of positions and
their associated frequencies compresses to 120 GB. The computation of uniq 1 to uniq 22 took
about 2 days on a Phenom 1090T using a single core. Uniq 23 and 24 could be reasonably
computed with a bigger hard drive.
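
In outline, one ply of the computation looks like this (just a sketch, not my actual code: the Position type, the hash/equality functors, and generate_successors() all stand in for a real checkers engine):

Code: Select all

#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-ins for a real engine (hypothetical interface).
struct Position { /* bitboards, side to move, ... */ };
std::vector<Position> generate_successors(const Position &p);
struct PosHash { size_t operator()(const Position &p) const; };
struct PosEq   { bool operator()(const Position &a, const Position &b) const; };

// One layer of the search: each unique position at the current ply,
// with the number of distinct move sequences that reach it.
using Layer = std::unordered_map<Position, uint64_t, PosHash, PosEq>;

// Expand every position at ply N and merge transpositions at ply N+1.
// Then uniq(N+1) = next.size(), perft(N+1) = sum of all frequencies,
// and uniquniq(N+1) = number of entries with frequency 1.
Layer next_ply(const Layer &cur)
{
    Layer next;
    for (const auto &[pos, freq] : cur)
        for (const Position &succ : generate_successors(pos))
            next[succ] += freq;   // transpositions accumulate frequency
    return next;
}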

Another related stat is the highest frequency at a given depth:

Code: Select all

Depth    Max Freq  Count
0               1      1
1               1      7
2               1     49
3               2     86
4               4     68
5              12      8
6              36      1
7              96      2
8             272      1
9             784      1
10          2,220      1
11          7,216      1
12         23,364      1
13         85,312      1
14        349,232      1
15      1,456,024      1
16      6,053,074      1
17     22,340,040      1
18     81,217,440      1
19     81,217,440      4
20    200,821,452      1
21    981,690,468      1
22  3,291,762,816      1
For example, the position

[Diagram: the most frequent position at 11 ply]

can be reached in 7,216 different ways in 11 moves. All of the early-ply record holders are
12-on-12 positions like this. Clearly not something that can last indefinitely. By 18 ply we get:
[Diagram: the most frequent position at 18 ply]
which is reached in 81,217,440 different ways. This is basically the end of the road:
at 19 ply there are 4 different positions with the same 81,217,440 frequency, which
correspond to the 4 moves in the diagram above. After this, the lead is taken by 11-on-11
positions, which overcome the 2-ply delay caused by the captures. At 22 ply, the most
common position is
[Diagram: the most frequent position at 22 ply]
which can be reached in 3,291,762,816 ways. (I used 64-bit integers for this just to
be safe -- and came quite close to needing them!)

Of course a nice side effect of this computation (or perhaps the main motivation for doing it)
is that one can compute deep perfts quite quickly given the set of unique positions and how
many times each occurs. One can do shallow perft 5's on each of the 50,742,319,631
positions in uniq 22 to get perft 27. This took under 44 hours on the 6-core Phenom (using
all 6 cores this time!). The same set of data is being reused to compute perft 28.
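
The arithmetic is simple enough to sketch (reusing the hypothetical Position/Layer stand-ins from the first sketch; this is the shape of the computation, not my actual code):

Code: Select all

#include <cstdint>

// Plain perft over the hypothetical interface above.
uint64_t perft(const Position &p, int depth)
{
    if (depth == 0) return 1;
    uint64_t nodes = 0;
    for (const Position &succ : generate_successors(p))
        nodes += perft(succ, depth - 1);
    return nodes;
}

// perft(N + d) from the uniq-N set: each unique position's shallow
// perft, weighted by how many N-move sequences reach that position.
uint64_t deep_perft(const Layer &uniq_n, int d)
{
    uint64_t total = 0;
    for (const auto &[pos, freq] : uniq_n)
        total += freq * perft(pos, d);   // e.g. N = 22, d = 5 gives perft 27
    return total;
}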

The effect is to get perfect detection of transpositions through the first 22 ply. For checkers
I added a secondary hash table to detect transpositions within each 5 (or 6) ply search, and
also some of the transpositions between separate 5-ply searches.
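
The secondary table is an ordinary perft transposition table; something along these lines (again a hedged sketch, not the actual code):

Code: Select all

#include <cstdint>
#include <unordered_map>

// Transposition-table key: position plus remaining depth.
struct TTKey { Position pos; int depth; };
struct TTKeyHash {
    size_t operator()(const TTKey &k) const {
        return PosHash{}(k.pos) ^ (size_t(k.depth) * 0x9e3779b97f4a7c15ULL);
    }
};
struct TTKeyEq {
    bool operator()(const TTKey &a, const TTKey &b) const {
        return a.depth == b.depth && PosEq{}(a.pos, b.pos);
    }
};
using TT = std::unordered_map<TTKey, uint64_t, TTKeyHash, TTKeyEq>;

// perft with cached node counts: transposed subtrees within one shallow
// search are counted once; keeping tt alive across calls also catches
// some transpositions between separate 5-ply searches.
uint64_t perft_tt(const Position &p, int depth, TT &tt)
{
    if (depth == 0) return 1;
    const TTKey key{p, depth};
    if (auto it = tt.find(key); it != tt.end()) return it->second;
    uint64_t nodes = 0;
    for (const Position &succ : generate_successors(p))
        nodes += perft_tt(succ, depth - 1, tt);
    tt.emplace(key, nodes);
    return nodes;
}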

A significant disadvantage to this approach is that there is no easy way to produce a "divide"
or any other data to help pinpoint discrepancies with other programs.

For those who made it this far, thanks for reading! While I have the uniq data sets on disk,
if there are any other statistics you'd like to see, just let me know. (Distribution of number of
men at each depth or percent of positions with captures, for example.)

-paul

AartBik
Posts: 103
Joined: Wed Mar 11, 2009 01:30
Location: Mountain View

Re: unique positions for 8x8 checkers

Post by AartBik » Thu Sep 27, 2012 05:11

ibid wrote:Of course a nice side effect of this computation (or perhaps the main motivation for doing it)
is that one can compute deep perfts quite quickly given the set of unique positions and how
many times each occurs. One can do shallow perft 5's on each of the 50,742,319,631
positions in uniq 22 to get perft 27. This took under 44 hours on the 6-core Phenom (using
all 6 cores this time!). The same set of data is being reused to compute perft 28.
..
A significant disadvantage to this approach is that there is no easy way to produce a "divide"
or any other data to help pinpoint discrepancies with other programs.
Paul,
I am using almost the same technique for my distributed perft() implementation, but one that also allows computing divide() in an easy way! I am actually writing a small article on this, which I will be happy to share once I get clearance.
Aart

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: unique positions for 8x8 checkers

Post by Rein Halbersma » Thu Sep 27, 2012 09:04

AartBik wrote:
ibid wrote:Of course a nice side effect of this computation (or perhaps the main motivation for doing it)
is that one can compute deep perfts quite quickly given the set of unique positions and how
many times each occurs. One can do shallow perft 5's on each of the 50,742,319,631
positions in uniq 22 to get perft 27. This took under 44 hours on the 6-core Phenom (using
all 6 cores this time!). The same set of data is being reused to compute perft 28.
..
A significant disadvantage to this approach is that there is no easy way to produce a "divide"
or any other data to help pinpoint discrepancies with other programs.
Paul,
I am using almost the same technique for my distributed perft() implementation, but one that also allows computing divide() in an easy way! I am actually writing a small article on this, which I will be happy to share once I get clearance.
Aart
Hi Aart,

I'm guessing you are storing the unique positions with 7 frequency counts: one for each of the 7 moves in the initial position? So basically the uniq generator tracks from which initial move it reached a position. You would then loop over all unique positions and compute their shallow perft() counts as per Paul's algorithm. The final step is then simply accumulating these perft counts, times the frequency counts per initial move, into a size-7 divide array.
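
In sketch form, something like this (hypothetical C++, reusing the Position stand-ins and perft() from the sketches earlier in the thread; kInitialMoves would be 7 for the checkers opening position):

Code: Select all

#include <array>
#include <cstdint>
#include <unordered_map>

// Per unique position: one frequency count per move in the initial position.
constexpr int kInitialMoves = 7;
using FreqVec  = std::array<uint64_t, kInitialMoves>;
using DivLayer = std::unordered_map<Position, FreqVec, PosHash, PosEq>;

// divide at depth N + d: each position's shallow perft count, weighted
// by how often it was reached from each of the 7 initial moves.
std::array<uint64_t, kInitialMoves> deep_divide(const DivLayer &uniq_n, int d)
{
    std::array<uint64_t, kInitialMoves> divide{};
    for (const auto &[pos, freqs] : uniq_n) {
        const uint64_t nodes = perft(pos, d);   // shallow search, as in Paul's algorithm
        for (int m = 0; m < kInitialMoves; ++m)
            divide[m] += freqs[m] * nodes;
    }
    return divide;
}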

Is this more or less how you do it?

Rein

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: unique positions for 8x8 checkers

Post by Rein Halbersma » Thu Sep 27, 2012 09:27

ibid wrote:A counterpart statistic to perft N (the number of lines of a given length N) is uniq N,
the number of unique positions reached after N moves. This corresponds to
http://oeis.org/A133047, where Jonathan Schaeffer provides numbers for the first
15 ply. I thought I'd add a few more:
Can anyone verify any of the numbers beyond 15 ply? The column labelled "UniqUniq"
counts the positions that can only be reached in one way in that number of moves. Uniq 16
and below could be computed in memory with 8 GB; the higher ply were done in parts using
temporary files. Uniq 22 used 550 GB of temporary space; the final set of positions and
their associated frequencies compresses to 120 GB. The computation of uniq 1 to uniq 22 took
about 2 days on a Phenom 1090T using a single core. Uniq 23 and 24 could be reasonably
computed with a bigger hard drive.
Hi Paul,

Really great stuff! So basically you are pre-computing the whole transposition table and then running shallow perft() searches on each entry? I'm interested to learn how you generated the unique positions. Assuming you keep them in memory in an array, how do you find/insert new positions? Through linear search + insertion at the end, or do you keep the array sorted at all times and are you using a binary search + insertion in the middle? And does your algorithm change if you move to disk-based storage?
For those who made it this far, thanks for reading! While I have the uniq data sets on disk,
if there are any other statistics you'd like to see, just let me know. (Distribution of number of
men at each depth or percent of positions with captures, for example.)

-paul
Here are some data points I'd like to know per ply:
- number of positions with kings (split up into: neither side, either side, or both sides)
- number of quiescent positions (neither white nor black has a capture pending)
- number of (quiescent and non-quiescent) positions with material balance

I don't know if it is actually computable with current resources, and it's outside the scope of this topic, but one other interesting statistic would be the percentage of endgame database positions that can be reached from the initial position, split up by the number of pieces. IIRC, the One Jump Ahead book mentioned that the vast majority of 24-piece positions are not reachable, but I'm curious where the transition point lies: say, the piece count at which 90% of all positions can be reached. Perhaps these percentages could be established with some Monte Carlo type of algorithm.

Rein

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: unique positions for 8x8 checkers

Post by BertTuyt » Thu Sep 27, 2012 22:07

Many years ago (I think it was during the Computer Olympiad) I thought about better endgame DBs.
I think at that time we were at 6p DBs, and although it was clear that Moore's law (and the like) would bring us towards 7p and beyond, 10p DBs were still an order of magnitude beyond what was achievable mid-term.
So I thought about the possibility that although there are zillions of 5m vs. 5m (5 men against 5 men) positions, maybe only a fraction of these positions can actually be reached.
See the example below, where both parties have 20 men.

[Diagram: a position where both sides have 20 men]

Due to symmetry violation, this position (and many others) is not possible. However, in the way we construct endgame DBs, the position will be included.

Unfortunately symmetry (and other conservation laws) is broken by captures (I think Rein could explain it better than me). So I was interested in whether all the 1m vs. 1m positions are possible (if I'm correct there are 5*5 + 5*45 + 45*5 + 40*39 = 1985 unique positions).
More or less surprisingly, all 1985 seem to exist (even the weird ones where both players have a capture).
I need to verify this, as I have the source somewhere, but if I remember well I played Monte Carlo games with random move selection, storing the resulting 1m vs. 1m positions in a table.
At that time I also started on 2m vs. 2m, but I stopped: it took too long (convergence towards the final total was initially so slow).

Maybe I need to restart this activity some day, also in view of the self-play approach for breakthrough patterns by Michel. I could store only the 5p vs. 5p positions which are not draws during the matches I play regularly. Later the new positions can be added to the final, compressed database (which may work well, as only a small part is a win/loss).

I have no clue what the factor (the fraction of positions that are reachable) could be: 2, 5, 10, 100, 1000?
I also once had a theory (simple statistics, and it may be wrong): if I randomly generate a set of N positions (Monte Carlo), assuming all positions are equally likely (I think this assumption is already wrong), and then generate a second set, the overlap O of equal positions is a measure of the total number of possible positions NTot, something like NTot = N * N / O.
So if out of 2 sets of 1,000,000 positions we have 1 overlap, the total number would be 10^12.
At least for N = O the formula is valid: NTot = N * N / O = N * N / N = N.
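
This is essentially the Lincoln-Petersen capture-recapture estimator from statistics. A minimal sketch (hypothetical: random_position_key() stands in for playing one random Monte Carlo game and encoding the final position, and the equal-likelihood assumption above remains the weak point):

Code: Select all

#include <cstdint>
#include <unordered_set>

// Stand-in: play one random game and return a key that uniquely
// encodes the resulting position (hypothetical).
uint64_t random_position_key();

// Draw two samples of n positions each; the overlap O between them
// estimates the total population as NTot ~= n * n / O.
double estimate_total(uint64_t n)
{
    std::unordered_set<uint64_t> first;
    for (uint64_t i = 0; i < n; ++i)
        first.insert(random_position_key());   // duplicates collapse here

    uint64_t overlap = 0;
    for (uint64_t i = 0; i < n; ++i)
        overlap += first.count(random_position_key());

    return overlap ? double(n) * double(n) / double(overlap)
                   : 0.0;   // no overlap: the sample is too small to tell
}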


Bert

ibid
Posts: 5
Joined: Tue Sep 25, 2012 22:59
Real name: Paul Byrne

Re: unique positions for 8x8 checkers

Post by ibid » Fri Sep 28, 2012 00:18

AartBik wrote:I am using almost the same technique for my distributed perft() implementation, but one that also allows computing divide() in an easy way! I am actually writing a small article on this, which I will be happy to share once I get clearance.
Hello Aart,
I was guessing you must be doing something similar. It is quite well suited to a
distributed setup. I first did perft 12 for chess in 2006 using a motley collection
of machines. My base code hasn't really changed since then: just plug in
different engines.
I'd love to see the article when the time comes. All ideas I've had for doing a
divide involve significantly increasing some combination of memory/cpu time/
disk space. I haven't been happy enough with any of them to implement them.
Thanks,
-paul

ibid
Posts: 5
Joined: Tue Sep 25, 2012 22:59
Real name: Paul Byrne

Re: unique positions for 8x8 checkers

Post by ibid » Fri Sep 28, 2012 00:47

Hi Rein,
Rein Halbersma wrote:Really great stuff! So basically you are pre-computing the whole transposition table and then running shallow perft() searches on each entry? I'm interested to learn how you generated the unique positions. Assuming you keep them in memory in an array, how do you find/insert new positions? Through linear search + insertion at the end, or do you keep the array sorted at all times and are you using a binary search + insertion in the middle? And does your algorithm change if you move to disk-based storage?
I use a set of 256 million (or whatever your favorite power of 2 is) linked lists, using the hash
key to determine which list a position belongs in. I use a power of 2 that is high enough that
it is unusual to have a list with more than 1 or 2 positions in it, which keeps things efficient.
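
The shape of it (a sketch, not the actual code; hash_key() is an assumed Zobrist-style hash over the Position stand-in from the earlier sketches):

Code: Select all

#include <cstdint>
#include <vector>

uint64_t hash_key(const Position &p);   // assumed Zobrist-style hash

// Separate chaining: 2^28 bucket heads, bucket chosen by hash key.
// With this many buckets most chains hold 1-2 nodes, so insertion
// and transposition detection are effectively O(1).
struct Node {
    Position pos;
    uint64_t freq;
    Node    *next;
};

struct UniqTable {
    static constexpr size_t kBuckets = size_t(1) << 28;   // 256M lists
    std::vector<Node *> heads = std::vector<Node *>(kBuckets, nullptr);

    void add(const Position &p, uint64_t freq) {
        const size_t b = hash_key(p) & (kBuckets - 1);
        for (Node *n = heads[b]; n; n = n->next)
            if (PosEq{}(n->pos, p)) { n->freq += freq; return; }  // transposition
        heads[b] = new Node{p, freq, heads[b]};   // new unique position
    }
};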

For larger computations, I estimate how many parts I will need in advance so each part fits
completely in memory (for example, I extrapolated that uniq 22 would be roughly 50 billion and
since 250 million fit comfortably in memory, I chose to do things in 200 parts -- if you hate
division you'd want to make this 256, I imagine). One way to proceed would then be to take
uniq 21, make every move of every position, select only those where hash_key MOD 200 is 0
(use a different hash key, or at least a different part of the hash key, than the one used above),
and process just those positions. Then repeat the computation for each of the other 199
possibilities. Since each of the 200 parts would need over 6000 seconds, this is not terribly fast.

So instead, I go through the positions and write them out to 200 different files based on hash key
(plus 200 more files for the frequencies) without any attempt to detect transpositions. Sit back
and listen to the hard drive scream for a few hours. Then read them back from disk one at a time
and process the transpositions as before; this only takes a few minutes for each part.
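
Schematically, the file-scatter pass looks like this (a sketch with hypothetical names, reusing the earlier stand-ins; part_key() must come from different hash bits than the in-memory bucket index, as noted above):

Code: Select all

#include <cstdint>
#include <cstdio>

uint64_t part_key(const Position &p);   // different hash bits than the buckets

// Pass 1: stream every successor of uniq N to one of 200 part files by
// hash key, with no duplicate detection (assumes Position is a plain
// struct that can be written byte-for-byte).
constexpr int kParts = 200;

void scatter_successors(const Layer &uniq_n, std::FILE *part_file[kParts])
{
    for (const auto &[pos, freq] : uniq_n)
        for (const Position &succ : generate_successors(pos)) {
            const int part = int(part_key(succ) % kParts);
            std::fwrite(&succ, sizeof succ, 1, part_file[part]);
            std::fwrite(&freq, sizeof freq, 1, part_file[part]);
        }
}
// Pass 2: read each part file back in turn and merge it into the
// in-memory table (UniqTable::add above) to detect the transpositions.
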
Here are some data points I'd like to know per ply:
- number of positions with kings (split up into: neither side, either side, or both sides)
- number of quiescent positions (neither white nor black has a capture pending)
- number of (quiescent and non-quiescent) positions with material balance
I will put together something to collect data like this once the perft 28 computation
is complete... we can at least see how many of the 24 piece positions are reachable within
22 ply... :)

-paul
