Inspired by some discussion on the talkchess forum, I computed the perft number of 8x8 checkers for depth 24 with the same distributed implementation I used earlier to compute depths 1 to 23. Below you see the perft breakdown per move (called "divide") from the initial position for depths 22 to 24.
AartBik wrote:Inspired by some discussion on the talkchess forum, I computed the perft number of 8x8 checkers for depth 24 with the same distributed implementation I used earlier to compute depths 1 to 23. Below you see the perft breakdown per move (called "divide") from the initial position for depths 22 to 24.
When I was computing perft(22), I wanted to surprise you before you would have bettered your own record. Someone who I had confided in wrote me: "You definitely dont want to get into a perft one-upsmanship with Aart. His company has more computing resources than the governments of most countries!" I guess the next time someone like me or Murray Cash (the one on the talkchess forum who had already confirmed perft(23))wants to improve your perft numbers, this will have to be done in absolute secrecy.
BTW, did the total runtime for this computation still qualify as a "test run"? Are you allowed to give us any additional stats at all?
Rein Halbersma wrote:I guess the next time someone like me or Murray Cash (the one on the talkchess forum who had already confirmed perft(23))wants to improve your perft numbers, this will have to be done in absolute secrecy.
Thanks Rein. I hope I did not spoil anyone's fun. I was just trying to have some fun myself
Hello Aart, I can finally verify perft(24) = 34651381875296000. Although I don't have breakdown figures (I have only just added that feature).
It took 4d 10:08:37 on my laptop to compute. Next: perft(25)!
murraycash wrote:Hello Aart, I can finally verify perft(24) = 34651381875296000. Although I don't have breakdown figures (I have only just added that feature).
It took 4d 10:08:37 on my laptop to compute. Next: perft(25)!
Murray
Congratulations, Murray. You must have upgraded your hardware because the previous perft(23) also took you 4 days, IIRC. Can you give us some details on your hardware / algorithm (hash? parallelism?) BTW, unless your perft(25) computation is nearly finished, it is generally not a good idea to announce your ambitions to Aart! Best is to present him with the counts to verify
Congratulations, Murray. You must have upgraded your hardware because the previous perft(23) also took you 4 days, IIRC. Can you give us some details on your hardware / algorithm (hash? parallelism?) BTW, unless your perft(25) computation is nearly finished, it is generally not a good idea to announce your ambitions to Aart! Best is to present him with the counts to verify
Hi Rein, Hardware is HP EliteBook with Intel i7-M620 @ 2.67 Ghz. 4 cores, all used.
Algorithm is a parallel move generator with "lock-less" hash table of 4GB, described by Robert Hyatt and Tim Mann, see http://chessprogramming.wikispaces.com/ ... Hash+Table
You are right, I am only motivated to do the next perft if I can complete it in 4 days!
murraycash wrote:Hello Aart, I can finally verify perft(24) = 34651381875296000. Although I don't have breakdown figures (I have only just added that feature).
It took 4d 10:08:37 on my laptop to compute. Next: perft(25)!
Hi Murray,
Thanks for verifying my number, and congrats with the very impressive performance on a laptop!
Is there a not-so-hidden challenge in your posting?
Aart
murraycash wrote:
Hi Rein, Hardware is HP EliteBook with Intel i7-M620 @ 2.67 Ghz. 4 cores, all used.
Algorithm is a parallel move generator with "lock-less" hash table of 4GB, described by Robert Hyatt and Tim Mann, see http://chessprogramming.wikispaces.com/ ... Hash+Table
You are right, I am only motivated to do the next perft if I can complete it in 4 days!
Hi Murray,
Correct me if I'm wrong, but I think that lock-less hashing is not guaranteed to be correct for perft counts. In regular searches, almost all (and rare to begin with) read/write races on individual hash entries are filtered out by the alpha-beta mechanism, but for perft any error will certainly propagate up to the root count. Nevertheless, it appears that you got away with it (I also got away with hard hash collissions with my perft(21) confirmation and perft(22) computation, but I wonder how lucky I was then). I think having a single hash table per core is the only way to guarantee correctness (barring memory errors of course).
Correct me if I'm wrong, but I think that lock-less hashing is not guaranteed to be correct for perft counts. In regular searches, almost all (and rare to begin with) read/write races on individual hash entries are filtered out by the alpha-beta mechanism, but for perft any error will certainly propagate up to the root count. Nevertheless, it appears that you got away with it (I also got away with hard hash collissions with my perft(21) confirmation and perft(22) computation, but I wonder how lucky I was then). I think having a single hash table per core is the only way to guarantee correctness (barring memory errors of course).
Hi Rein, I am currently using hash records with 64-bit signatures but have not encountered a problematic type-1 error (key collision) yet as my figures agree with Aart. Aart said in his other post he uses collision-free hash tables, which is the better way.
But the lock-less hashing method I use does not make this situation better or worse. Type-1 collisions can happen in single threaded or multi threaded (+ lock-less hash) perft apps and can ruin the results either way.
you can avoid type-1 collisions altogether by storing the whole position in the hash table, which can be done in 96 bits without any fancy compression (97 if you need to store the side to move also).
Last edited by murraycash on Thu Sep 13, 2012 22:09, edited 1 time in total.
Long time no hear from you! Are you still working on Nemesis, or another checkers engine?
Mac Banks mentioned that you were building a 10-piece database. Did you finish building it?
Hello Ed! It's been a while! I haven't worked on a checkers engine for a long time now, since our aborted Manchester tournament. Never had time really, too busy with other things. I never finished the 10 piece DB, I was about half way through and put it on hold as it would take another 6 months of 2 computers being on 24/7. I think it could be done a lot faster now, might re-visit the project.
The perft project has been quite interesting though, seeing what can be done with limited resources, trying to squeeze every last bit out of the laptop.
Is there any serious work still ongoing on 8x8 checkers?
murraycash wrote:
But the lock-less hashing method I use does not make this situation better or worse. Type-1 collisions can happen in single threaded or multi threaded (+ lock-less hash) perft apps and can ruin the results either way.
I'm not sure if this old xor-trick (i.e. not using std::atomic_fetch_xor) is guaranteed to be portable and race-free by the C++11 Standard, or that it "only" happens to work on x86/x86-64 architectures or with a small number of threads.
you can avoid type-1 collisions altogether by storing the whole position in the hash table, which can be done in 96 bits without any fancy compression (97 if you need to store the side to move also).
yes, I know but I use 64-bit bitboards for other reasons (avoiding even/odd row code duplication). I don't need the color bit because I use one hash table for each side to move.
murraycash wrote:Hello Ed! It's been a while! I haven't worked on a checkers engine for a long time now, since our aborted Manchester tournament. Never had time really, too busy with other things. I never finished the 10 piece DB, I was about half way through and put it on hold as it would take another 6 months of 2 computers being on 24/7. I think it could be done a lot faster now, might re-visit the project.
The perft project has been quite interesting though, seeing what can be done with limited resources, trying to squeeze every last bit out of the laptop.
Is there any serious work still ongoing on 8x8 checkers?
Except for a few bug fixes I haven't worked on 8x8 checkers much at all. It seems that with large opening books and strong engines, all games are draws. For the past few years I have been working on 10x10 international draughts, which is more challenging. Opening books do not help it much, and endgame databases start helping much later in the game than in 8x8 checkers.
I agree that building the 10pc db is not now the big hurdle that it used to be. I could probably do it in 2 to 3 months now using the 2 machines that I have here (the newest one is several years old).
murraycash wrote:
But the lock-less hashing method I use does not make this situation better or worse. Type-1 collisions can happen in single threaded or multi threaded (+ lock-less hash) perft apps and can ruin the results either way.
I'm not sure if this old xor-trick (i.e. not using std::atomic_fetch_xor) is guaranteed to be portable and race-free by the C++11 Standard, or that it "only" happens to work on x86/x86-64 architectures or with a small number of threads.
Rein, I stand corrected. I agree the lock-less implementation does increase the chance of perft count errors and that for a given hash table size, increasing the number of threads accessing the hash table will further increase the chances of race conditions that could cause a random hash key (key ^ data1) ^ data2 when decoded, to be a false match.
I'll try your idea of totally separate hash tables per thread and see how performance goes.
I'll try your idea of totally separate hash tables per thread and see how performance goes.
You could also use some spinlocks for mutual exclusion with a single hashtable. A single spinlock would be bad because only one thread could access the hashtable at a time. But if you used say 100 locks, with each lock protecting 1/100th of the table nodes, then the hashtable would usually be available for every thread simultaneously.