Endgame database benchmarks

Discussion about the development of draughts in the era of computers and the Internet.
Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Endgame database benchmarks

Post by Ed Gilbert » Thu Jul 08, 2010 03:52

And with 8 pieces I subdivide according to the position of the most advanced piece (45*45 slices)
That is the same indexing that I used to build the 9-piece db for 8x8 checkers. It was nice because the indexing function was simple and I only used one computer. For 10 pieces I had to change to a more complex indexing scheme to make the subdivisions smaller and to allow more subdivisions to be built in parallel.
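For what it's worth, here is a minimal sketch of that kind of slicing for the 10x10 board. The square numbering (1..50, white men advancing toward square 1 and black men toward square 50, so each side's men have 45 legal squares) and all names are assumptions for illustration only, not Bert's or my actual code.

/* Slice index in 0..(45*45)-1 from the most advanced man of each side.
   Assumption: white men sit on squares 6..50, black men on 1..45. */

static int white_index(int sq)          /* 0 = most advanced white man */
{
    return sq - 6;                      /* squares 6..50 -> 0..44 */
}

static int black_index(int sq)          /* 0 = most advanced black man */
{
    return 45 - sq;                     /* squares 1..45 -> 44..0 */
}

int slice_index(int most_advanced_white_sq, int most_advanced_black_sq)
{
    return white_index(most_advanced_white_sq) * 45
         + black_index(most_advanced_black_sq);
}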
At each iteration all cores work on the same database. I think that works quite well, because each of the passes through the database can be multithreaded quite efficiently.
I assume that on each pass you break the index range up into N subranges that are worked on in parallel. Do you find that it becomes very I/O bound during the passes where you are resolving captures and conversion moves, when all 4 (or 8?) cores are reading the database values of other subdivisions? Maybe if you have sufficient RAM to cache enough of these other subdivisions then it's not a problem.
You say that verifying takes more time than the original computing. For me, I just check for consistency within the database and that takes about half the time of the build.
I verify the same way. For the 8pc db the verify time was just about the same on average as the build time. I am verifying the compressed dbs that do not have capture positions. I assume that's what you do also? When I'm looking up the values of successor positions to do the verify, if any of those positions are captures then it requires a search to find their value and that's probably what makes the verify slow.
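As a rough illustration of that kind of consistency check (just a sketch; the WDL encoding, the POSITION type and the helper functions are hypothetical names, not my actual code):

typedef struct POSITION POSITION;

enum { LOSS, DRAW, WIN };                       /* value for the side to move */

extern int db_lookup(const POSITION *p);        /* stored value (non-capture positions) */
extern int capture_search(const POSITION *p);   /* small search to resolve a capture position */
extern int has_capture(const POSITION *p);
extern int gen_successors(const POSITION *p, POSITION *succ);  /* returns count */

/* A stored value is consistent if it equals the best result reachable
   from the successor positions (with the point of view flipped). */
int position_is_consistent(const POSITION *p)
{
    POSITION succ[128];
    int n = gen_successors(p, succ);
    int best = LOSS;                            /* no moves = loss in draughts */

    for (int i = 0; i < n; ++i) {
        int v = has_capture(&succ[i]) ? capture_search(&succ[i])
                                      : db_lookup(&succ[i]);
        int mine = (v == WIN) ? LOSS : (v == LOSS) ? WIN : DRAW;
        if (mine > best)
            best = mine;
    }
    return db_lookup(p) == best;
}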

-- Ed

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Endgame database benchmarks

Post by BertTuyt » Thu Jul 08, 2010 21:35

Sorry for being so quiet on this forum lately.
I'm in the midst of my "world tour".
Just returned from Shanghai (and the week before I was in Istanbul).
In the next weeks I will be visiting the USA twice (most likely) and Mexico.
So I'm afraid I have to wait for my holiday (starting the 2nd week of August) to be somewhat more productive and creative.

I haven't installed my SSD yet.
Also, to my knowledge (maybe Ed can confirm), I need to change something in my code.
The endgame benchmark is polluted by Windows, as Windows also uses a global cache mechanism.
If I remember correctly I have to use a different allocation call, or change something else.
Hope Ed can help here.

Once the SSD is in and some small modifications are done, I will stop working on DBs for some time.
I will not enter the 8-piece DB competition.
I will now focus on further improving the evaluation function (based on the matches against Kingsrow, Truus and Flits).
As usual the lost games give the best insights (by the way, the 158-game Kingsrow match was lost; the other two matches were won by Damage).

See you soon,

Bert

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Endgame database benchmarks

Post by BertTuyt » Sun Jul 11, 2010 12:43

Just a question.
With the larger DBs no longer fitting in memory, I observed a huge drop in search depth, due to the disk reads needed to cache new data (4 KByte blocks).
Although I want to improve on this with an SSD (still no time to install it :( ), I don't believe it will solve the problem.

Although with a larger DB some nodes get better values (in comparison with the heuristic evaluation), the overall quality of the non-DB nodes may drop because the search depth is lower, with the consequence that the overall search quality deteriorates. Basically this could lead to the paradox that an increased DB size could even reduce the strength of the program.

Maybe some of you have already observed this.
Shortcuts could be to further improve compression (though I have no clue how to do it), or to avoid caching in some circumstances.
The idea is that if we already have the search bounds, and we know the value range (WDL) for a specific (sub-sliced) DB, we might not even need to look up the theoretical value, as it won't change the alpha-beta window.
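Something like this, as a minimal sketch (the score encoding and the per-slice min/max values are assumptions, not how my DB actually stores anything):

/* Decide whether an exact (possibly disk-bound) DB probe is needed.
   min_value/max_value are the worst and best scores the (sub)slice can
   contain, derived from its WDL contents; alpha/beta are the current
   search bounds.  Purely illustrative. */
int db_probe_needed(int alpha, int beta, int min_value, int max_value,
                    int *bound)
{
    if (max_value <= alpha) {       /* even the best possible value fails low   */
        *bound = max_value;         /* safe upper bound, no disk access needed  */
        return 0;
    }
    if (min_value >= beta) {        /* even the worst possible value fails high */
        *bound = min_value;         /* safe lower bound, no disk access needed  */
        return 0;
    }
    return 1;                       /* the exact value can matter: probe the DB */
}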

Maybe there are other ideas here, ..........

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Endgame database benchmarks

Post by Ed Gilbert » Sun Jul 11, 2010 18:48

Hi Bert,
Although with a larger DB some nodes get better values (in comparison with the heuristic evaluation), the overall quality of the non-DB nodes may drop because the search depth is lower, with the consequence that the overall search quality deteriorates. Basically this could lead to the paradox that an increased DB size could even reduce the strength of the program.
I believe this is true and I have seen it happen. Years ago in 8x8 checkers, when I first started using an 8-piece database, I was looking up every endgame position unconditionally, and I found that in automated matches against Cake my program was doing worse than when I used only a 6-piece db. I developed a priority function to solve this problem. When I encounter an endgame position during a search, I calculate a priority value for it and send this along with the position to the function that will look up the value. If the value is already in cache (so no disk I/O is needed) then retrieving it is very fast, so the exact db value is returned. If the value is not in cache, then the priority value is compared to some threshold. If the priority is high enough then the block containing the position is read from disk into cache and the db value is returned. If the priority is not high enough then the db value is not retrieved from disk and the heuristic eval score is used instead. The priority function and threshold required some tuning, but eventually I figured out how to productively use the large dbs even with a relatively small cache buffer.
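Roughly, in C, it looks like this (a sketch only; the POSITION type, the helper names and the threshold are hypothetical, not the actual kingsrow code):

typedef struct POSITION POSITION;

#define PRIORITY_THRESHOLD 100                             /* tuned experimentally */

extern int cache_lookup(const POSITION *p, int *value);    /* 1 if the block is already cached */
extern int load_block_and_lookup(const POSITION *p);       /* reads the 4K block from disk */
extern int heuristic_eval(const POSITION *p);

int endgame_value(const POSITION *p, int priority)
{
    int value;

    /* A cached value is nearly free, so always use it. */
    if (cache_lookup(p, &value))
        return value;

    /* Only pay for a disk read when the position is important enough. */
    if (priority >= PRIORITY_THRESHOLD)
        return load_block_and_lookup(p);

    /* Otherwise fall back to the normal heuristic evaluation. */
    return heuristic_eval(p);
}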

-- Ed

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Endgame database benchmarks

Post by BertTuyt » Sun Jul 11, 2010 19:07

Ed, interesting approach.
Do you have any stats regarding the fraction of non-cached DB positions for which you use a heuristic value?
If it is more than 90% this could yield an interesting speed-up.
That leaves the mystery of what the priority function is based on.
But it is good to have these secrets; they will fuel curiosity and creativity in the end...

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Endgame database benchmarks

Post by Ed Gilbert » Mon Jul 12, 2010 13:39

Do you have any stats regarding the fraction of non-cached DB positions for which you use a heuristic value?
Bert, when I was tuning the priority I monitored the cache hits and misses and the percentage of positions that got exact WLD values returned vs. heuristic values. As you would expect, the numbers are very dependent on the size of the cache and the position at the root of the search, anywhere from 100% db WLD values down to around 20%. Tuning is difficult because there is no single metric that you can easily look at and say this is better or worse. As you increase the number of cache misses that result in loading new blocks from disk, the search speed slows, but you get more perfect eval scores and more search cutoffs.

-- Ed

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Endgame database benchmarks

Post by BertTuyt » Thu Jul 29, 2010 00:01

I have slightly modified my (7p) databases, so that I now have a separate header file containing general info related to all DBs (231), one index file, and 231 DB files.

My DB cache has 256K entries (of 4 KByte each), so 1 GByte.
I don't preload DB-files (yet).

With Ed's benchmark tool I measured the time needed to process the 200,000 positions when the cache is completely empty (I have also "switched off" the Windows cache system).

With my old HD, time was around 260 seconds.
With my new 80 GB Intel SSD it took 7.8 seconds!

I also measured the number of blocks read into the DB cache, which was 30384.

Once the cache is filled, the test that does the lookups 100 times (for all 200,000 positions) takes 15.4 seconds.

I haven't done the calculation yet, but from this one can (I guess) calculate the average SSD 4 KByte block read time.

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Endgame database benchmarks

Post by Ed Gilbert » Thu Jul 29, 2010 12:56

Hi Bert,

That is quite a speedup using the SSD. Using your numbers of 7.8 sec and 30384 block reads I get 256 usec per block. That's a couple of orders of magnitude better than conventional hard drives. I notice that in the Intel specs they give "Random 4 KB Reads: up to 35,000 IOPS" which is 29 usec per block. I assume that's just the drive itself and the overhead in Windows is what slows it to 256 usec.

I see Intel also has a 160 GB version which is large enough to hold the most important subdivisions of the 8pc db. I will have to consider modifying the egdb driver so that the db files can be split amongst several drives/directories.

-- Ed

MichelG
Posts: 244
Joined: Sun Dec 28, 2003 20:24
Contact:

Re: Endgame database benchmarks

Post by MichelG » Thu Jul 29, 2010 16:20

Ed Gilbert wrote:Hi Bert,

That is quite a speedup using the SSD. Using your numbers of 7.8 sec and 30384 block reads I get 256 usec per block. That's a couple of orders of magnitude better than conventional hard drives. I notice that in the Intel specs they give "Random 4 KB Reads: up to 35,000 IOPS" which is 29 usec per block. I assume that's just the drive itself and the overhead in Windows is what slows it to 256 usec.

I see Intel also has a 160 GB version which is large enough to hold the most important subdivisions of the 8pc db. I will have to consider modifying the egdb driver so that the db files can be split amongst several drives/directories.

-- Ed
It's probably not the Windows overhead that causes the difference in your calculation; you just can't compute latency from IOPS.

Have a look at the following link to see the I/O performance of several drives as a function of the I/O queue depth: http://www.pcper.com/article.php?aid=91 ... pert&pid=7. If you press 'next page' you can also see the measured latency, which is at least about 100 microseconds for the Intel.

The IOPS figure the manufacturer publishes is the maximum attainable under ideal circumstances. That is, with a new, empty drive, while both reading and writing simultaneously, and using the optimum number of pending I/O requests.

I suspect Bert is running the test single-threaded. According to the graph, using multithreading should be about 1.5 to 2 times as fast in terms of I/Os per second.

In any case, it is a massive speedup compared to spinning disks :-)

Michel

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Endgame database benchmarks

Post by BertTuyt » Thu Jul 29, 2010 22:47

A short reaction to Michel: yes, I run the "Ed test" single-threaded.
Maybe I can make a parallel version in a straightforward way, as the do-loop is basically ideal for multithreading.
It might be interesting to do, so I can test whether my DB routines are thread-safe.
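Something along these lines, as a minimal sketch with OpenMP (POSITION, DbLookup and the array names are hypothetical; DbLookup has to be thread-safe):

#include <omp.h>

typedef struct POSITION POSITION;
extern int DbLookup(const POSITION *p);     /* hypothetical, must be thread-safe */

void run_benchmark(const POSITION *positions, int npos, int *results)
{
    /* Each thread handles a chunk of the 200,000 positions; with several
       threads in flight the SSD also sees a deeper I/O queue, which is
       where the extra 1.5-2x Michel mentions should come from. */
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < npos; ++i)
        results[i] = DbLookup(&positions[i]);
}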
If someone is interested I can have a look at this during the weekend.
I also need to browse the Internet to see if there are ways to improve speed through different Windows settings and/or a different driver.
Anyway, I'm convinced that for serious use of large DBs an SSD is a must!!!
I don't think anyone will disagree.

Bert

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Endgame database benchmarks

Post by TAILLE » Sat Aug 07, 2010 14:12

Hi Bert, Ed and others

With an "old" HD (as I have in Damy) we all know very well why it makes sense to read 4 KB every time we have to access the disk.
With an SSD, is it better to read less than 4 KB? More than 4 KB?

What is your feeling?
Gérard

64_bit_checkers_engine
Posts: 62
Joined: Mon Apr 20, 2009 01:10
Contact:

Re: Endgame database benchmarks

Post by 64_bit_checkers_engine » Sat Aug 07, 2010 15:19

TAILLE wrote:With an SSD, is it better to read less than 4 KB? More than 4 KB?
Hi,

I have a 128 GB SSD on my 3.9 GHz Intel Core i7-860 (with 8 GB of DDR3 RAM). I will run a read/write test on it a little later.

As for the 4K reads, I don't think hard disk speed matters. Whenever you call fread() with less than 4K, the OS still reads 4K off of your disk.

Try this:

Create a file with random bytes, make 2 MB worth.
Call fopen() then fread(), specifying only 1 byte.
Time your result.

Next test: call fread(), specifying 1 KB.
Time your result.

Next test: call fread(), specifying 2 KB.
Time your result.

Next test: call fread(), specifying 4 KB.
Time your result.

All of your times will be the same.

Now, try this:

Next test: call fread(), specifying 4 KB + 1 byte.
Time your result.

If I am correct, it will be 2x the amount of time to do the 4 KB read. Why? It is reading the NEXT 4 KB off of the disk.

So, reading 8K = same amount of time as reading 4 KB + 1 byte.

Experiment using the fastest timer (microseconds) since the millisecond timer might be too coarse.
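A quick sketch of this test in C (assuming a POSIX microsecond timer via clock_gettime; on Windows you could use QueryPerformanceCounter instead). Note that right after writing the file the OS cache will probably serve the reads, so to see real disk behaviour you may have to drop the cache or use unbuffered I/O:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double elapsed_us(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
}

static void time_read(const char *fname, size_t nbytes)
{
    static char buf[8 * 1024];
    struct timespec t0, t1;
    FILE *f = fopen(fname, "rb");

    if (!f) { perror(fname); return; }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t got = fread(buf, 1, nbytes, f);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("read %zu bytes: %.1f usec\n", got, elapsed_us(t0, t1));
    fclose(f);
}

int main(void)
{
    /* Create a 2 MB file of random bytes. */
    FILE *f = fopen("random.bin", "wb");
    if (!f) return 1;
    for (int i = 0; i < 2 * 1024 * 1024; ++i)
        fputc(rand() & 0xff, f);
    fclose(f);

    time_read("random.bin", 1);
    time_read("random.bin", 1024);
    time_read("random.bin", 2 * 1024);
    time_read("random.bin", 4 * 1024);
    time_read("random.bin", 4 * 1024 + 1);
    time_read("random.bin", 8 * 1024);
    return 0;
}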

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Endgame database benchmarks

Post by TAILLE » Sat Aug 07, 2010 18:02

64_bit_checkers_engine wrote:
TAILLE wrote:With an SSD, is it better to read less than 4 KB? More than 4 KB?
I have a 128 GB SSD on my 3.9 GHz Intel Core i7-860 (with 8 GB of DDR3 RAM). I will run a read/write test on it a little later.

As for the 4K reads, I don't think hard disk speed matters. Whenever you call fread() with less than 4K, the OS still reads 4K off of your disk.
OK, I understand that, even with an SSD, a minimum of 4 KB will always be read.
Now I have another question.
On an "old" HD, if I read 4 KB at file address 0x10000 I guess that only 4 KB will actually be read. But if I read 4 KB at address 0x10001, then I suspect that 8 KB will really be read.
Is it the same with an SSD? Is it possible for you to test it?
Gérard
