BertTuyt wrote: Rein, other than the needs of other 10x10 variants, I don't really see the advantage of adding additional ghost squares. I browsed through my evaluation function, but nowhere could I find an advantage from this addition.
When I started with Damage, long ago, I counted from 46-50, 36-40, so the other way around, and implemented this scheme in my bitboards.
If I have time I still want to change this. It will have no impact on my program, speed or otherwise, but I like "internal beauty". In case I redo the work, I will most likely start with the same scheme Gerard uses, so square 1 is bit position 0.
For some reason I like standardization, so that we are able to share more and to exchange more.
If someone is interested in my move generator (in C++), feel free to ask; the only condition for sharing is that I also get the same from the other side.
With kind regards,
Bert

Yes, going back in a fully mature engine is more difficult than adopting a new layout during design. I have no idea how many horizontal eval patterns I will have in the end, but it seems like a "no regret" decision to be able to do it (especially since I can stay within 64 bits).

EndGame Databases
I do not have gaps in my board representation; it is the straightforward (bit 0, sq 1) .. (bit 49, sq 50). The big advantage of adding the gaps, of course, is that it allows doing the even and odd rows in parallel. Sometimes I am tempted to switch, but I'm afraid it would create some bugs, and I use this move generator in a number of programs, including the endgame database generator and lookup API. It would be painful to change at this point.
-- Ed
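For illustration, here is a minimal sketch of one possible ghost-square layout for the 10x10 board (an assumption for the example, not necessarily the exact layout Rein or Ed has in mind). Mapping the playable square at column x, row y (with x + y odd) to bit (11*y + x - 1) / 2 uses bits 0..53, leaves ghost bits at 10, 21, 32 and 43, and makes both diagonal steps the same shift on every row, so one shift handles the even and odd rows at once:

Code: Select all
#include <cstdint>

// Ghost bits created by the mapping bit = (11*y + x - 1) / 2.
constexpr uint64_t GHOSTS = (1ULL << 10) | (1ULL << 21) | (1ULL << 32) | (1ULL << 43);
// The 50 playable squares occupy the remaining bits of 0..53.
constexpr uint64_t BOARD = ((1ULL << 54) - 1) & ~GHOSTS;

// Squares a white man can step to, moving down the board.
// Down-left is always +5 and down-right always +6, independent of row
// parity; masking with BOARD discards ghost and off-board bits.
inline uint64_t man_step_targets(uint64_t men, uint64_t empty) {
    return ((men << 5) | (men << 6)) & empty & BOARD;
}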
Ed, although I did not find the wrong position in one of the slices, I already have a question for when all the work on the 7p DB is finished (if ever).
I have "zillions" of sub-slices, characterized by the leading man (white and black). I'm not sure whether it makes sense to combine all these sub-slices into one DB, so that I end up with only the DBs described by the number of men and kings for black and white.
As my 64-bit OS can handle larger files, this is most likely not an issue.
I just don't know (yet) if this makes sense, or if it has any potential advantage.
So I'm interested in your thoughts.
Bert
I have "zillions" of sub-slices, characterized by the leading man (white and black). Im not sure if it makes sense to combine all these sub-slices into 1 dB, so I end up with only the DB's describing the number of man, king for black and white.
As my 64Bit OS can handle larger files, this is most likely not an issue.
Only I don't know (yet), if this makes sense, or if this has any potential advantage.
So im interested in your thoughts.
Bert
Bert, I decided it is better to reindex the databases into large subdivisions described by the number of each piece type. There are several advantages. It is much more convenient to deal with only a small number of files, and for each file you have an open file handle and some memory used by Windows for the file buffer and other state information, so having thousands of open files wastes some memory. Another reason is that you can easily get rid of the gaps in the indexing when indexing by piece types. With indexing by rank of the leading checkers, it requires a large table, ~30 MB for an 8pc db, to have an indexing function that has no gaps. By gaps I mean indices that do not correspond to a legal board configuration. The Grimminck indexing does have these gaps, which is not really a problem, but it is easy to create a gapless indexing function once you get rid of the complication caused by the leading ranks. The gapless indexing might give slightly better RLE compression, although this is probably a small effect. Compression also improves slightly because you get to combine consecutive value runs at the beginnings and ends of the smaller databases into single runs in the larger db. The only downside to reindexing by piece counts is that it takes some additional computing time. Typically this goes at least 10x faster than building the original dbs. I have been doing this reindexing step with all the databases I have built (English, Italian, and 10x10 draughts), so I obviously consider the result to be worth the extra effort.
-- Ed
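As an illustration of a gapless indexing function (a sketch under simplifying assumptions, not kingsrow's actual code: it ignores the rank restrictions on men and the interaction between piece types): place the k pieces of one type on squares s_0 < s_1 < ... < s_{k-1} and rank the placement in colexicographic order, so every index in [0, C(n, k)) corresponds to exactly one placement and there are no gaps.

Code: Select all
#include <cstdint>
#include <vector>

// Binomial coefficients C(n, k) for n <= 63, k <= 7, built once at startup.
static uint64_t choose[64][8];

static void init_choose() {
    for (int n = 0; n < 64; ++n) {
        choose[n][0] = 1;
        for (int k = 1; k < 8; ++k)
            choose[n][k] = (n == 0) ? 0 : choose[n - 1][k - 1] + choose[n - 1][k];
    }
}

// Colex rank of a placement of k identical pieces; 'squares' must be
// strictly increasing, each in 0..n-1. Every value in [0, C(n, k)) is hit
// exactly once, so the index has no gaps.
static uint64_t rank_placement(const std::vector<int>& squares) {
    uint64_t r = 0;
    for (size_t i = 0; i < squares.size(); ++i)
        r += choose[squares[i]][i + 1];
    return r;
}

The sub-indices for the different piece types (black men, white men, black kings, white kings) can then be combined mixed-radix, multiply-and-add, into a single gapless index for the whole piece-count subdivision.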
db verify error puzzle
Bert has been building the 7pc db and comparing his WLD counts against the ones I posted here on the forum. He found a discrepancy in the slice (nbm 2, nwm 2, nbk 2, nwk 1), white to move. His counts contain one more loss and one less draw than mine. The number of wins is the same. Since my 7pc counts have already been confirmed by Gerard, we knew the error must be in Bert's. He ran a self-verify pass on this slice, where each position is checked for consistency with its successors. This found no inconsistencies. He did it a second time and again found no inconsistencies. There are 41 billion positions in this slice, and he had one error, and only in the white-to-move positions. The counts for black-to-move matched perfectly. This seemed very odd, because all of Bert's counts for 2pcs through 6pcs matched. If it was a code bug, it was something that only showed up with 7pcs and only on one position in this particular slice.
To find the position in error we used a divide-and-conquer technique. I wrote a small test program which divides the 41 billion positions into 41k blocks of 1M positions each and writes the WLD counts for each block to a log file. I zipped the log file and mailed it to Bert. He ran the same program, and by comparing my log file to his we found the block containing the discrepancy. We then modified the test program to write the value of each position in that block. After compressing with WinZip this file was too large to email, so I put it on a web server. Bert downloaded it, and from this we found the position with the error.
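A sketch of what such a block-summary program might look like (names and I/O are assumptions for illustration, not the actual test program):

Code: Select all
#include <algorithm>
#include <cstdint>
#include <cstdio>

enum Value { WIN = 0, LOSS = 1, DRAW = 2 };

// Assumed to exist: value of the i-th position in the slice's index order.
Value lookup_slice_position(uint64_t index);

// Write one line of WLD counts per block; two builders can then diff the
// logs to localize a discrepancy to a single block of 'block_size' positions.
void write_block_counts(uint64_t npositions, uint64_t block_size, FILE* log) {
    for (uint64_t base = 0; base < npositions; base += block_size) {
        uint64_t counts[3] = {0, 0, 0};
        const uint64_t end = std::min(npositions, base + block_size);
        for (uint64_t i = base; i < end; ++i)
            ++counts[lookup_slice_position(i)];
        std::fprintf(log, "block %llu: W=%llu L=%llu D=%llu\n",
                     (unsigned long long)(base / block_size),
                     (unsigned long long)counts[WIN],
                     (unsigned long long)counts[LOSS],
                     (unsigned long long)counts[DRAW]);
    }
}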
After seeing the position in error, the explanation was obvious, but we had overlooked this particular source of errors. The position had a successor which is a capture position. My WLD counts do not include capture positions because my db does not store values for these. When I need the value of a capture position I do a recursive search of its successors until I can propagate a value back from non-capture successors. But Bert's db build process is a little different. During the build he saves the values of all positions, and then later discards the values of captures when he does a final compression pass. The error was in some black-to-move capture positions which did not get checked against my WLD counts. Bert ran a self verify pass of all the captures in the slice with the problem and found 4 errors.
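The recursive lookup Ed describes might be sketched like this (a simplified WLD negamax over forced captures; the helper functions are assumptions, not kingsrow's code):

Code: Select all
#include <vector>

enum Value { LOSS = -1, DRAW = 0, WIN = 1 };

struct Position { /* bitboards, side to move, ... */ };

// Assumed helpers: capture detection, capture generation, and a db lookup
// that only covers non-capture positions.
bool has_capture(const Position& pos);
std::vector<Position> generate_captures(const Position& pos);
Value db_lookup(const Position& pos);

// Value of a position that may not be stored: captures are forced, so
// recurse through the capture successors until only stored, non-capture
// positions remain, negating the value at each ply.
Value resolve(const Position& pos) {
    if (!has_capture(pos))
        return db_lookup(pos);
    Value best = LOSS;
    for (const Position& succ : generate_captures(pos)) {
        Value v = static_cast<Value>(-resolve(succ));  // WLD negamax
        if (v > best)
            best = v;
    }
    return best;
}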
So the obvious conclusion is that if your build process saves the values of all positions, not just the non-captures, then your verify needs to check the captures as well as the non-captures!
-- Ed
Re: EndGame Databases
Now that the 7P DB work is finished (but never say never), I am focusing on parallel processing.
However, I have a question for all involved.
With multiple threads there could be issues with the DB handler.
If there are only cache reads, I don't think things will go wrong (as I believe my DB routines are thread-safe, or can be made so).
But there are 2 situations where I feel uncomfortable:
- if there are multiple parallel disk block reads, as I don't know to what extent the Windows system can deal with that;
- if there is a parallel block read and block write, and the block cache ID is the same.
I think this will not happen often, but you never know.
Do you have experience with these issues, and with how to avoid them?
The most rigid approach is to have one global DB lock, so that only one thread at a time can access the DB handler. I think that this would lead to a "dramatic" performance penalty, so I hope there are other/better solutions.
Bert
Re: EndGame Databases
Hi Bert,
BertTuyt wrote: If there are multiple parallel disk block reads, as I don't know to what extent the Windows system can deal with that.
In the Damy implementation I resolve this problem by enforcing exclusive access to the disk. I can see it is not optimal if your hardware is able to read several blocks at different places simultaneously, but at least it is very simple.
BertTuyt wrote: If there is a parallel block read and block write, and the block cache ID is the same.
This point is very important. If my understanding is correct, this occurs when a thread needs to replace block1 in the cache with block2 while another thread is reading block1 in the cache. To avoid this problem I lock the whole cache as soon as blocks have to be removed from it. To keep performance as high as possible you of course have to limit the number of such locks. For that reason, as soon as the cache is full I remove, with only one lock, 20% of the cache blocks. As you can see, with a large cache this locking occurs very rarely, and that is the point.
I hope this helps,
Gérard
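A highly simplified sketch of this fill-then-purge idea (structure and names are assumptions, not Damy's code; for brevity the writer path here takes a mutex, and making readers genuinely lock-free between the rare purges needs extra care that this sketch glosses over):

Code: Select all
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

struct Block {
    uint64_t last_used = 0;
    std::vector<uint8_t> data;
};

class DbCache {
public:
    explicit DbCache(size_t capacity) : capacity_(capacity) {}

    void insert(uint64_t id, Block b) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (blocks_.size() >= capacity_)
            evict_batch();              // the rare, single lock Gerard mentions
        b.last_used = ++clock_;
        blocks_[id] = std::move(b);
    }

private:
    // Remove the least recently used ~20% of blocks in one pass, so the
    // cache is locked for eviction only occasionally instead of on every
    // single block replacement.
    void evict_batch() {
        std::vector<std::pair<uint64_t, uint64_t>> by_age;  // (last_used, id)
        by_age.reserve(blocks_.size());
        for (const auto& kv : blocks_)
            by_age.emplace_back(kv.second.last_used, kv.first);
        std::sort(by_age.begin(), by_age.end());
        for (size_t i = 0; i < by_age.size() / 5; ++i)      // oldest 20%
            blocks_.erase(by_age[i].second);
    }

    std::unordered_map<uint64_t, Block> blocks_;
    std::mutex mutex_;
    size_t capacity_;
    uint64_t clock_ = 0;
};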
Re: EndGame Databases
In kingsrow there are 2 places where I need locks: 1) to read a block from a file, because (among other problems) you cannot have 2 threads using the same file handle simultaneously, and 2) to update the LRU links. #1 only occurs when you have a cache miss, and it is a slow operation that takes milliseconds, so you would like not to prevent other search threads from reading WLD values from the cache while you are reading a block from a file. #2 occurs every time you read from the cache, and is a very quick operation for me, just a few C statements, so a lock here does not hold up other threads for very long. It is usually true that there are very many more cache hits than misses, so this is another reason that you want to be able to read values from the cache in other threads while one thread is reading a disk block.
-- Ed
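For reference, the LRU update can indeed be just a few statements when the cache blocks sit in an intrusive circular doubly linked list with a dummy head (a sketch of the general technique, not kingsrow's code):

Code: Select all
struct CacheBlock {
    CacheBlock* prev;
    CacheBlock* next;
    // ... block id, compressed data, etc.
};

// Move 'b' to the front of the LRU list (most recently used).
// Called with the cache lock held; this is the whole critical section.
inline void touch(CacheBlock* head, CacheBlock* b) {
    b->prev->next = b->next;      // unlink from current place
    b->next->prev = b->prev;
    b->next = head->next;         // relink just after the dummy head
    b->prev = head;
    head->next->prev = b;
    head->next = b;
}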
Re: EndGame Databases
Hi Ed,
Don't you need a lock mechanism when one thread reads the cache while another thread wants to replace the corresponding block in the cache with another block?
Gérard
Re: EndGame Databases
TAILLE wrote: Don't you need a lock mechanism when one thread reads the cache while another thread wants to replace the corresponding block in the cache with another block?
Not if you read the block into a private buffer and then copy or link the buffer into the cache when the read is done. Besides, the only way another thread can "want to replace the corresponding block" is by doing a disk read, and the first thread already holds the lock on disk I/O.
-- Ed
Re: EndGame Databases
Ed,
Maybe my question was not very clear. Of course I do not want to lock access to the cache during a disk I/O (that never happens in the Damy implementation). The point is the following:
When thread 1 needs to remove an entry from a full cache (in order to link in a new block read from the disk), we have to make sure that this entry is not in use by thread 2.
To resolve this problem a lock is necessary, but you have two solutions here:
In the Damy implementation, the lock is taken by thread 1 when an entry has to be removed from the cache.
If my understanding is correct, in Kingsrow the lock is taken by thread 2 when it updates the LRU links, thus protecting (for a certain time) the entry against removal by another thread.
As usual, each solution has its pros and cons.
Gérard
Re: EndGame Databases
Hi Gerard,
I looked at my code tonight and I was wrong about one detail, and also what I was trying to describe is not how it's implemented now. But here's another try at describing the idea.
Assume there are two locks, named cache and io.
Do you see any problem with the scheme below? The main idea is that a thread can obtain the cache lock and read values from the cache while another thread holds the io lock and is doing a (slow) disk read.
Code: Select all
lookup(position, priority)
{
    identify disk block containing position
    lock(cache)
    if (disk block is in cache) {
        decode position's WLD value
        update LRU links
        unlock(cache)
        return(value)
    }
    if (priority is HIGH) {
        lock(io)
        unlock(cache)
        read disk block into private buffer
        lock(cache)
        unlock(io)
        replace least recently used cache block with new block
        decode position's WLD value
        update LRU links
        unlock(cache)
        return(value)
    }
    unlock(cache)
    return(UNKNOWN value)
}
-- Ed
Re: EndGame Databases
Now that I've studied the pseudocode, I see it has a potential for deadlock. One thread can have the io lock after a disk read and be waiting for the cache lock, while a second thread can have the cache lock and be waiting for the io lock.
-- Ed
Re: EndGame Databases
I think this modification should avoid the deadlock.
Code: Select all
lookup(position, priority)
{
    identify disk block containing position
    lock(cache)
    if (disk block is in cache) {
        decode position's WLD value
        update LRU links
        unlock(cache)
        return(value)
    }
    if (priority is HIGH) {
        unlock(cache)
        lock(io)
        if (disk block is still not in cache)
            read disk block into thread-local buffer
        unlock(io)
        lock(cache)
        if (disk block is still not in cache)
            replace least recently used cache block with new block
        decode position's WLD value
        update LRU links
        unlock(cache)
        return(value)
    }
    unlock(cache)
    return(UNKNOWN value)
}
-- Ed
Re: EndGame Databases
That's clear, Ed.
The Damy implementation is completely different because I wanted to minimize the number of cache locks.
The main point is the following: suppose you had an infinite number of cache entries. That would mean you never need to replace an old block with a new one. As a consequence you would not need to maintain LRU links, and you would never need to lock the cache.
That is the idea in Damy: as long as the cache is not full, the cache is never locked. When the cache becomes full, I lock the cache in order to remove about 25% of the blocks, and then I unlock it until it becomes full again. As you can see, in a real game it may happen that the cache is never locked!
Just a small question on your pseudocode:
Code: Select all
lookup(position, priority)
{
    identify disk block containing position
    lock(cache)
    if (disk block is in cache) {
        decode position's WLD value
        update LRU links
        unlock(cache)
        return(value)
    }
}
Why do you need to lock the cache during "decode position's WLD value"? Isn't it possible to update the LRU links first and unlock the cache immediately, like this:
Code: Select all
lookup(position, priority)
{
    identify disk block containing position
    lock(cache)
    if (disk block is in cache) {
        update LRU links
        unlock(cache)
        decode position's WLD value
        return(value)
    }
}
After the "update LRU links" there is practically no risk that the block in the cache will be replaced by a new one before "decode position's WLD value" finishes, is there?
Gérard