Search Algorithm

Discussion about development of draughts in the time of computer and Internet.
Post Reply
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04
Contact:

Re: Search Algorithm

Post by Rein Halbersma » Thu Oct 22, 2015 09:14

Klaas van der Laan wrote:Maybe interesting: http://www.technologyreview.com/view/54 ... al-master/
This review contains a lot of hyperbole and misrepresentations. The actual technical paper is a lot more factual but also is a bit disingenuous IMO. It's a bit like showing that a solar-powered and self-driving car is more energy efficient and better at navigating than a Formula 1 car, and then not only suggesting that the remaining order of magnitude in speed difference makes the solar car close to competitive, but also that it will easily generalize to planes and trains as well. :?

The true difficulty is not finding some machine learning algorithm that gives a high-quality eval function, but to make that eval also cheap enough to be competitive in a real game engine. In that respect, the Scan eval is a much more impressive result than Giraffe, even though the latter uses more advanced learning techniques than the former.

User avatar
Klaas van der Laan
Posts: 898
Joined: Wed Sep 24, 2003 13:19
Real name: Klaas van der Laan

Re: Search Algorithm

Post by Klaas van der Laan » Wed Nov 04, 2015 10:50

Flow with the Go

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Search Algorithm

Post by BertTuyt » Sun Aug 21, 2016 22:38

Joost Buijs wrote: My evaluation function is ready, all structural scores are still zero at the moment, I have to tune them by self-play with logistic regression.
At the moment the search runs very fast, ~20 mnps on a single core, I'm afraid that it will slowdown considerably after I have added hashing, it might be better to disable hashing in quiescence this is something I still have to find out.

The plan is to have something that can play a game by the next Dutch tournament later this year.

Joost
I posted this in a previous Thread.
Im also starting to build a new Damage Engine, based on a slightly modified architecture.
I will try to do a similar thing like you, start from a minimal implementation and measure speed.
Not sure I will come close to your 20mnps (Q, what is your defintion a Node?, a new entry into the search function?). Is this 20 mnps based upon 3.6 GHZ or 4.0 GHZ?

With the new architecture I also need to find a better HashFunction.
In the old days I incrementally updated the hashvalue in a Zobrist way.
But as extracting all the info from the Bitboards is rather expensive I changed that into a new hash calculation for every new node. The old architecture was based upon 4 bitboards, now I switch to 3, so need to find an 192bit --> 64bit hash alternative.
Also I used to update the number of man/kings , white/black after every move. Now I want to based this upon a POPCOUNT().

As Less is More, I also want to remove some intellectual garbage which was included in the code during years, but where I have strong doubts now if it adds anything towards the ELO equation.

I think in first instance it is logical to focus on mnps, and in a later stage to time to depth. And when all reductions kick in (and sometimes throw away babies with the bathwater) than the only relevant thing is ELO performance differences based upon zillions of games.

Bert

Joost Buijs
Posts: 471
Joined: Wed May 04, 2016 11:45
Real name: Joost Buijs

Re: Search Algorithm

Post by Joost Buijs » Mon Aug 22, 2016 07:35

I count each move-make() as a node (captures and sliders) and I measure ~20 mnps at 3600 MHz.
I have material and structural evaluation, the scores for the structural part are still zero, I have no idea what will happen to the speed when they are filled with other values, the shape of the search-tree will be a lot different, that's for sure.
Adding hashing will slow the speed down as well, in the end I would be very happy if I can get something between 50 to 100 mnps on my 8 core machine.

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Search Algorithm

Post by BertTuyt » Mon Aug 22, 2016 21:43

Joost, I was curious how fast a ultra simple alpha beta (without any additions) implementation would be.
First results (didnt debug, or check for bugs), based upon the initial position (2.7 GHZ notebook), and recent MoveGenerator().

Code: Select all

Search(1)       N = 10     0.00 sec.    KN/sec = 0
Search(2)       N = 27     0.00 sec.    KN/sec = 0
Search(3)       N = 126    0.00 sec.    KN/sec = 0
Search(4)       N = 317    0.00 sec.    KN/sec = 0
Search(5)       N = 1170           0.00 sec.    KN/sec = 0
Search(6)       N = 3265           0.00 sec.    KN/sec = 0
Search(7)       N = 10448          0.00 sec.    KN/sec = 0
Search(8)       N = 26762          0.00 sec.    KN/sec = 0
Search(9)       N = 94798          0.00 sec.    KN/sec = 0
Search(10)      N = 238679         0.00 sec.    KN/sec = 0
Search(11)      N = 801314         0.01 sec.    KN/sec = 80131
Search(12)      N = 2072576        0.05 sec.    KN/sec = 41451
Search(13)      N = 6536929        0.13 sec.    KN/sec = 50284
Search(14)      N = 15673433       0.37 sec.    KN/sec = 42360
Search(15)      N = 51980665       0.91 sec.    KN/sec = 57121
Search(16)      N = 122112381      2.96 sec.    KN/sec = 41254
Search(17)      N = 429704280      7.55 sec.    KN/sec = 56914
Search(18)      N = 977235471     23.81 sec.    KN/sec = 41043
Search(19)      N = 3608975784    60.57 sec.    KN/sec = 59583
Search(20)      N = 8216206144   208.09 sec.    KN/sec = 39483
As for all Movegenerators the Move sequence might be different, others will yield not the same numbers (the N is this case is incremented with every AlphaBetaSearch() entry).

Below code for Evaluation and Search

Code: Select all

int Evaluation(int bTurn, BITBOARD* pbbField)
{
	int iXWhiteMan = (int)POPCNT64(pbbField[BB_WHITEPIECE] & ~pbbField[BB_KING]);
	int iXBlackMan = (int)POPCNT64(pbbField[BB_BLACKPIECE] & ~pbbField[BB_KING]);
	int iXWhiteKing = (int)POPCNT64(pbbField[BB_WHITEPIECE] & pbbField[BB_KING]);
	int iXBlackKing = (int)POPCNT64(pbbField[BB_BLACKPIECE] & pbbField[BB_KING]);

	int iScore = MAN_VALUE*(iXWhiteMan - iXBlackMan) + KING_VALUE*(iXWhiteKing - iXBlackKing);

	if (bTurn == false) iScore = -iScore;

	return (iScore);
}

int AlphaBetaSearch(PTSTACK pTDepth, int iAlpha, int iBeta, int iDepth, bool bTurn)
{
	++iXSearch;

	BITBOARD* pbbField = pTDepth->pbbField;

	if (iDepth == 0)
		return (Evaluation(bTurn, pbbField));

	pTDepth->pbbField[BB_EMPTY] = DFLD_ALL ^ (pTDepth->pbbField[BB_WHITEPIECE] | pTDepth->pbbField[BB_BLACKPIECE]);

	pTDepth->bTurn = bTurn;

	int iXMoves = pMoveGenerate128->MoveGenerate128(pTDepth);

	if (iXMoves == 0)
		return (SCORE_LOSE);

	for (int i=0; i<iXMoves; i++) {

		(pTDepth + 1)->pbbField = (pTDepth->FirstMove++)->bbField;

		int iScore = -AlphaBetaSearch(pTDepth + 1, -iBeta, -iAlpha, iDepth-1, !bTurn);

		if (iScore > iAlpha) {

			iAlpha = iScore;

			if (iScore >= iBeta)
				break;
		}
	}

	return (iAlpha);
}


Joost Buijs
Posts: 471
Joined: Wed May 04, 2016 11:45
Real name: Joost Buijs

Re: Search Algorithm

Post by Joost Buijs » Tue Aug 23, 2016 08:41

Bert,

When I remove the evaluation function from my search() it runs at ~32 mnps, this is with the MSVC compiler at 3600 MHz. With the evaluation function it runs ~20 mnps. I used the Woldouby position to measure this at depth 20.

I'm currently working on my evaluation function to determine what is the best way to calculate the indices for the structural evaluation.
In my Othello program I did in the past: I=a; I=I*3+b; I=I*3+c; etc. however this is very slow, when I use this for all the features my program slows down to ~7mnps.
So I use pext() which is very fast, but the problem with it is that it is difficult to make an index function without holes in it, so the tables will get larger which also slows things down. I just have to find the best tradeoff between the speed of index calculation and table-size.

Another thing it that x86 CPU's don't have an instruction to reverse bit order, this is very pity because I need this at several places, at the moment it seems that using a small table to do this works best.

Joost

Edit:

I streamlined the evaluation() a bit further and now the search() runs at ~24 mnps.
At the moment I call the evaluation() when the side to move has no captures, probably I have to do this when both sides have no captures and the position is completely quiet.
The evaluation function is very basic, only material + patterns, I timed it with __rdtsc() and on average it seems to take ~36 clock-cycles with some very strange shootouts I can't explain. There might be a measurement error because __rdtsc() is not a serializing instruction, something I have to examine further.

Code: Select all

eval-no. 85538 proc-cycles 18
eval-no. 85539 proc-cycles 234
eval-no. 85540 proc-cycles 81
eval-no. 85541 proc-cycles 87
eval-no. 85542 proc-cycles 84
eval-no. 85543 proc-cycles 21
eval-no. 85544 proc-cycles 18
eval-no. 85545 proc-cycles 21
eval-no. 85546 proc-cycles 87
eval-no. 85547 proc-cycles 84
eval-no. 85548 proc-cycles 78
eval-no. 85549 proc-cycles 87
eval-no. 85550 proc-cycles 72
eval-no. 85551 proc-cycles 24
eval-no. 85552 proc-cycles 21
eval-no. 85553 proc-cycles 78
eval-no. 85554 proc-cycles 78
eval-no. 85555 proc-cycles 81
eval-no. 85556 proc-cycles 87
eval-no. 85557 proc-cycles 84
eval-no. 85558 proc-cycles 81
eval-no. 85559 proc-cycles 63
eval-no. 85560 proc-cycles 84
eval-no. 85561 proc-cycles 84
eval-no. 85562 proc-cycles 21
eval-no. 85563 proc-cycles 276
eval-no. 85564 proc-cycles 18
eval-no. 85565 proc-cycles 21
eval-no. 85566 proc-cycles 27
Probably a large part of it runs from the cache, otherwise I can't explain why it runs so fast.
With a multi-processor search() the different processors will compete for the cache and memory bandwidth, I assume in that case the evaluation function will slow down considerably.
Last edited by Joost Buijs on Wed Aug 24, 2016 17:22, edited 2 times in total.

Joost Buijs
Posts: 471
Joined: Wed May 04, 2016 11:45
Real name: Joost Buijs

Re: Search Algorithm

Post by Joost Buijs » Wed Aug 24, 2016 14:26

The strange shootouts were indeed caused by the __rdtsc() instruction which gives strange results on an out of order CPU.
After switching to __rdtscp(uint32_t *aux) the result is a lot less noisy, the question remains though whether the evaluation function is really that fast or that it is somehow a fluke.

Edit:

It was indeed a fluke, after inserting the __rdtscp() instruction I didn't use the return value of the evaluation function anymore and the compiler decided to throw out this function completely. After fixing this it shows a speed more in line with expectations.
When I let it run for a longer time the speed goes up, this is probably due to caching, it averages out at ~360 cycles which is roughly 100ns. per evaluation.

Code: Select all

eval-no. 5167 proc-cycles 1071
eval-no. 5168 proc-cycles 525
eval-no. 5169 proc-cycles 972
eval-no. 5170 proc-cycles 1173
eval-no. 5171 proc-cycles 522
eval-no. 5172 proc-cycles 504
eval-no. 5173 proc-cycles 462
eval-no. 5174 proc-cycles 477
eval-no. 5175 proc-cycles 429
eval-no. 5176 proc-cycles 420
eval-no. 5177 proc-cycles 987
eval-no. 5178 proc-cycles 687
eval-no. 5179 proc-cycles 1071
eval-no. 5180 proc-cycles 606
eval-no. 5181 proc-cycles 1095
eval-no. 5182 proc-cycles 615
eval-no. 5183 proc-cycles 534
eval-no. 5184 proc-cycles 480
eval-no. 5185 proc-cycles 429

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Search Algorithm

Post by BertTuyt » Thu Aug 25, 2016 20:59

Im back from a Business trip, so hope to spent some time on Computer Draughts in the evening(s) again.
I want to make a really minimal Draughts Hub implementation based upon the Search() and Evaluation() I already shared and the MoveGenerator() we discussed in detail.
For learning purposes I will share all code on DropBox.

Keep you posted,

Bert

Joost Buijs
Posts: 471
Joined: Wed May 04, 2016 11:45
Real name: Joost Buijs

Re: Search Algorithm

Post by Joost Buijs » Fri Aug 26, 2016 06:30

BertTuyt wrote: I want to make a really minimal Draughts Hub implementation based upon the Search() and Evaluation() I already shared and the MoveGenerator() we discussed in detail.
For learning purposes I will share all code on DropBox.
This will be nice!

My DXP implementation is finished and I guess hub uses sockets as well, another possibility is using anonymous pipes or memory mapped files, of course this is not suitable for remote connections.
A couple of days ago I started to work on my search(), it depends upon other obligations whether I can work on it or not but I hope to have a basic SMP search() ready within a week or so.

I will keep you informed.

Joost

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Search Algorithm

Post by BertTuyt » Fri Aug 26, 2016 22:48

Joost Buijs wrote:
BertTuyt wrote: I want to make a really minimal Draughts Hub implementation based upon the Search() and Evaluation() I already shared and the MoveGenerator() we discussed in detail.
For learning purposes I will share all code on DropBox.
This will be nice!

My DXP implementation is finished and I guess hub uses sockets as well, another possibility is using anonymous pipes or memory mapped files, of course this is not suitable for remote connections.
A couple of days ago I started to work on my search(), it depends upon other obligations whether I can work on it or not but I hope to have a basic SMP search() ready within a week or so.

I will keep you informed.

Joost
Hub uses pipes.
Basically the protocol should be agnostic for the physical layer, but as this is a Q&D approach, let it be...

Note: Apparently this is my 1000th post in this forum.

Bert

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Search Algorithm

Post by BertTuyt » Fri Sep 02, 2016 15:09

I continued the hash discussion here, as it might be better location in the Search section.
In my case I implemented the non-incremental (so Non Zobrist) as I thought it was (most likely) as fast (or with a small time penalty) as Zobrist, but far more elegant.

I never tested either, so speed, as collision.

Now I move away from the 4 Bitboards implementation to 3, I also need to rethink the Hash code.
Next to that in my previous architecture I had a Hybrid approach as I also stored the from and to in the MOVE structure.
Now only the 3 Bitboards.
For extracting the PV (in a later stage) I will develop a somewhat slower MoveGen() with the required additional info.

Ed, could you share your Hash code (if you want), out of interest and for testing options?
I already shared my code in a previous post.

Bert

Joost Buijs
Posts: 471
Joined: Wed May 04, 2016 11:45
Real name: Joost Buijs

Re: Search Algorithm

Post by Joost Buijs » Fri Sep 02, 2016 19:23

A non Zobrist hash function will be faster most of the time because Zobrist needs many memory accesses.

I'm not sure about collisions though, with Zobrist I can let my search run for hours without finding a single collision.
For testing purposes I use a hash-table in which I store the whole position, each time I retrieve an entry from the table with a key that matches the hash and the position doesn't match I call it a collision.

If you wish I can add the hash function you posted earlier to it and check what it does, when it generates several collisions during a few seconds of search it will certainly wreck the search up.

What is the advantage of using only 3 bitboards for the position (wm, bm, kings)? In my engine I use 5 (m[2], k[2], empty) and that seems to work fine.

BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Re: Search Algorithm

Post by BertTuyt » Fri Sep 02, 2016 23:48

Joost, yes it would be interesting to test the Hash function I posted.

Bascically I use 4 bitboards, but the empty is not calculated and stored as part of the MoveGenerator().
Im not sure if 3 (so all white piece, and black piece and kings in one bitmap) is faster/slower as 4 (white/black Man/Kings) , which is what I used so far.

I guess it does not make a huge difference, but it is a little more elegant, and I wanted to reduce the number of bytes I needed to save during the MoveGen().

Bert

Ed Gilbert
Posts: 859
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Re: Search Algorithm

Post by Ed Gilbert » Sat Sep 03, 2016 02:01

My hashtable entries are only 8 bytes -- 4 bytes for the hashcode, plus various bitfields. The move is 7 bits, a move index.

A position is 3 uint64s, (black, white, king). I prefer it because it's more compact that 4 fields. If you're generating a non-incremental hashcode, it's faster to calculate it for 3 fields than 4.

Here's my hashing function:

Code: Select all

#define C1 0xd2d84a61

void calc_hashcodes(BOARD *b, int color, HASHCODES *h)
{
	int i;
	uint64_t hash, *k;

	k = (uint64_t *)b;
	hash = (-color) & C1;
	for (i = 0; i < 3; ++i) {
		hash += (k[i] << 25) - k[i];
		hash ^= (hash >> 23);
		hash += (hash << 4);
		hash ^= (hash >> 6);
		hash += (hash << 8);
		hash ^= (hash >> 10);
		hash += (hash << 26);
		hash ^= (hash >> 31);
	}
	h->hash0 = (unsigned int)hash;
	h->hash1 = (unsigned int)(hash >> 32);
}
I have a feeling that it may be possible to speed up the hashing function a little by not applying as many shifts to the king field as to the black and white fields. I think there is less entropy in the king field because it is somewhat correlated to the other 2 fields. But I haven't tested this idea.

-- Ed

Joost Buijs
Posts: 471
Joined: Wed May 04, 2016 11:45
Real name: Joost Buijs

Re: Search Algorithm

Post by Joost Buijs » Sat Sep 03, 2016 07:00

Using a 32 bit hash-key is something I haven't done for years, with an optimal 32 bit hash-function you will get on average 1 collision each 4 billion probes, in practice things will be much worse since most hash-functions are not optimal.
When an engine probes at ~70 mnps (which is feasible on modern multicore hardware) this means (best case) 1 collision each 60 seconds which can wreck up a search if there is no means to detect collisions.
Maybe a 64 bit key is a bit overdone, 48 bit could be a alternative, I have the feeling that 32 bit is a bit on the low side because it regularly gave me problems with my chess-engine.

The reason that I store the whole move in the transposition table is that I want to avoid generating moves in case the table-move generates a cutoff (which it does very often), a simple legality check of the move to avoid crashing the engine will do, with a 64 bit key this is probably not needed at all.

Joost
Last edited by Joost Buijs on Sat Sep 03, 2016 07:31, edited 1 time in total.

Post Reply