Parallel search

Discussion about development of draughts in the time of computer and Internet.
BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt »

Ed, i use the game you forwarded me (hope you remember).

1. 34-30 20-25
2. 32-28 25x34
3. 40x29 ........

In the total game 10 moves & scores are different from the single-thread (normal) search.
In one instance, score is different but move is the same.
I expected moves to be different at equal score, as move-order is different.
But in most cases move & score is different.
As I don't understand this yet, and Im still not 10% convinced that the implementation is bug-free, i will go into details.
Im curious what others encountered so far.
And next to that, any root-cause analysis would be welcomed.

Bert
User avatar
FeikeBoomstra
Posts: 306
Joined: Mon Dec 19, 2005 16:48
Location: Emmen

Post by FeikeBoomstra »

This is what I observed: I am using MTD with zero window.
In the single thread situation you get the best move by looking at the best move obtained in the root node using in alphabetha:

if (z>g) node->best_move = current_move

But in case of multiple threads you are not sure any more that the root node has obtained the real best move. This has to do with the z>g condition I think now.

What I did is to check whether the upperbound and the lowerbound of the target of the best move are equal.

I tried it in computer-computer games with first single thread and then multi thread for the same depth.
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert »

Bert, I didn't get a chance to run the experiment today, but I will do it before Friday.

How are you making out with the 7-piece db build? I imagine you are well into it by now.

-- Ed
BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt »

Ed, unfortunately no progress.
When I encountered all kinds of problems with the parallel search, I focussed on this one as I need to have all cores.
Nevertheless, I hope to restart anytime, keep you posted.

Bert
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert »

Hi Bert,

I stopped the test after comparing 60 searches of each engine. Results:

7 different moves only
6 different scores only
2 different moves and scores

In the 8 total cases with different scores, the difference was the smallest possible in my eval.

I am not concerned about this at all, I think it is a normal consequence of alphabeta and searching the tree in a different move order. I have played enough games with the parallel search engine to be convinced that it does not make bad moves and it plays much better with 4 cores than 1 because of the deeper searches. Your results look somewhat similar to mine.

Have you tested how much improvement you get in time-to-depth using 4 cores vs 1?

-- Ed
BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt »

Although I still don't trust my Multi-Core implementation for 100% (in one case I get a score difference of 0.6 Man-Value), most scores and moves are all the same, so im making progress.

I started with the. PV-Split. This implementation has one major drawback (at least my implementation), in case the PV on the root-level changes only one core will be assigned. It is obvious that in the next iteration I will try to create a work-around.
Also I use still a "huge" number of additional code in the parallel routines, for debugging, and monitoring purposes, which has a slight negative impact on speed.

Nevertheless:Results on the game Ed also used for comparison, and on a Q6600 Quad-core, 4Giga and Vista64.

Speed Nodes/Sec: 3.89M (normal algorithm 1.96M)
Time to Depth: 0.58
Node Counts: 1.16

I will try to get into this strange score i only observed once, try to improve and get the modified PV-Split working.

My guess is that with small optimizations I could reach a Time to Depth of 0.5. Also I doubt if a speedup (for 4 cores) beyond 3 is realistic.

Bert
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert »

Bert, I realized after reading your results is that you are probably using 2 search threads where I was using 4. I imagine that using 4 parallel threads might tend to give more differences between the searches.

Your one strange search result of 0.6man different from single-threaded should certainly be looked into, but it's possible that it is ok, especially with your very agressive futility pruning in the latest Damage.

-- Ed
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE »

Hi Bert,
BertTuyt wrote:Feike,
My question what is the best approach:
- A Master-Slave where the search-thread invites idle treads to join.
- Or a approach where the idle threads themselves are looking for work, and then take the initiative.
Bert
I studied in detail the YBW procedure and as a consequence I could be able to identify some very interesting advantages but also some drawbacks:
1) It happens that a thread may effectively be put in an "idle" state. Of course it is a pity because you do not use all the CPU available.
2) In addition, the "idle" thread may disturb the others if it waits in a loop with with lock instructions or with simply the access on a "volatile" data
3) If you split a node in the low part of the tree (Ed proposal was to keep a distance from the leaves greater than 5) it could be relatively costly
4) The efficiency of the procedure depends on the quality of the tree exploration algorithm and the quality of your evaluation function.

Just a small comment on this last point. Let'us take a position in which a strong player considers that, for the next moves, you have only 1 or 2 interesting moves (all other moves are either tactical or strategical mistakes). If you are confident with your evaluation fonction then you can look at each depth at these 2 moves and you can quickly eliminate the other moves. As a consequence the tree will become very narrow but very deep. Now what about YBW in this situation ? The answer is : it does not work because only one thread works on these interestings moves.; the other threads could just eliminate quickly the bad moves.

My conclusion is the following : the YBW works very well in a game like chess were the number of potentiel good moves at each depth is large but it is less interesting in draughts, especially if your evaluation function is efficient enough to allow reducing the number of potentiel good moves at each depth.

It is a little disturbing to know that the YBW becomes less efficient when you improve your evaluation function!

Bert, your question is of course very interesting but the answer is difficult because it depends on the way you explore the tree and the way you use the evaluation fuction to eliminate bad moves as quickly as possible.

Gérard
User avatar
FeikeBoomstra
Posts: 306
Joined: Mon Dec 19, 2005 16:48
Location: Emmen

Post by FeikeBoomstra »

It isn't my observation that only one core is active if you narrow the search tree. As soon as a core is in the idle state, it is allocated for the next potential split point. Due to the very low overhead to check if a thread is available, this check is done on every level above a certain threshold. I use >4 as a threshold.
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE »

Hi Feike,
FeikeBoomstra wrote:It isn't my observation that only one core is active if you narrow the search tree. As soon as a core is in the idle state, it is allocated for the next potential split point. Due to the very low overhead to check if a thread is available, this check is done on every level above a certain threshold. I use >4 as a threshold.
I agree with you. This kind of improvement of the procedure is necessary. That means that you use not only "pure split points" but also "potential split points" in order to avoid to stay in an "idle" state.

It seems that you chose the master-slave approach. Is it true ?

Gérard
BertTuyt
Posts: 1592
Joined: Wed Sep 01, 2004 19:42

Post by BertTuyt »

Feike, just a few questions for clarification:

Do the "masters" check for idle threads, or do the "slaves" (idle threads), poll for the "most interesting" split-point.

Next to that, if several thread offer potential splitpoints and both starting sending requests, how do you guarantee that the "most interesting" split point is chosen?

Bert
User avatar
FeikeBoomstra
Posts: 306
Joined: Mon Dec 19, 2005 16:48
Location: Emmen

Post by FeikeBoomstra »

Hi Bert,

we are not talking about loosely coupled systems, it is 4 cores with shared memory. In my implementation (just copied glaurung) the master tests if there are idle threads as simple as:
for each thread:
if (!(thread.running))

So yes, the master tries to find slaves, if not available he goes to the next depth.
I don't do any qualifications of a split-point, so I do the left-most-tree first in a single thread, and after reaching sufficient distance from the leaves I start splitting.

If more masters are competing for a slave, just the first wins. So you need a lock around this action. As I suppose most of the times there is no thread available, maybe you can first have a lookup, if successful then lock and try again.

Hope this answers your questions.
User avatar
FeikeBoomstra
Posts: 306
Joined: Mon Dec 19, 2005 16:48
Location: Emmen

Post by FeikeBoomstra »

update: it is implemented this way. Checking for availability is without a lock, claiming is (of course) within the critical region.
Ed Gilbert
Posts: 860
Joined: Sat Apr 28, 2007 14:53
Real name: Ed Gilbert
Location: Morristown, NJ USA
Contact:

Post by Ed Gilbert »

BertTuyt wrote:Do the "masters" check for idle threads, or do the "slaves" (idle threads), poll for the "most interesting" split-point.

Next to that, if several thread offer potential splitpoints and both starting sending requests, how do you guarantee that the "most interesting" split point is chosen?
At the start of a new search there is only one search thread, and the others are spinning on locks waiting for work. When the search arrives at a node which has remaining depth >= some threshold (I use 4 ply) and there is more than one remaining move to be searched, then a split point is created at that node. The original search thread becomes the master of that split point. It may engage several idle threads to start searching the remaining unsearched moves from this node. In kingsrow I limit the number of threads working on a split point to 2, as this seemed to help slightly in my tests. As the threads descend into the search tree, each node visited is potentially a candidate for being another split point, if it meets the remaining depth and number of moves criteria mentioned. So a thread that was a slave at some predecessor node can also become a master at some lower node in the tree. Similarly, if the master returns to its split node and finds that there are no more moves to be searched but that the slave thread is still searching its last move, the master can become a slave at some successor position to that split node -- but only in that subtree, it cannot be a slave to any position not a successor of the original split point.

One other small point: when the search is completely idle, I set all the search threads to block waiting on a semaphore (instead of spinning on a spinlock), so they don't waste cpu resources when not searching.

-- Ed
TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Post by TAILLE »

Hi Ed,
Ed Gilbert wrote:
BertTuyt wrote: When the search arrives at a node which has remaining depth >= some threshold (I use 4 ply) and there is more than one remaining move to be searched, then a split point is created at that node.
That was my understanding of the original procedure. Correct me if I am wrong : if each node of the tree has at a maximum 2 successors, does that mean that only one thread is running ?

Of course it is another problem to know if it could be realistic to consider that a node has no more than 2 successors. My question is only to verify my understanding.

Gérard
Post Reply