Search Algorithm

Rein Halbersma · Post by **Rein Halbersma** » Wed Oct 12, 2016 11:28

Fabien Letouzey wrote: IMO best-first search is underappreciated and often not recognised when it is used (e.g. in opening books). It was studied in the 90's but with a big mistake: scoring the leaves using eval is too costly overall. If instead one performs an alpha-beta search, the overhead becomes tolerable.

The general pattern goes like this:
Code: Select all
while true
  pick leaf
  expand leaf
  score new leaves
  back up scores
You have to choose heuristics for these steps, so the space is huge. Opening-book construction, PN-search, and MCTS are popular instances. It can also be used for analysis, as it simulates the user algorithmically. Possibly this would be the best way to solve draughts engames (wild guess).

Don't know about "best", but a similar scheme was used by Chinook: big tree in memory, PNS determines which node to expand next, and all of that in an iteration loop over a heuristic threshold.

Solving checkers https://webdocs.cs.ualberta.ca/~mmuelle ... eckers.pdf
Checkers is solved http://cling.csd.uwo.ca/cs346a/extra/Ch ... solved.pdf
Game over: black to play and draw in checkers https://ilk.uvt.nl/icga/journal/content ... -01-08.pdf

IIRC, Ed uses his opening book builder (drop-out expansion) for proving endgames. Works too.

Fabien Letouzey · Post by **Fabien Letouzey** » Wed Oct 12, 2016 11:29

TAILLE wrote:Here I have difficulties to understand because in my graph a leaf can very often be reached via different paths from the root.

Yes, so I take the smallest cost among them.

Rein Halbersma · Post by **Rein Halbersma** » Wed Oct 12, 2016 12:18

TAILLE wrote: In the situation
v1 = +50 and n1 = 500 000
v2 = +30 and n2 = 100 000
I will decide p2 the most promising move

but in the situation
v1 = +350 and n1 = 500 000
v2 = +330 and n2 = 100 000
I will continue to use p1 because with v1 = +350 I am pretty confident to find quicky the win.

In other words when v becomes high I diminsh the influence of n.
I guess it could not be solid for a mathematical point of view but at least it is in accordance to my knowledge of the game ... and it works quite well doesn'it?

Interesting trade-off! I think I disagree. v2 = +330 is also an almost decisive advantage, not that much worse than v1 = +350. Your confidence in finding a quick win for p1 seems questionable. After, you already spent 500'000 effort on it. Seems equally logical to try p2 and see if the +330 converges to a win even quicker!

Your conclusion that it work "quite well" is 100% hindsight-bias (as Fabien put it: showing the lottery winner) because 24-30 actually did win in this example, and 13-19 not. In any case, single examples can only be used as encouragement that something good is being discovered, but to fine tune it you need large samples and statistical testing. At least in the bandit literature, sqrt(ln N / n) has been shown to be optimal ("minimal regret"). sqrt(N/n) leads to too much exploration, and sqrt(1/n) to too little exploration.

I would be interested in seeing how a sqrt(ln N/n) approach would work in your example. E.g. it could be possible that 13-19 (which you showed to be the early "leader") would be refuted quicker by 24-30. I wouldn't be surprised if 24-30 would be explored even more with a UCB term compared to your current version. Of course, there's no way of knowing until you do the experiment

Also: my confidence is increasing rapidly that your algorithm is sound now!

Rein Halbersma · Post by **Rein Halbersma** » Wed Oct 12, 2016 12:41

Fabien Letouzey wrote: I get your point. I guess that it's more or less constant in the [-0.50, +0.50] range, but there is a "cumulative" effect when reaching say +2.00. In chess it's particularly visible, as there are fewer draws with large material imbalance (guess; like Joost I know nothing about draughts).

I would remove any non-linearities by mapping eval scores to an ELO scale, and uncertainty in rating.

Rein Halbersma · Post by **Rein Halbersma** » Wed Oct 12, 2016 13:13

Fabien Letouzey wrote:
Rein Halbersma wrote:About LMR, I visualize this as a very narrow wavefront, with a logarithmic shape. At least in Stockfish, the depth reduction is logarithmic in depth and move order. You can show that this implies that every move will eventually be visited if nominal depth increases without bound. So LMR is a reduction that preserves correctness of the search. That's the same property of logarithms that the UCB term of sqrt(log(t)/N) has: eventually, every "bandit" will be tried again, no matter the initial success rate.
I'm not sure I'm following for "log"; is there a property that wouldn't work with say "sqrt"?

I didn't say "log" was the unique reduction that is eventually correct

AFAICS, sublinear is enough, so also "sqrt".

Joost Buijs · Post by **Joost Buijs** » Wed Oct 12, 2016 13:42

TAILLE wrote:
Joost Buijs wrote:
TAILLE wrote:
Hi Fabien,

It seems you consider a large egdb (7 or 8p db) not that worthy which is very surprising for me.
My experience is the following: with the 6p db you can quite often resolve a 8x8 position within 10' and this is my definition for deciding we have entered the endgame phase.
With the 8p db the endgame phase now may begin with 9x9 positions witch makes a great difference doesn't it?
Let me give you an example chosen for you Fabien:

Computer Olympiade Leiden Rapid CCD 2016, Moby Dam - Scan, position after 44.40-34

Black to play

Scan, certainly the best program in the middle game phase, has played a very powerful middle game and Scan reached here a very advantageous position. Unfortunetly Scan has only the 6p db and, taking my definition, Scan is still in the middle game and has to continue to use is strong eval middle game function and play 44.13-19 ... draw
On the other hand Damy has the 8p db and Damy will be there in its endgame phase. As expected Damy is able to resolve the position and find the winning move 24-30!!

My conclusion: I encourage Fabien to not developpe the 8p db otherwise the other programs will have no chance to win a tournament
Damy must be very smart that it can see the absolute win in this position.
Indeed after 24-30 Black wins a man in 28 ply and maybe this is winning, after 36 ply with 8p egdb from Ed enabled my program still doesn't see more than 1 man advantage.
Hi Joost,

One of the most impressive variant is the following:
1...24-30 2.48-42 30x39 3.33x44 13-19 4.42-37 12-17 5.28x17 21x12 6.32-28 18-23 7.28-22 23-29 8.37-31 29-34 9.22-18 12x23 10.27-21 11.31x22 23-28 12.22x33 34-39 13.44-40 39x28 14.26-21 8-12 15.36-31 25-30 16.31-27 19-24 17.40-35 30-34 18.21-16 28-33 19.27-21 33-39 20.16-11 39-43 21.11-6 43-48 22.21-16 24-29 23.16-11 48-26 24.6-1 26-48 25.1x18 N+ (8p db)

Gerard,

I have to say I was a bit skeptical at first, but after I let my program run for a long time on this position it also finds a win in 48 ply (with a different variation though), I guess your result must be correct so I have to take back my words. The PV is extracted from the hash-table, I hope it is usable, the last moves are missing because the hash-table gets overloaded with these long searches.

When I look at the variation below I see the program sacrificing pieces now and then, this is probably to push the loss over the horizon otherwise I can't explain this.
There are other things that are strange, the variation below ends with less pieces then 8 and that should not be possible with the 8p database loaded, this is something I have to look at, I will get back at this later.

Edit:

The database omits all positions with captures in them, not only for the color on the move but as it seems for both colors, this is something I have to take into consideration. Also 6.2 and 7.1 are missing, I probably have to score an immediate win in this case, this explains some of the things I see.
To probe on every terminal node I have to continue with quiescence until captures for both sides are exhausted, this complicates things but in the end it might be better for the positional evaluation as well.

Code: Select all


info depth 48  score 10000  time 2164.80  nodes 45732618780  nps  21125518

24-30 48-42 30x39 33x44 13-19 42-37 17-22 28x17 21x12 32-28 18-23 28-22 23-29 37-32 25-30 22-17 12x21 26x17 29-34 17-12 8x17 36-31
19-23 31-26 30-35 26-21 17x26 27-22 4-9 15-10 9-14 10x28 34-40 28-23 40x49 23-18 49x21 18-13 21-3 22-17 3x21 13-9 21-3 9-4

TAILLE · Post by **TAILLE** » Wed Oct 12, 2016 20:15

Rein Halbersma wrote:
TAILLE wrote: In the situation
v1 = +50 and n1 = 500 000
v2 = +30 and n2 = 100 000
I will decide p2 the most promising move

but in the situation
v1 = +350 and n1 = 500 000
v2 = +330 and n2 = 100 000
I will continue to use p1 because with v1 = +350 I am pretty confident to find quicky the win.

In other words when v becomes high I diminsh the influence of n.
I guess it could not be solid for a mathematical point of view but at least it is in accordance to my knowledge of the game ... and it works quite well doesn'it?
Interesting trade-off! I think I disagree. v2 = +330 is also an almost decisive advantage, not that much worse than v1 = +350. Your confidence in finding a quick win for p1 seems questionable. After, you already spent 500'000 effort on it. Seems equally logical to try p2 and see if the +330 converges to a win even quicker!

Your conclusion that it work "quite well" is 100% hindsight-bias (as Fabien put it: showing the lottery winner) because 24-30 actually did win in this example, and 13-19 not. In any case, single examples can only be used as encouragement that something good is being discovered, but to fine tune it you need large samples and statistical testing. At least in the bandit literature, sqrt(ln N / n) has been shown to be optimal ("minimal regret"). sqrt(N/n) leads to too much exploration, and sqrt(1/n) to too little exploration.

I would be interested in seeing how a sqrt(ln N/n) approach would work in your example. E.g. it could be possible that 13-19 (which you showed to be the early "leader") would be refuted quicker by 24-30. I wouldn't be surprised if 24-30 would be explored even more with a UCB term compared to your current version. Of course, there's no way of knowing until you do the experiment

Also: my confidence is increasing rapidly that your algorithm is sound now!

Hi Rein,

First of all I am happy to see your confidence in Damy increasing.

Let me explain why I take v into account in the handling of uncertainty.
Suppose that after a certain number of iterations of your search algorithm you find that you can gain a man. Certainly your are happy because you can expect a win but if two moves are promising (here 13-19 and 24-30) the difficulty may become very great because we know that an advantage of a man is not automatcally sufficient to win.
The problem is then the following: in a position with an advantage of one man how can we recognise a really winning position?
This question may be strange but in fact it is crucial in my point view.
Diag1

White to play
For a human no question exists, it is a win, full stop.

On the other with the following position
Diag2

white to play
A human will be very great difficulties to decide what could be the result

What about programs?
Curiously it is exactly the contrary:
In Diag1 a lot of moves may take place before the first exchange which will reach the egdb and a program has great difficulty to prove a win.
In the other hand, in Diag2 it will take far less moves to reach the egdb and the program can solve such problem.

What it is the best strategy for a program to prove a win in position like the diag1?
You see my point?
If you handle uncertainty as "usual" you will never solve the problem because you will explore for a very long time all possible moves because the value of all variations is always very near from +100 and in such situation you will choose the move which is less explorer than the others.
I have not finished the analyse of such situation but limiting exploration (against exploitation) is one possibility I will continue to test deeper.

Ed Gilbert · Post by **Ed Gilbert** » Wed Oct 12, 2016 20:49

Hi Gerard,

Your new search algorithm is very interesting, and obviously very strong at solving endgame positions. If I'm understanding it correctly, the reason is that it concentrates most of its cpu resources on the most promising moves, much more so than typical alphabeta search reductions such as lmr or ProbCut. Is that accurate, or is it that you also gain substantial efficiency by having a large part of the search tree in memory as a graph, so you don't have to repeat movegen and makemove operations at every node at each iteration?

If your search is really focused on the best moves, do you have problems with tactical positions in midgame or openings, where you might have to pitch a few men to setup a shot?

Also, I wonder how it performs against alphabeta programs in engine matches? You must have some results against programs like scan, dragon, or kingsrow. If it performs as well in opening and midgame as it does in solving endgames, it must do very well.

-- Ed

Rein Halbersma · Post by **Rein Halbersma** » Wed Oct 12, 2016 21:05

TAILLE wrote: First of all I am happy to see your confidence in Damy increasing.

I have several remaining points that I don't understand:
-how do you solve the GHI? You explained that your tree in memory stops at leaf nodes, where you do alpha-beta to find the score.
-Does your algorithm also solve GHI when there are no endgame databases?
-what happens if you let your program analyse for 24 hours? Does memory to hold the tree run out? Or is it well behaved like depth-first search?

TAILLE · Post by **TAILLE** » Wed Oct 12, 2016 22:04

Ed Gilbert wrote: Your new search algorithm is very interesting, and obviously very strong at solving endgame positions. If I'm understanding it correctly, the reason is that it concentrates most of its cpu resources on the most promising moves, much more so than typical alphabeta search reductions such as lmr or ProbCut.
-- Ed

It is only partially true: as I began to explain in my previous post Damy concentrates most of its cpu on the most promising move only for the side having a substantial advantage. For the defender and for equal positions exploration will take a large place.

Ed Gilbert wrote:Hi Gerard,
is it that you also gain substantial efficiency by having a large part of the search tree in memory as a graph, so you don't have to repeat movegen and makemove operations at every node at each iteration?
-- Ed

Let me make a very very important correction: I am not storing a TREE but a GRAPH that shows clearly when a position can be reach by several paths and when a loop may occur. I suspect I am able to gain efficiency but I am not able to prove it.

Ed Gilbert wrote: If your search is really focused on the best moves, do you have problems with tactical positions in midgame or openings, where you might have to pitch a few men to setup a shot?
-- Ed

For very difficult tactical problems I developped a special procedure I intend to use only at the root, at the immediate successeur of the root and on the very beginnig of the PV.

Ed Gilbert wrote:
Also, I wonder how it performs against alphabeta programs in engine matches? You must have some results against programs like scan, dragon, or kingsrow. If it performs as well in opening and midgame as it does in solving endgames, it must do very well.
-- Ed

No results because I am working and a new eval function based an an evaluation learning. My current evaluation is currently only based on material which is sufficient for endgame but certainly not for middle game.

TAILLE · Post by **TAILLE** » Wed Oct 12, 2016 22:43

Rein Halbersma wrote: -how do you solve the GHI? You explained that your tree in memory stops at leaf nodes, where you do alpha-beta to find the score.

When I was using a mtd-f procedure I clearly remember I had to work hard to avoid this problem. The problem was appearing due a bad use of my hash table when loops can exists. Now that I store all the positions in memory the problem, supposing I have understood it correctly, has completly diappeared, without any effort.
If my answer is not clear enough please give me an example and I'll show you how Damy works.

Rein Halbersma wrote: where you do alpha-beta to find the score.

When I extend the graph I take a leaf and I generate its successors. For the successors already in the graph (that means that the corresponding position have been already reached via another path) the evaluation of the position has already been done and I have nothing to do for the moment (later in the process the eval will be propagated).
The successors that are new leaves are simply put in a file in order to be evaluated by others threads. These other threads ignore completly the existence of the graph. They only run a simple alpha-beta procedure by building a classical tree (here it is really a TREE). At the end the threads return only the value of the position and the corresponding tree is discarded.
BTW a very important property of parallelism in Damy is the following: when Damy runs let's say 10 000 iterations, the graph generated and the eval of each node is always the same, i.e. the final result does not depend of the number of threrads. That was not the case when I was using an mtd-f procedure and I confess it is a very very interesting property in the debugging phase!

Rein Halbersma wrote: -what happens if you let your program analyse for 24 hours? Does memory to hold the tree run out? Or is it well behaved like depth-first search?

Currently I can generate about 120 000 000 positions in the graph. Generally it takes about 8 hours to reach this dimension.
In case it remains very few memory I just stop the analyse instead of launching a new iteration.

Joost Buijs · Post by **Joost Buijs** » Wed Oct 19, 2016 10:38

Fabien Letouzey wrote:
Joost Buijs wrote:It is possible that history doesn't do much because the positional evaluation table is still zero.

Any thoughts about this?
Yes: keep the history code nearby. My results are in line with Bert's: killers never brought anything, while history is very measurable.

Testing move ordering with a material-only evaluation is risky business of course. This could inflate results for killers.

Following Othello, you might want to try some form of response killers (they make more sense for blocking games). I haven't.

Fabien.

I'm still struggling to get someting useful out of the history-table, somehow the benefits don't outweigh the overhead it gives, the overhead is ~10% mainly due to the insertion-sort and the benefits are in the same ballpark.
I've tried many different schemes to update history but on average the results are more or less comparable, since I use thread-local tables I tried to merge them after each iteration and that didn't help either.
Another problem with history is that it makes the SMP search very non-deterministic, it adds a lot of noise which I don't like.
Right now I'm thinking about throwing out history completely and adding another means of sorting late moves.

Like you suggested I've added a response-killer table, this helps in reducing the branching-factor way better as (in my case) the history-table does and it hardly gives overhead.

It remains a mistery why I don't see an improvement by using history.

The next three weeks I have other commitments, after this I want to start working on tuning my evaluation function (which will be quite a lot of work since I have to build everything from scratch).

Joost

Joost Buijs · Post by **Joost Buijs** » Mon Oct 31, 2016 15:53

Out of curiosity I've added the Scan2 evaluation-function to my program to compare speed with my own evaluation-function which uses pext() to calculate the indices.
Despite the much larger tables that I have to use is the difference in speed dramatic, with the Scan2 eval() I roughly get the same speed as with Scan2 itself (around 59 mnps, probably bound by the evaluation-function) when I use my own evaluation-function which uses pext() the nodes/sec. are twice as high.
It is a pity that not all processors support this instruction yet, it will take a few years before every processor has this.

The irregularities are caused by the non deterministic SMP search, my search still doesn't use pruning like futility, probcut and LMR so the overall depth is not very good yet.

Code: Select all


    m  m  m  m  m
  m  m  m  m  m
    m  m  m  m  m
  m  m  m  m  m
    -  -  -  -  -
  -  -  -  -  -
    M  M  M  M  M
  M  M  M  M  M
    M  M  M  M  M
  M  M  M  M  M
  
calculating evaluation indices using (I *= 3 + x .....)

info depth 11  score     3  time   0.05  nodes      415604  nps   7561546  31-26 20-25 34-30 25x34 39x30 19-24
info depth 12  score     0  time   0.22  nodes     1681482  nps   7713096  31-26 17-21 26x17 12x21 37-31 21-26
info depth 13  score     3  time   0.30  nodes     3137864  nps  10579212  31-26 17-21 26x17 12x21 32-27 21x32
info depth 14  score    -1  time   0.47  nodes    11688204  nps  24972312  34-30 17-21 31-26 20-25 26x17 12x21
info depth 15  score     2  time   0.69  nodes    23579707  nps  34131911  34-30 20-25 31-26 25x34 39x30 14-20
info depth 16  score    -2  time   1.04  nodes    43340831  nps  41652155  34-30 17-21 31-26 20-25 26x17 12x21
info depth 17  score     3  time   2.18  nodes   106326809  nps  48877191  34-30 20-25 31-26 25x34 39x30 19-24
info depth 18  score    -2  time   3.30  nodes   171824197  nps  52048644  34-30 20-25 31-26 25x34 39x30 17-21
info depth 19  score     1  time   6.40  nodes   348826047  nps  54472978  34-30 17-21 31-26 20-25 26x17 12x21
info depth 20  score    -2  time  28.54  nodes  1681887727  nps  58935331  31-26 17-21 26x17 12x21 32-27 21x32
info depth 21  score     3  time  61.18  nodes  3568365387  nps  58328491  31-26 20-25 34-30 25x34 39x30 19-24
info depth 22  score     0  time  92.21  nodes  5459493940  nps  59204151  31-26 20-25 36-31 15-20 34-30 25x34
info depth 23  score     3  time 150.94  nodes  8825263225  nps  58467071  31-26 20-25 36-31 15-20 34-30 25x34


calculating evaluation indices using pext()

info depth 11  score     3  time   0.07  nodes      547658  nps   8191516  31-26 20-25 34-30 25x34 39x30 19-24
info depth 12  score     0  time   0.21  nodes     1771480  nps   8469390  31-26 17-21 26x17 12x21 37-31 19-23
info depth 13  score     3  time   0.28  nodes     3223028  nps  11465666  31-26 17-21 26x17 12x21 32-27 21x32
info depth 14  score    -1  time   0.38  nodes    12076717  nps  31899068  34-30 17-21 31-26 20-25 26x17 12x21
info depth 15  score     4  time   0.46  nodes    21924262  nps  47325365  34-30 20-25 32-28 25x34 39x30 19-24
info depth 16  score    -2  time   0.62  nodes    39103069  nps  62796116  34-30 17-21 31-26 20-25 26x17 12x21
info depth 17  score     3  time   1.10  nodes    98451692  nps  89758846  34-30 20-25 31-26 25x34 39x30 19-24
info depth 18  score    -2  time   1.68  nodes   168019309  nps  99715114  34-30 20-25 31-26 25x34 39x30 17-21
info depth 19  score     1  time   3.24  nodes   375707036  nps 116055516  34-30 17-21 30-25 21-26 40-34 12-17
info depth 20  score    -2  time  11.45  nodes  1441962080  nps 125882164  31-26 17-21 26x17 12x21 32-27 21x32
info depth 21  score     3  time  26.57  nodes  3375861721  nps 127077941  31-26 20-25 36-31 15-20 34-30 25x34
info depth 22  score     0  time  35.31  nodes  4496402705  nps 127357820  31-26 20-25 36-31 15-20 34-30 25x34
info depth 23  score     3  time  65.14  nodes  8373588082  nps 128544250  31-26 20-25 36-31 15-20 34-30 25x34

BertTuyt · Post by **BertTuyt** » Mon Oct 31, 2016 22:48

Joost, is this with your new 10-core processor, or the 8-core version.
For testing the new instruction-set options I will buy a new Intel Kaby-Lake based system in Q1 2017.
Although only 4 core, it already operates at 4.2 GHZ, so with water cooling I think 4.5 GHZ or even 5 GHZ would be possible.

Bert

BertTuyt · Post by **BertTuyt** » Mon Oct 31, 2016 23:37

I tested several implementations for the hash-table, from 1 -2 - 4 to 8 tiers / buckets.
In all tests I used the Woldouby Position, search depth of 28 Ply, and the hash table increase from 1M ( 2^20) entries (16 MByte) to 1G (2^30) entries (16 GByte).
Next to that I used for the Man Moves Forward (F) and Backward (B) Bit scans to create a different move default sort, so the options are white-black FF, FR, RF and RR.
I did not use any form of other sorting such as History. Only the hash-move when available was executed first.

In the graph below I did not include the 1-tier bucket implementation to better display the impact of 2 - 4 - 8 tiers.
I will sent in another mail all output data.

Bert

World Draughts Forum

Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm

Re: Search Algorithm