Search Algorithm

Discussion about the development of draughts in the era of computers and the Internet.
Fabien Letouzey
Posts: 299
Joined: Tue Jul 07, 2015 07:48
Real name: Fabien Letouzey

Re: Search Algorithm

Post by Fabien Letouzey » Tue Oct 11, 2016 13:46

Rein Halbersma wrote:FYI, here's a link to the 2nd edition (work-in-progress) of the Reinforcement Learning book by Sutton & Barto. Apart from the general framework, there are 3 particularly interesting sections: 16.2 on Samuel's checkers program (that started the whole RL business), 16.9 on AlphaGo and 17.1 with a nice picture showing how TD-learning, Dynamic Programming, Monte Carlo and exhaustive tree search are all special limits of a general framework.
Sounds cool, but where is the link? I think I saw one on Hacker News a few days ago, but couldn't download.

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Search Algorithm

Post by TAILLE » Tue Oct 11, 2016 14:27

Hi,

With Fabien's last posts I now have a better understanding of the wording used for search algorithms, and it is now far easier for me to explain how Damy works.
In order to have the best wording, I propose to call my algorithm a "Most Promising First" (MPF) search algorithm (no doubt it already exists in the literature, but I have not yet found it).
To illustrate this MPF algorithm, let's take as an example the Scan position I analysed with Damy, with the two promising moves 13-19 and 24-30.
Let's call rp the root position, p1 the position after 13-19 and p2 the position after 24-30.
Suppose that after a certain number of iterations the evaluation of p1 is v1 and the subtree under p1 consists of n1 nodes; I write this as v1 = eval(p1, n1). In the alpha-beta world you might replace n1 by the depth d1, but I did not analyse that case.
As the number of iterations grows, n1 also grows and v1 is expected to move closer and closer to the real value of p1.
The idea of the MPF search is then the following:
if p1 is a winning position, then I expect v1 = eval(p1, n1) to grow as n1 grows. If v1 grows rapidly, p1 is clearly a very "promising" position. If, on the contrary, v1 has great difficulty growing, then p1 becomes less promising.
An example will clarify the point:
Suppose
v1 = +50 and n1 = 100 000
v2 = +30 and n2 = 100 000
clearly p1 is the most promising move, and I will continue by exploring only p1.
Suppose now we reach the following situation
v1 = +50 and n1 = 500 000
v2 = +30 and n2 = 100 000
p1 still corresponds to the best move (v1 > v2), but because n1 = 5 x n2 it is not the most promising: I can expect v2 to grow as n2 grows. In this case p2 becomes the most promising position, and Damy will choose it for its next iteration.
In other words, I built a function bool f(v1, v2, n1, n2) which tells me whether p1 is more promising than p2, and Damy chooses the corresponding move for the next iteration.
Note: in order not to forget an apparently very bad move, the f function begins with: if (n2 < n1/100) then return false;
Tell me if this helps you to see how Damy works.
Gérard
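
A minimal sketch of such a comparison function, assuming only the description above. The symmetric guard and the constant c are illustrative (with c = 20 the function reproduces the two worked examples), not Damy's actual code.

Code: Select all

#include <cmath>

// Sketch of an MPF ("Most Promising First") comparison: returns true if
// position 1 (value v1, subtree of n1 nodes) is more promising than
// position 2 (value v2, subtree of n2 nodes).
bool more_promising(int v1, int v2, long n1, long n2) {
    // Gérard's guard: never starve an apparently very bad move completely.
    if (n2 < n1 / 100) return false;
    if (n1 < n2 / 100) return true;  // symmetric guard (an assumption)

    // Hypothetical trade-off: discount each value by the effort already
    // spent on it, so a slightly worse but much less explored move can
    // become the most promising one.
    const double c = 20.0;  // illustrative exploration weight
    return v1 - c * std::log((double) n1) > v2 - c * std::log((double) n2);
}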

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: Search Algorithm

Post by Rein Halbersma » Tue Oct 11, 2016 15:21

Fabien Letouzey wrote:
Rein Halbersma wrote: ...
Sounds cool, but where is the link? I think I saw one on Hacker News a few days ago, but couldn't download.
Oops, forgot to include it: http://incompleteideas.net/sutton/book/ ... 016sep.pdf

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: Search Algorithm

Post by Rein Halbersma » Tue Oct 11, 2016 15:33

TAILLE wrote: ...
Tell me if this helps you to see how Damy works.
Interesting! So a few questions: do you do this node selection recursively, or only at the root? And on top of all this, do you also keep the entire tree in memory? Is there no hash table?

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Search Algorithm

Post by TAILLE » Tue Oct 11, 2016 16:00

Rein Halbersma wrote:
TAILLE wrote: ...
Interesting! So a few questions: do you do this node selection recursively, or only at the root? And on top of all this, do you also keep the entire tree in memory? Is there no hash table?
Rein,

Yes, of course this selection is made recursively, and that is the reason why Damy can expand the most promising variation very deeply (which, as you understood, is different from the best variation).
As you know, because you can explore 1G positions in a few minutes, or maybe in less than 1', it is obvious that I cannot store all the generated positions in memory. The choice I made is the following: in the network of stored positions, a leaf corresponds to a position on which I have run an alpha-beta search at very shallow depth to find the best evaluation, enough to find all elementary combinations, like for example the most famous combination after 32-28 18-23 37-32. That way I store in memory about 1/100 or 1/1000 (I don't know exactly) of all generated positions.
Keeping a large number of positions in memory doesn't mean I have no hash table, but my hash table is not there to store information about the positions; it is only there to find quickly whether a position is already in the network.
Gérard
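
A minimal sketch of this storage scheme, assuming only the description above; the types and names are illustrative, not Damy's code.

Code: Select all

#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Explicit graph of stored positions. The hash table's only job is to
// find whether a position is already in the graph, so transpositions
// share a single node (a node can be reached via several paths).
struct Node {
    int value;                    // backed-up evaluation
    long subtree_size;            // n: number of nodes below this one
    std::vector<Node*> children;  // empty while the node is a leaf
};

struct PositionGraph {
    std::unordered_map<std::uint64_t, std::unique_ptr<Node>> by_hash;

    // Return the existing node for this position hash, or create a new
    // leaf whose value comes from a shallow alpha-beta search.
    Node* find_or_add(std::uint64_t pos_hash, int shallow_ab_value) {
        auto& slot = by_hash[pos_hash];
        if (!slot)
            slot = std::make_unique<Node>(Node{shallow_ab_value, 1, {}});
        return slot.get();
    }
};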

Fabien Letouzey
Posts: 299
Joined: Tue Jul 07, 2015 07:48
Real name: Fabien Letouzey

Re: Search Algorithm

Post by Fabien Letouzey » Tue Oct 11, 2016 19:20

TAILLE wrote:...
Tell me if this helps you to see how Damy works.
Gérard, thank you very much for this explanation! At last we are making progress.

At first sight I think you more or less reinvented BFS, which is quite an accomplishment if you've never seen it before. It is also very similar to what I did for the opening book. However it's implemented differently: I never count the nodes for instance. I will explain what I do in a separate post and argue why I think it computes about the same thing, namely what you call "promising" here. Maybe not today, though.

Speaking of the name, I advise against this one. The reason is that all BFS algorithms, and there are a lot, focus on searching promising moves (or nodes) first; it's the definition of "best" in BFS. The thing is, they all have a different formula to measure how promising a move is. Instead I suggest that you select a name that represents the specific formula that you are using, such as "growing", "increasing", ... See for example PN-search: it selects the most "proving" node, based on the (dis)proof numbers it uses.

That brings me to the subject of "growing". We all know that when a score doesn't increase with time, it is likely to indicate a lack of progress and therefore an eventual draw. I, too, have learned to recognise it during tournaments. Sure enough, the score drops to 0 at some point (better luck next time ...). So it makes perfect sense, and it is interesting to see it in algorithmic form. Plus I've never seen it before, at least in the research papers of the BFS family.

However, I will argue here that the "growing" heuristic that we agree on is not what you implemented. Maybe you can confirm my understanding: you do _not_ use a past score to decide how promising moves are, only the current score and the current tree size; am I correct? In other words: no time involved, just current state. It's perfectly normal BTW, I'm just trying to clarify some things.

My point here is that what you implemented is some form of "uncertainty", which for me is *the* central theme for BFS (and totally ignored by alpha-beta at least in its basic form). For me you are saying it very clearly here, but maybe it's my interpretation: "When the number of iterations grows n1 is also growing and v1 is expected to go nearer and nearer the real value of p1.". This is a very general statistical property. In fact I would expect most values to have "uncertainty" (say a confidence interval) proportional to 1 / sqrt(n) but perhaps it doesn't apply to trees; Rein will hit me.

Anyway, my (double) point is:
1) you are computing a true BFS that focuses on uncertainty
2) it is not really about "progress", which seems to imply a notion of time

I suggest you have a look at some standard stuff; you might find more common ground than you expect. It doesn't have to be lengthy articles; look at this page about MCTS: http://cameronius.com/research/mcts/about/. I find it very clear and you will notice the 1 / sqrt(n_i) in the UCB formula.

Another point; I'm trying to connect your explanation with the output you described earlier:
After 1'
24-30 +150 (504 248)
13-19 +25 (64 198)
17-22 +0 (49 240)

But I learned afterwards that 13-19 is also gaining material. Did Damy miss it (at least for the first minute), or had it already seen drawing lines in the subtree?

It's important because you claimed: "As you can see, between time 20" and time 1' Damy used more than 95% of its time on the 24-30 move. Clearly it is for me the secret of the success.". I think instead it might have been accidental, speeding up the proof in the process. BFS is not forgiving regarding bad judgement compared to alpha-beta. If a move is given a low score at some point, BFS will reduce its exploration, so the eventual fix can take a lot of time to occur. In other words, the "pruning" (in depth-first parlance) is very heavy.

Fabien.

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Search Algorithm

Post by TAILLE » Tue Oct 11, 2016 20:23

Fabien Letouzey wrote: ...
But I learned afterwards that 13-19 is also gaining material. Did Damy miss it (at least for the first minute), or had it already seen drawing lines in the subtree?
Quick answer concerning the 13-19 move and the corresponding material gain.

The following figures, obtained after about 20", were reached exactly at the 2866th iteration:
24-30 +108 (133 400)
13-19 +35 (55 320)

Concerning the 13-19 move, the best value was reached much earlier, after the 335th iteration:
13-19 +85 (12 829)
24-30 +0 (8 104)
17-22 +0 (7 958)
25-30 -100 (127)
etc.

I will answer your other questions in 1 or 2 hours because my wife is waiting for me to have dinner!
Gérard

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Search Algorithm

Post by TAILLE » Tue Oct 11, 2016 21:57

TAILLE wrote: ...
I will answer your other questions in 1 or 2 hours because my wife is waiting for me to have dinner!
Fabien,

Yes, I clearly recognise in your explanation what I tried to do: to handle uncertainty in order to decide which is the best move. OK to use the BFS wording until I find a better one. I confirm that I use only the last score and not the past ones; your interpretation is fully correct.

Concerning the formula I used, I was aware of the formula v + c * sqrt(ln N / n), but I was not happy with it because it does not fit my draughts knowledge. For me the uncertainty, i.e., the term sqrt(ln N / n), must depend on the value v itself.
Let's take my previous example
In the situation
v1 = +50 and n1 = 500 000
v2 = +30 and n2 = 100 000
I will decide that p2 is the most promising move

but in the situation
v1 = +350 and n1 = 500 000
v2 = +330 and n2 = 100 000
I will continue to use p1, because with v1 = +350 I am pretty confident of finding the win quickly.

In other words, when v becomes high I diminish the influence of n.
I guess it may not be solid from a mathematical point of view, but at least it is in accordance with my knowledge of the game ... and it works quite well, doesn't it?

Maybe this characteristic justifies a new name for the algorithm. In any case, I would be interested to know whether you have already seen this approach in the large BFS literature.

BTW Fabien, did you try to find a refutation of the (very) long PV Damy produced?
Gérard
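
A hypothetical way to write down this value-dependent uncertainty; the damping shape and the constants c = 20 and s = 400 are illustrative choices that happen to reproduce both of Gérard's examples, not his actual formula.

Code: Select all

#include <cmath>

// Sketch: shrink the exploration term as v grows, so that a clearly
// winning score (say +350) is trusted and the node count n matters less.
bool more_promising(int v1, int v2, long n1, long n2) {
    const double c = 20.0;   // base exploration weight
    const double s = 400.0;  // illustrative value scale at which trust sets in

    auto utility = [&](int v, long n) {
        double damp = 1.0 / (1.0 + (v / s) * (v / s));  // -> 0 for large v
        return v - c * damp * std::log((double) n);
    };
    return utility(v1, n1) > utility(v2, n2);
}

With v1 = +50 vs v2 = +30 this prefers p2 (the exploration term dominates), while with v1 = +350 vs v2 = +330 it stays with p1, matching the two situations above.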

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: Search Algorithm

Post by Rein Halbersma » Tue Oct 11, 2016 22:37

Fabien Letouzey wrote: At first sight I think you more or less reinvented BFS, which is quite an accomplishment if you've never seen it before. It is also very similar to what I did for the opening book. However it's implemented differently: I never count the nodes for instance. I will explain what I do in a separate post and argue why I think it computes about the same thing, namely what you call "promising" here. Maybe not today, though.
Do you mean recursive best-first search (RBFS) here? There's a difference (in terms of memory overhead) compared to BFS at the root only. Just to clarify for others (I'm sure you know all this), one can see the non-recursive versions of DFS, BFS and breadth-first search as one big family with the while loop that you sketched earlier in this thread. The difference is that you store all nodes yet to be explored in a different data structure: a LIFO stack (DFS), a priority queue (BFS) or a FIFO queue (breadth-first search).
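
A sketch of that family in C++: the same loop, with only the frontier container changing. Node, expand and is_goal are placeholders, not code from this thread.

Code: Select all

#include <queue>
#include <stack>
#include <vector>

struct Node {
    int score;
    bool operator<(const Node& o) const { return score < o.score; }  // for the priority queue
};

std::vector<Node> expand(const Node&);  // placeholder successor generator
bool is_goal(const Node&);              // placeholder stop test

// The containers expose the next element differently:
Node take(std::stack<Node>& f)          { Node n = f.top();   f.pop(); return n; }
Node take(std::queue<Node>& f)          { Node n = f.front(); f.pop(); return n; }
Node take(std::priority_queue<Node>& f) { Node n = f.top();   f.pop(); return n; }

// One loop, three algorithms:
//   std::stack<Node>          (LIFO)           -> depth-first search
//   std::priority_queue<Node> (best score out) -> best-first search
//   std::queue<Node>          (FIFO)           -> breadth-first search
template <class Frontier>
bool generic_search(Frontier frontier, const Node& root) {
    frontier.push(root);
    while (!frontier.empty()) {
        Node n = take(frontier);
        if (is_goal(n)) return true;
        for (const Node& s : expand(n)) frontier.push(s);
    }
    return false;
}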

For a very good introduction to all these algorithms, I really like the AI book by Russell and Norvig. For RBFS, see Korf (1993) "Linear-space best-first search" (pm me for PDF). This algorithm was developed in a single-player context, so it requires modifications to apply to adversarial search. There were some algorithms (SSS*) in the 1990s that did this, but in the end they were shown to expand the same tree as variations of MTD(f).
My point here is that what you implemented is some form of "uncertainty", which for me is *the* central theme for BFS (and totally ignored by alpha-beta at least in its basic form). For me you are saying it very clearly here, but maybe it's my interpretation: "When the number of iterations grows n1 is also growing and v1 is expected to go nearer and nearer the real value of p1.". This is a very general statistical property. In fact I would expect most values to have "uncertainty" (say a confidence interval) proportional to 1 / sqrt(n) but perhaps it doesn't apply to trees; Rein will hit me.
I won't hit you :) but note that 1/sqrt(n) typically only applies for independent events. I think there are some old papers by Pearl from the 1980s on asymptotic properties of search trees where this is analyzed. Until then, 1/sqrt(n) should be a good rule of thumb :)

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: Search Algorithm

Post by Rein Halbersma » Tue Oct 11, 2016 22:42

Fabien Letouzey wrote: By contrast, when we work on alpha-beta we think at the level of a single node; at least that's what I do. That simplifies reasoning at the price of losing the big picture, which is to improve the "quality" of the root move/score "efficiently" (for some definition of those terms). For example, who has a clear model of what ID + AB + LMR is doing to the tree? I don't, but it looks a lot like best-first search. However the implementation is scattered, like in OOP.
About LMR, I visualize this as a very narrow wavefront, with a logarithmic shape. At least in Stockfish, the depth reduction is logarithmic in depth and move order. You can show that this implies that every move will eventually be visited if nominal depth increases without bound. So LMR is a reduction that preserves correctness of the search. That's the same property of logarithms that the UCB term of sqrt(log(t)/N) has: eventually, every "bandit" will be tried again, no matter the initial success rate.
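
A sketch of that logarithmic shape; the divisor and the thresholds are illustrative, not Stockfish's tuned values.

Code: Select all

#include <algorithm>
#include <cmath>

// Log-log LMR reduction: grows with both the nominal depth and the
// move's rank in the ordering.
int lmr_reduction(int depth, int move_number) {
    if (depth < 3 || move_number < 4) return 0;  // don't reduce the first moves
    double r = std::log((double) depth) * std::log((double) move_number) / 2.0;
    return std::max(0, (int) r);
}

// Since the reduction is only logarithmic in depth, depth - r still grows
// without bound for every fixed move number: each move is eventually
// searched to any given depth, which is the correctness property above.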

Fabien Letouzey
Posts: 299
Joined: Tue Jul 07, 2015 07:48
Real name: Fabien Letouzey

Re: Search Algorithm

Post by Fabien Letouzey » Wed Oct 12, 2016 07:01

TAILLE wrote:Concerning the formula I used, I was aware of the formula v + c * sqrt(ln N / n), but I was not happy with it because it does not fit my draughts knowledge. For me the uncertainty, i.e., the term sqrt(ln N / n), must depend on the value v itself.
I get your point. I guess that it's more or less constant in the [-0.50, +0.50] range, but there is a "cumulative" effect when reaching say +2.00. In chess it's particularly visible, as there are fewer draws with large material imbalance (guess; like Joost I know nothing about draughts).
Let's take my previous example
In the situation
v1 = +50 and n1 = 500 000
v2 = +30 and n2 = 100 000
I will decide that p2 is the most promising move

but in the situation
v1 = +350 and n1 = 500 000
v2 = +330 and n2 = 100 000
I will continue to use p1, because with v1 = +350 I am pretty confident of finding the win quickly.
Am I understanding correctly: you put emphasis on the absolute score (distance from 0)? I confirm that instead I would treat both examples as the same. The formula that I use to convert scores into probabilities, Softmax, is invariant under translation. This is probably OK for the opening book though, which is the domain I was targeting.

Your idea reminds me a little of EndCut in Othello, which is ProbCut applied to endgame solving. To simplify to the extreme, if score(P) > +bound, we temporarily assume a win (for this iteration) and the same for losses. We increase the bound (= confidence) after each iteration instead of depth since Othello endgames are fixed depth (at most 60 moves per game if we extend "pass"). ProbCut uses a probabilistic model instead of a fixed bound, but the idea is similar. In practice it can partially solve positions considerably faster than the full proof, which is useful for game play. Eventually, search converges to certainty anyway. It might not be at all what you have in mind though.
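
A schematic of that iteration scheme, assuming only the description above; all names and the bound schedule are illustrative.

Code: Select all

struct Position;

const int EVAL_MAX  = 6400;   // illustrative cap on heuristic scores
const int WIN_VALUE = 10000;  // the solver's "proven win" value

int eval(const Position&);                    // placeholder midgame evaluation
int solve_exact(const Position&, int bound);  // placeholder solver; calls
                                              // endcut_value at interior nodes

int endcut_value(const Position& p, int bound) {
    int e = eval(p);
    if (e >= +bound) return +WIN_VALUE;  // assume a win for this iteration
    if (e <= -bound) return -WIN_VALUE;  // assume a loss for this iteration
    return solve_exact(p, bound);        // otherwise keep searching
}

// Driver: each iteration trusts the heuristic less, so the last pass
// (bound above any possible eval) is a full proof.
int solve(const Position& root) {
    int v = 0;
    for (int bound = 200; bound <= EVAL_MAX + 1; bound += 200)
        v = endcut_value(root, bound);   // bound = confidence, raised each pass
    return v;
}
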
In other words, when v becomes high I diminish the influence of n.
I guess it may not be solid from a mathematical point of view, but at least it is in accordance with my knowledge of the game ... and it works quite well, doesn't it?
It makes perfect sense, and I don't think there is a known mathematical absolute for min/max trees. The reason is that modelling errors seriously is a huge pain, so we all cut corners (well, except "best play for imperfect player" aka BPIP, but they had to back-propagate full probability distributions, which seems crazy). As far as I know, MCTS applies a formula that is justified only for one-ply search (one-armed bandits) and yet it is hugely successful in Go.

Contrary to tournament engines that we can more or less list sorted by ranks, I don't see a clear objective way of ranking BFS implementations. There is the vague notion that we are trying to improve the accuracy of the output as quickly as possible, but it seems hard to actually measure scientifically. In short: go for your formula and join the dozens of algorithms that are all different in their definition of "promising".
Maybe this characteristic justifies a new name for the algorithm. In any case, I would be interested to know whether you have already seen this approach in the large BFS literature.
I haven't seen that idea before, but I stopped reading articles in the early 2000s. In any case, it seems clear that we are amazed at your results here; let's see what Ed (or Kingsrow) has to say.
BTW Fabien, did you try to find a refutation of the (very) long PV Damy produced?
No, and I'm not equipped for that: what would be needed is PVS(!). Refuting is exactly what it does: first follow a given PV from an external source (usually the previous iteration, but it doesn't actually matter much), then try to find alternative moves starting from the leaves. That's how I would implement a "blunder check" in a GUI, except that they might do it in forward order for the user's convenience.

However I can try to build an independent proof from scratch this weekend by adapting my book builder, if I manage to integrate Ed's tables. I don't handle GHI (graph-history interaction) but it's OK since we're looking for a win (although I would like to see the draw for myself). BTW do you handle it? Storing information "per position", without the path that leads to it, seems bound to be imperfect in some cases.

BTW I never doubted your results (and therefore Scan's mistake); you clearly know what you're talking about!

Fabien.

Fabien Letouzey
Posts: 299
Joined: Tue Jul 07, 2015 07:48
Real name: Fabien Letouzey

Re: Search Algorithm

Post by Fabien Letouzey » Wed Oct 12, 2016 07:23

Rein Halbersma wrote:Do you mean recursive best-first search (RBFS) here? There's a difference (in terms of memory overhead) compared to BFS at the root only. Just to clarify for others (I'm sure you know all this), one can see the non-recursive versions of DFS, BFS and breadth-first search as one big family with the while loop that you sketched earlier in this thread. The difference is that you store all nodes yet to be explored in a different data structure: a LIFO stack (DFS), a priority queue (BFS) or a FIFO queue (breadth-first search).
No I'm not trying to save memory, or traversal overhead. I need an explicit graph for disk storage anyway (opening book). Plus I like the elegance of pure BFS (my pseudo code). That's assuming I'm understanding correctly what RBFS is trying to accomplish, though.

BFS officially stands for breadth-first search, but since we have no use for that one (we have ID + DFS), I re-used the abbreviation.
For RBFS, see Korf (1993) "Linear-space best-first search" (pm me for PDF). This algorithm was developed in a single player context, so it requires modifications to apply to adversarial search. There were some algorithms (SSS*) in the 1990 that did this, but in the end they were shown to expand the same tree as variations of MTD(f).
Oh, I had it on my HDD. I haven't opened the "single player" directory for a while, though.

For every BFS there seems to be an equivalent "ID-like" + DFS, although as in the case of SSS* we can see that the correspondence is far from obvious. There are a lot of users of depth-first versions of A* and PN-search (in Shogi). I believe that if the day of BFS in competition ever comes, it will be implemented as DFS. Then again, maybe LMR is already it.

Fabien Letouzey
Posts: 299
Joined: Tue Jul 07, 2015 07:48
Real name: Fabien Letouzey

Re: Search Algorithm

Post by Fabien Letouzey » Wed Oct 12, 2016 07:31

Rein Halbersma wrote:About LMR, I visualize this as a very narrow wavefront, with a logarithmic shape. At least in Stockfish, the depth reduction is logarithmic in depth and move order. You can show that this implies that every move will eventually be visited if nominal depth increases without bound. So LMR is a reduction that preserves correctness of the search. That's the same property of logarithms that the UCB term of sqrt(log(t)/N) has: eventually, every "bandit" will be tried again, no matter the initial success rate.
The wavefront is explicit in BFS, and I guess we have come full circle back to the "root driver" you initially talked about. For experiment purposes, there is a single place that we can play with to change the shape of the tree.

I'm not sure I follow regarding "log"; is there a property that wouldn't work with, say, "sqrt"?

As far as I know we are always aiming for "eventual correctness", like the non-recursive nature of ProbCut. In practice it is not so easy, as for instance the TT gets in the way (by mixing scores from different search depths for one thing).

Fabien Letouzey
Posts: 299
Joined: Tue Jul 07, 2015 07:48
Real name: Fabien Letouzey

Re: Search Algorithm

Post by Fabien Letouzey » Wed Oct 12, 2016 09:00

Fabien Letouzey wrote:I will explain what I do in a separate post and argue why I think it computes about the same thing, namely what you call "promising" here. Maybe not today, though.
I'll start with my pseudo-code from earlier and specify each step:

Code: Select all

while true
  pick leaf # with smallest "path cost" (defined below)
  expand leaf # add all successors (OK for games with low BF like draughts and Othello)
  score new leaves # midgame search, about 20 plies
  back up scores # min/max
One invariant of the graph is that a node is either:
- a leaf with a score
- an internal node with all successors present in the graph
This makes it easy to save on disk for instance. It's standard for published BFS, but actual implementations often try to avoid adding all successors, at the cost of complexity and reduced information.

The "path cost" of a leaf is the sum of "move costs" along the path from the root to that leaf. "cost" corresponds to the depth-first search notion of "depth". "move cost" is local at the level of a single node. I first convert the vector of scores (backed up by min/max) to a probability distribution. I use Softmax (https://en.m.wikipedia.org/wiki/Softmax_function); you can view it as the "probability/belief that move M is actually the best move".

E.g. from Scan's endgame:
24-30 +90 (83.4%)
13-19 +46 (14.3%)
17-22 +0 (2.3%)
...

Softmax has a parameter that I call "k" (1/tau, where tau is the "temperature" of most articles, but I don't like this inversion). Scores are multiplied by k before applying pure Softmax. For k = 0, we always obtain the uniform distribution (scores are ignored); the scalar product of the original scores with that distribution then gives the plain average. For k -> inf, we obtain the usual MAX operator instead; the largest score will dominate. The whole point is to choose a value in between, hence the name "soft max" (sub-optimal moves do participate). With k, one can favour exploration (short paths) or exploitation (large scores). IMO any BFS implementation should have one such hyper-parameter, because theory alone cannot give us an optimal value (it's domain-dependent).

The "cost" of a move M is -log(P_M). It's a common idea that we find in realisation-probability search for example in Shogi. I use base 2 but of course it doesn't matter; we only compare costs to each others.

So compared to MCTS or Gérard's solver, I don't have a notion of tree size. The reason why I don't want to use that is as follows. I want to be able to incorporate in my book trees from external sources or separate study. In Othello for instance, I mixed self-play with public games (of my engine), and sometimes let it run overnight on openings studied in a journal. Algorithms that use tree size have a problem here: they will assume that the included tree has been "properly expanded" in a "promising-first" manner. By contrast, I can include trees of arbitrary shape and quality, and the algorithm will not be confused. If they have been badly analysed (which is subjective to eval anyway), the holes will be filled because there will be "promising leaves" inside.

I don't favour large scores like Gérard does; it would probably be irrelevant for an opening book anyway. Softmax is invariant under translation, and it is in fact recommended to subtract the best score (+90 in my example) from all scores during the conversion to probabilities, so that all scores are <= 0; this removes the risk of overflow in the exponential.
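
A sketch of this conversion, assuming only the description above; k = 0.04 is an illustrative value that roughly reproduces the 83.4/14.3/2.3% example, not Scan's setting.

Code: Select all

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Scores -> probabilities with Softmax, subtracting the best score first
// so that every exponent is <= 0 (no overflow in the exponential).
std::vector<double> softmax(const std::vector<int>& scores, double k) {
    int best = *std::max_element(scores.begin(), scores.end());
    std::vector<double> p(scores.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        p[i] = std::exp(k * (scores[i] - best));
        sum += p[i];
    }
    for (double& x : p) x /= sum;
    return p;
}

// Move cost = -log2(P); the path cost of a leaf is the sum of move costs
// from the root, so likely moves are cheap and get expanded first.
double move_cost(double prob) { return -std::log2(prob); }

// softmax({90, 46, 0}, 0.04) gives roughly {0.83, 0.14, 0.02}, i.e. move
// costs of about 0.26, 2.8 and 5.5.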

TAILLE
Posts: 968
Joined: Thu Apr 26, 2007 18:51
Location: FRANCE

Re: Search Algorithm

Post by TAILLE » Wed Oct 12, 2016 11:04

Fabien Letouzey wrote: One invariant of the graph is that a node is either:
- a leaf with a score
- an internal node with all successors present in the graph
I fully understand this invariant, because I made exactly the same choice.
Fabien Letouzey wrote: The "path cost" of a leaf is the sum of "move costs" along the path from the root to that leaf.
Here I have difficulty understanding, because in my graph a leaf can very often be reached via different paths from the root.
Gérard
