Self-learning and Monte Carlo algorithm

Sidiki · Post by **Sidiki** » Tue Dec 12, 2017 13:28

Hi all,

Recently, on 4 December 2017, Google's DeepMind ran a match pitting its new chess engine, AlphaZero, against Stockfish, the best chess engine.
100 games were played, and AlphaZero scored 28 wins, 72 draws and 0 losses.
Four hours of self-learning were enough for AlphaZero to master chess (without an opening book or endgame database) and crush the best chess engine.
As far as I know, only Windames, Plus500 and Aurora Borealis have this option.
Could self-learning or teaching be a good way to improve draughts play, or even to solve the game?
Isn't it a kind of endgame generator?

https://chess24.com/en/read/news/deepmi ... shes-chess#

Sidiki--

Fabien Letouzey · Post by **Fabien Letouzey** » Thu Dec 14, 2017 03:29

Hi Sidiki,

There is so much confusion (not just from you).

First of all, the AlphaZero learning duration was 9 hours (on thousands of enhanced computers), not 4 as everybody is copy/pasting. It is claimed that AlphaZero *reached* the level of Stockfish after 4 hours, but that's not the version that played the match.

Secondly, machine learning of evaluation (as in Scan and AlphaZero, although they differ a lot) has nothing to do with the good old opening-book learning that you are mentioning. The latter only recognises positions it has seen before. So it's basically a search tree (analysis) stored on disk.
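To make the difference concrete, here is a minimal sketch. It is not taken from Scan or AlphaZero; the feature tuples, the `book` table and the `weights` are all made up for illustration. The book can only answer for positions it has stored, while an evaluation function computes a score for any position from its features.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy "position": a tuple of two feature values
# (say, material balance and mobility). Not a real draughts encoding.
Position = Tuple[int, int]

# Book-style "learning": an exact-match table of positions seen before.
book: Dict[Position, float] = {(1, 0): 0.3, (0, 2): -0.1}

def book_score(pos: Position) -> Optional[float]:
    """Return the stored score, or None if the position was never seen."""
    return book.get(pos)

# Learned evaluation: weights fitted beforehand (values invented here).
weights = (0.5, -0.25)

def eval_score(pos: Position) -> float:
    """Score any position, seen or unseen, as a weighted sum of its features."""
    return sum(w * f for w, f in zip(weights, pos))

unseen = (5, 5)
print(book_score(unseen))  # the book has no answer for a new position
print(eval_score(unseen))  # the evaluation generalises to it
```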

Endgame-table generators are a form of exhaustive search: they make sure every possible position (with a specific material signature) is looked at. Again, nothing to do with evaluation (or learning).
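As a rough illustration of "every possible position is looked at", here is a sketch on a toy game rather than draughts: a pile of counters, each move removes 1, 2 or 3, and the player who cannot move loses. The game and the function name `build_table` are assumptions for the example; real generators use retrograde analysis over material signatures, but the exhaustive spirit is the same.

```python
from typing import List, Sequence

def build_table(max_counters: int, moves: Sequence[int] = (1, 2, 3)) -> List[bool]:
    """table[n] is True if the side to move wins with n counters left."""
    # n = 0: no move is available, so the side to move loses.
    table = [False] * (max_counters + 1)
    for n in range(1, max_counters + 1):
        # A position is won if at least one move reaches a position
        # that is lost for the opponent.
        table[n] = any(not table[n - m] for m in moves if m <= n)
    return table

table = build_table(12)
lost = [n for n, win in enumerate(table) if not win]
print(lost)  # positions lost for the side to move
```

Nothing here is estimated or learned: each entry is computed exactly, once, and then only looked up.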

Fabien.

Luzimar · Post by **Luzimar** » Thu Dec 14, 2017 15:13

Fabien, congratulations! I did not know you were also a great programmer in chess. Very good. Luzimar Araujo

Sidiki · Post by **Sidiki** » Thu Dec 14, 2017 21:05

Hi Fabien,

Thanks for all these clarifications. So the truth is different from what is claimed on many websites and blogs.

Fabien Letouzey · Post by **Fabien Letouzey** » Fri Dec 15, 2017 06:25

If you have doubts, you can post some links and I will have a look.

Some of the confusion is understandable. "Learning" is a vague term: it's supposed to describe programs that improve with time. For example, a learning program might get better at solving combinations the more you use it. By contrast, merely computing something is not "learning" in itself, as you seem to be suggesting.

The modern variant, usually called "machine learning", goes well beyond rote learning used to remember positions. It's usually used "offline", which means that the programmer runs the learning once during development. And then you get the resulting program, which doesn't learn afterwards (that would be both complicated and pointless). That's why the "learning" functionality doesn't appear; it's already been used by the author.
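A minimal sketch of that "offline" workflow, under loud assumptions: the training samples, the two-feature linear evaluation and the plain gradient-descent loop are all invented for illustration and are not how Scan or AlphaZero are actually trained. The point is only the split between a training step run once by the author and a playing step that uses frozen weights.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], float]  # (features, target score)

def train(samples: Sequence[Sample], epochs: int = 500, lr: float = 0.05) -> Tuple[float, ...]:
    """Development-time step: fit linear weights by stochastic gradient descent."""
    w: List[float] = [0.0, 0.0]
    for _ in range(epochs):
        for features, target in samples:
            pred = sum(wi * fi for wi, fi in zip(w, features))
            err = pred - target
            # Gradient step on the squared error for this sample.
            w = [wi - lr * err * fi for wi, fi in zip(w, features)]
    return tuple(w)

# Made-up training data, consistent with weights (0.5, -0.25).
SAMPLES: List[Sample] = [((1.0, 0.0), 0.5), ((0.0, 1.0), -0.25), ((1.0, 1.0), 0.25)]

# Run once by the author; the shipped program only contains the result.
WEIGHTS = train(SAMPLES)

def evaluate(features: Sequence[float]) -> float:
    """Play-time step: uses the frozen weights and never updates them."""
    return sum(w * f for w, f in zip(WEIGHTS, features))

print(WEIGHTS)
print(evaluate((2.0, 2.0)))
```

In a released engine, `train` would not even be shipped; only the numbers in `WEIGHTS` would be.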

Sidiki · Post by **Sidiki** » Tue Dec 19, 2017 14:50

Hi Fabien,

I remember that in one of your posts on Scan 2.0, you wrote that learning happens in the preparation step of the program and that afterwards, i.e. when the program plays, this option is no longer available. So what is the truth in this AlphaZero story?
Perhaps it has a very large database built by learning, or a revolutionary eval function. They said that it is Monte Carlo, which is based on a deep search.

Sidiki

TAILLE · Post by **TAILLE** » Tue Dec 19, 2017 16:53

Hi Sidiki,

I do not understand your question.
What is the problem with using Monte Carlo as a search algorithm during a game?
Using Monte Carlo does not mean you are in a learning process. BTW, in the past I experimented a little with this algorithm as the search algorithm in Damy.

Sidiki · Post by **Sidiki** » Wed Dec 20, 2017 14:56

Hi Gerard,

My question, which is really more a hypothesis than a question, was whether AlphaZero perhaps uses a very large database of learning results.
I was just pointing out that they said its eval function is Monte Carlo.

Maurits Meijer · Post by **Maurits Meijer** » Wed Dec 20, 2017 16:09

I don't think AlphaZero's search should be called Monte Carlo; it selects moves in the search tree based on the advice of the evaluation function, so it's a deliberate way of pruning. This is, I think, the main innovation of AlphaZero, but it is hard to tell how this impacts performance.
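For what it's worth, the selection rule described in the AlphaGo Zero paper picks, at each node, the move maximising Q(s, a) + c_puct · P(s, a) · sqrt(Σ_b N(s, b)) / (1 + N(s, a)), where P is the network's prior for the move, N a visit count and Q the mean value found so far. Below is a small sketch of just that rule; the `Edge` class, the `select` function and the constant `c_puct = 1.5` are illustrative choices of mine, not DeepMind's code.

```python
import math
from dataclasses import dataclass
from typing import Dict

@dataclass
class Edge:
    prior: float             # P(s, a): probability the network gives this move
    visits: int = 0          # N(s, a): how often the search tried it
    value_sum: float = 0.0   # W(s, a): sum of values backed up through it

    @property
    def q(self) -> float:
        """Mean value Q(s, a); 0 for an unvisited move."""
        return self.value_sum / self.visits if self.visits else 0.0

def select(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the move with the highest Q + U score (no randomness involved)."""
    total = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        u = c_puct * e.prior * math.sqrt(total) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda move: score(edges[move]))

edges = {"a": Edge(prior=0.7, visits=10, value_sum=1.0), "b": Edge(prior=0.3)}
print(select(edges))
```

Moves with a low prior and a poor mean value are visited rarely, which is the "deliberate pruning" effect: the search is steered, not sampled at random.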

The main power of AlphaZero, besides computational power and setting the match conditions, seems to be the massive neural net it uses for evaluation. I don't believe it is using a database in playing.

AlphaZero's publicity is absolutely fantastic.

CheckersGuy · Post by **CheckersGuy** » Tue Feb 27, 2018 15:29

Having looked at the AlphaZero/AlphaGo Zero papers, I think one shouldn't call the algorithm MCTS, because there are no longer random playouts at leaf nodes.
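To show what that difference looks like, here is a sketch on a toy game (take 1 to 3 counters from a pile; whoever cannot move loses). Everything in it is an assumption for illustration: `random_playout` is the classic way to score a leaf, and `value_estimate` is a hand-written rule standing in for a trained network.

```python
import random

MOVES = (1, 2, 3)  # toy game: remove 1-3 counters; no move available = loss

def random_playout(counters: int, rng: random.Random) -> int:
    """Classic MCTS leaf evaluation: play random moves until the game ends.

    Returns +1 if the side to move at the leaf wins the playout, else -1.
    """
    side = 0  # 0 = the side to move at the leaf
    while counters > 0:
        counters -= rng.choice([m for m in MOVES if m <= counters])
        side ^= 1
    # The player now to move has nothing to take and loses.
    return -1 if side == 0 else 1

def value_estimate(counters: int) -> float:
    """AlphaZero-style leaf evaluation: ask a value function directly.

    A hand-written rule stands in for the neural network here.
    """
    return -1.0 if counters % 4 == 0 else 1.0

rng = random.Random(0)
print([random_playout(10, rng) for _ in range(5)])  # noisy, varies per playout
print(value_estimate(10))                           # one deterministic answer
```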
