Damage 15.3
Re: Damage 15.3
The 4 hours of training for AlphaZero corresponds to about 1700 years of computing on a consumer-grade PC (an estimate by the author of Leela-Zero, who managed to gather this much compute by crowd-sourcing the computation).
Re: Damage 15.3
Rein Halbersma wrote: ↑Fri Feb 21, 2020 16:32
The 4 hours of training for AlphaZero corresponds to about 1700 years of computing on a consumer-grade PC (an estimate by the author of Leela-Zero, who managed to gather this much compute by crowd-sourcing the computation).

Awesome, I understand why A0 is so strong. No way 😮
Re: Damage 15.3
Sidiki wrote: ↑Fri Feb 21, 2020 16:40
Awesome, I understand why A0 is so strong. No way 😮

Yes, Google has a lot of resources :)
Re: Damage 15.3
Yes, with resources the impossible becomes possible. So are these 1700 years of computing used as a kind of database, or just to find out which variant is better?
Re: Damage 15.3
I think > 90% of the computing time was used to generate self-play games, and the rest of the resources were used to find the optimal weights for the neural network.
Re: Damage 15.3
Rein Halbersma wrote: ↑Fri Feb 21, 2020 16:32
The 4 hours of training for AlphaZero corresponds to about 1700 years of computing on a consumer-grade PC (an estimate by the author of Leela-Zero, who managed to gather this much compute by crowd-sourcing the computation).

Such information should be treated humorously; do not believe everything.
Re: Damage 15.3
Rein Halbersma wrote: ↑Fri Feb 21, 2020 16:55
I think > 90% of the computing time was used to generate self-play games, and the rest of the resources were used to find the optimal weights for the neural network.

Nice, supercomputers can calculate awesome things. Imagine if the computation had run for a whole month.
The depth is also incredible. I also play chess, and sometimes A0 plays sacrifices more than 25 moves deep.
Re: Damage 15.3
Krzysztof Grzelak wrote: ↑Fri Feb 21, 2020 17:13
Such information should be treated humorously; do not believe everything.

:D
Re: Damage 15.3
Krzysztof Grzelak wrote: ↑Fri Feb 21, 2020 17:13
Such information should be treated humorously; do not believe everything.

You do understand that Google/DeepMind used 1700 years' worth of computing *in parallel* in order to achieve all that in 4 hours?
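For scale, rough arithmetic on the implied parallelism factor (a back-of-the-envelope estimate, not a figure from the paper):

```python
hours_on_one_pc = 1700 * 365 * 24   # ~14.9 million hours of single-PC compute
parallelism = hours_on_one_pc / 4   # squeezed into 4 wall-clock hours
print(f"{parallelism:,.0f}x")       # ~3,723,000x effective speedup
```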
Re: Damage 15.3
I wanted to know how the strength of the Damage engine scales with the number of learning games used to calculate the weights for the evaluation function.
For this purpose I started several 158-game DXP matches at 2 min/game. Further settings as mentioned in the first post.
For the training set I only used the first X (X = 10,000, 20,000, ...) games of the larger available set (1.37M).
Below are the results in table and graph format (graph: elo2.png).

Code: Select all

Training games    W    D    L    U     T    Match    ELO
        10000    79   76    0    3   158        0    195
        20000    50  108    0    0   158        1    114
        40000    28  128    0    2   158        2     63
        80000    12  146    0    0   158        3     26
       160000     5  151    0    2   158        4     11
       320000     8  148    0    2   158        5     18
       640000     2  156    0    0   158        6      4
      1280000     4  154    0    0   158        7      9

These results seem to indicate that, for this training set and this evaluation function, saturation starts around 160K games.
Hereafter the curve seems (within statistical fluctuations) more or less flat.
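As a sanity check, the ELO column matches the standard logistic Elo model when unfinished (U) games are scored as draws. A minimal sketch of that calculation (the scoring of U games is an assumption, as the exact method isn't stated):

```python
import math

def elo_diff(w, d, l, u):
    """Elo difference from match totals, scoring unfinished (u) games
    as draws. Reproduces most rows of the table above."""
    score = (w + 0.5 * (d + u)) / (w + d + l + u)
    return 400 * math.log10(score / (1 - score))

print(round(elo_diff(50, 108, 0, 0)))  # -> 114, the 20K-games row
print(round(elo_diff(5, 151, 0, 2)))   # -> 11, the 160K-games row
```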
To answer one of the previous questions: if one takes a state-of-the-art Threadripper with 32 cores, and assumes one game takes 6 seconds (based on 50 ms/move), then it takes around 8.5 hours to generate these 160K games.
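Spelled out (assuming roughly 120 moves per game at 50 ms/move):

```python
games = 160_000
seconds_per_game = 0.050 * 120      # 50 ms/move * ~120 moves ≈ 6 s/game
cores = 32

hours = games * seconds_per_game / cores / 3600
print(f"{hours:.1f} hours")         # ~8.3 hours, i.e. around 8.5 hours
```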
The graph also suggests that Damage is still 5-10 Elo weaker than Scan 3.1 (on my machine, with this time setting), although I expected a slightly better result based on the Damage win in the first match (2 wins, 1 loss, 155 draws).
For this reason I replayed the match with the evaluation based upon the full training set (1.37M games).
With 1 loss for Damage, 157 draws, and an Elo difference of 2, it more or less confirmed that there are still some steps for Damage to take, as Scan still seems to be the better engine.
Also interesting to see that after the initial 2 wins of Damage last weekend, Scan refuses to lose (maybe it learns in a secret way :D )
Bert
Re: Damage 15.3
BertTuyt wrote: ↑Fri Feb 21, 2020 19:33
Also interesting to see that after the initial 2 wins of Damage last weekend, Scan refuses to lose (maybe it learns in a secret way :D )

Your last sentence is funny; it seems that Scan effectively refuses to lose. :D
It has learning weights, no? So I want to know whether your training weights are similar to LC0's.
If that's the case, I think the 10,000 to 20,000 game sets are the better ones.
Sidiki
Re: Damage 15.3
Sidiki, LC0 is completely different and uses a deep neural network; it's nothing like what we do...
Bert
Re: Damage 15.3
OK, I thought it was the same thing with the weights. 👍 😎
Thanks.
Re: Damage 15.3
It's both very similar and very different. Both A0's neural networks and Scan-inspired patterns have their weights fitted using gradient-descent optimization. But the A0 neural networks are much more complicated non-linear functions, and computing the full eval is many orders of magnitude more expensive. This makes the self-play very slow. Pattern eval functions are very cheap to compute, and are a very effective compromise between a fully general, tunable eval and a hand-made, hand-tuned eval. It's possible that neural networks can be even stronger than patterns (Elo-wise), but unless someone sets up an AlphaZero type of infrastructure, that's hard to prove.
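To illustrate the shared idea, here is a minimal sketch of fitting pattern weights by gradient descent on game results (the feature indexing and data are hypothetical, not Scan's or Damage's actual code):

```python
import numpy as np

N_PATTERNS = 10_000            # hypothetical number of distinct patterns
weights = np.zeros(N_PATTERNS)
rng = np.random.default_rng(0)

def eval_position(active):
    """Linear pattern eval: the sum of the weights of the active patterns."""
    return weights[active].sum()

def sgd_step(active, result, lr=0.01):
    """One gradient-descent step on the logistic loss, nudging the active
    pattern weights toward the game result (1 = win, 0.5 = draw, 0 = loss)."""
    pred = 1.0 / (1.0 + np.exp(-eval_position(active)))
    # subtract.at handles repeated indices correctly, unlike fancy "-="
    np.subtract.at(weights, active, lr * (pred - result))

# Toy usage with random "positions"; real training iterates over millions
# of positions stored from self-play games.
for _ in range(1000):
    position = rng.integers(0, N_PATTERNS, size=8)
    sgd_step(position, result=rng.choice([0.0, 0.5, 1.0]))
```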
One other difference is that A0 learned from scratch via reinforcement learning on batches of self-play games. Between every batch of games, the eval weights are updated and a new self-play cycle starts, until a better version has been found. The pattern tuning for the draughts engines has been done after all self-play games have finished and their critical positions were stored. It's possible that reinforcement learning could further improve the current programs.
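The difference in training schedule can be summarized in pseudocode (self_play and update_weights are assumed helper routines, not real APIs):

```python
# A0-style reinforcement learning: weights are updated between batches,
# so each new batch of self-play games is generated by a stronger player.
def train_alphazero_style(weights, n_cycles, batch_size):
    for _ in range(n_cycles):
        games = self_play(weights, batch_size)    # uses the current weights
        weights = update_weights(weights, games)  # refit, then loop again
    return weights

# Draughts-engine style: all games are generated first with fixed weights,
# then the pattern weights are fitted once on the stored critical positions.
def train_after_self_play(weights, n_games):
    games = self_play(weights, n_games)
    return update_weights(weights, games)
```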
Re: Damage 15.3
Cool Rein,
So that's the big difference between A0 and our pattern programs.
If we had the same computing resources as Google, could we also reach this kind of super draughts program by self-play reinforcement learning?!
Thanks again.
We are waiting for this evening's Damage results.
Sidiki