Internet engine matches

MichelG · Post by **MichelG** » Thu Nov 29, 2012 13:01

BertTuyt wrote: In Computer Chess several experiments indicate an ELO gain of 50 - 70 points for every doubling of speed.
Maybe the game of Draughts is different and/or we already are in the Diminishing region, remains an open question, for now.
At least in this case the gain is significant smaller, 60 ELO-points for a 10-fold speed increase.
If one assumes that for the ELO improvement the next formula can be applied (and for 1 Min/Game for both ELO difference is 0):

...

Although statistics can be approved this seems to be an interesting result, as I expect that for longer search (such as 10 Min/Game - 100 Min/Game) we might see more diminishing returns.

I will do the test ( 1 Min/Game for Damage, and 10 Min/Game for KingsRow) to see if the Constant ( here 56), will be reproduced...
I'm very interested if (based on self play or other experiences) similar results have been observed int he past.

Bert

Very interesting results. 60 elo points for a factor 10 equals about 18 points for each doubling of speed.

That proves it takes a lot of effort to increase playing strength by making the program search faster. The last couple of weeks i put in a lot of effort to make dragon about 20% faster. That would translate to only 5 elo points improvement.

100v10 would be interesting to compare to, but i quess testing that would take a lot of time.

BertTuyt · Post by **BertTuyt** » Thu Nov 29, 2012 20:18

Michael, I was also a little surprised as the ELO gain is "relatively" small.
But so far it has not been confirmed by other programs.
It could be that this value is only valid for the Damage - Kingsrow "system", around this search depth for both, and that for other combinations one would see other values.
I will definitely do a 100 Min/Game - 10 Min/Game match, but need to think how I want to organize that.

If for this case the gain is even smaller, what I basically expect, than relying on future hardware improvement is absolutely not the way forward.
As the frequency these days seems to top around 3.5 - 4.0 GHz, further search speed improvement is only possible trough parallel search.
As the improvement does not scale linear with the number of cores, i expect that you need (maybe) 16 cores to improve the speed compared with a 1-core search with a factor of 10.
Maybe there is much to improve within the search itself , such as several options for singular/forced moves.

But the most likely scenario is to work on the evaluation, which is the hard work.
Maybe we were just lucky that so far the search improvement was sufficient, but it could be , that we enter the era where we really need to focus on the evaluation function more.

Bert

Walter Thoen · Post by **Walter Thoen** » Sun Dec 02, 2012 11:50

Thanks for the responses to my question about using multiple algorithms in parallel. Very useful.

I am not very surprised that Bert finds only a small ELO gain for an increase in search time. I think that most modern programs on modern hardware with a large egdb have reached the point where they should always be able to find a draw. The advantage that you can achieve by a better search or better evaluation is simply not enough to break out of the draw margin. I really hope that soon most programs will have the option to use the Killer-rule in the move generator. With Killer-mode activated it should be easier to determine if improvements in search or evaluation work. The improvements that do work in Killer-mode are probably also improvements for the normal game.

Walter

Rein Halbersma · Post by **Rein Halbersma** » Sun Dec 02, 2012 13:52

Walter Thoen wrote:Thanks for the responses to my question about using multiple algorithms in parallel. Very useful.

I am not very surprised that Bert finds only a small ELO gain for an increase in search time. I think that most modern programs on modern hardware with a large egdb have reached the point where they should always be able to find a draw. The advantage that you can achieve by a better search or better evaluation is simply not enough to break out of the draw margin. I really hope that soon most programs will have the option to use the Killer-rule in the move generator. With Killer-mode activated it should be easier to determine if improvements in search or evaluation work. The improvements that do work in Killer-mode are probably also improvements for the normal game.

Walter

Hi Walter,

I made this observation to Tjalling and Ed Gilbert 3 years ago. My own draughts library does support Killer draughts, as I think it's the most attractive and decisive form of the game of draughts (together with Frisian draughts). Anyway, how is your Killer Association doing? Did you manage to organize an online tournament yet?

Rein

BertTuyt · Post by **BertTuyt** » Tue Dec 04, 2012 20:22

Just another 1 Min/Game match for both, so nothing special, nor any changes since last match.
Result from the perspective of Damage.
Wins 17, Lost 16, Draw 125.

For those interested the attached match file.

Bert

MichelG · Post by **MichelG** » Thu Dec 06, 2012 13:08

I did a similar test, with dragon playing against itself on different time settings

10 minutes vs 1 minute:
    23 wins / 0 loss / 54 draws     65% score, 108 elo, 77 games

100 minutes vs 10 minute:
    16 wins / 1 loss / 41 draws     57% score, 48 elo, 58 games

With dragon playing on 1 core of a 2.8 Ghz machine.

As expected, you can see diminishing returns from the longer thinking time.

I also tried 100 vs 1 minute. In this case the player with 100 minutes searches about 5 ply deeper than the 1 minute version.

25 wins / 1 loss / 28 draws     72% score, 166 elo 54 games

Which is rather surprising. Even though dragon searches only 1% of number of positions, it still manages to draw most of the games. And i am guessing that it will still draw many times, even playing against itself at 1000 or more times the thinking time.

Could it be that at 1 minute/game at one 1 core, dragon (or any other decent engine) plays optimal every move in about 50% of games?

With optimal defined as not moving into a position that is theoretically lost.

In that case, 1 minute/game would often suffice to draw against a 40-man endgame database.

Michel

Rein Halbersma · Post by **Rein Halbersma** » Thu Dec 06, 2012 13:11

MichelG wrote:I did a similar test, with dragon playing against itself on different time settings
10 minutes vs 1 minute:
    23 wins / 0 loss / 54 draws     65% score, 108 elo, 77 games

100 minutes vs 10 minute:
    16 wins / 1 loss / 41 draws     57% score, 48 elo, 58 games
With dragon playing on 1 core of a 2.8 Ghz machine.

As expected, you can see diminishing returns from the longer thinking time.

I also tried 100 vs 1 minute. In this case the player with 100 minutes searches about 5 ply deeper than the 1 minute version.
25 wins / 1 loss / 28 draws     72% score, 166 elo 54 games
Which is rather surprising. Even though dragon searches only 1% of number of positions, it still manages to draw most of the games. And i am guessing that it will still draw many times, even playing against itself at 1000 or more times the thinking time.

Could it be that at 1 minute/game at one 1 core, dragon (or any other decent engine) plays optimal every move in about 50% of games?

With optimal defined as not moving into a position that is theoretically lost.

In that case, 1 minute/game would often suffice to draw against a 40-man endgame database.

Michel

Michel, I think that this the consensus among top grand masters as well. Given reasonable time, they can draw against perfect play.
I would be very interested if you would repeat this experiment with your Killer engine (with 6 piece endgames). My prediction is that the draw percentage is a lot lower.

MichelG · Post by **MichelG** » Thu Dec 06, 2012 14:06

Rein Halbersma wrote: Michel, I think that this the consensus among top grand masters as well. Given reasonable time, they can draw against perfect play.
I would be very interested if you would repeat this experiment with your Killer engine (with 6 piece endgames). My prediction is that the draw percentage is a lot lower.

This would imply a interesting strategy of a computer playing a human grandmaster.

Instead of spending thinking time to think deeper, the only way of winning is playing the (non-losing) move that maximises the chance that the grandmaster falls into a trap. I guess grandmasters are already doiing this against their opponents.

For killer draughts, i remember that although the drawing rates are much lower then international, the draw percentage still gets high at higher levels.

Krzychumag · Post by **Krzychumag** » Thu Dec 06, 2012 17:01

Michel and of what version you used the program.

BertTuyt · Post by **BertTuyt** » Thu Dec 06, 2012 23:02

I did a similar test, with dragon playing against itself on different time settings

10 minutes vs 1 minute:
23 wins / 0 loss / 54 draws 65% score, 108 elo, 77 games

100 minutes vs 10 minute:
16 wins / 1 loss / 41 draws 57% score, 48 elo, 58 games

With dragon playing on 1 core of a 2.8 Ghz machine.

As expected, you can see diminishing returns from the longer thinking time.

I also tried 100 vs 1 minute. In this case the player with 100 minutes searches about 5 ply deeper than the 1 minute version.

25 wins / 1 loss / 28 draws 72% score, 166 elo 54 games

Which is rather surprising. Even though dragon searches only 1% of number of positions, it still manages to draw most of the games. And i am guessing that it will still draw many times, even playing against itself at 1000 or more times the thinking time.

Michel, thanks for sharing.
The 100 vs 1 minute is not so surprising for me, as ELO could be added for the 2 previous matches to predict the last outcome (but I could be wrong in interpreting the ELO system).
So 108 + 48 = 156 which is rather close to your measured 166.

Also interesting is that the first ELO difference is larger than what I measured ( 60 ).
But your second test reveals a lower ELO, which is a good indication (if statistics is not playing tricks with us) for diminishing returns.

I was wondering if the 1 vs 1 Min and 10 vs 1 Min self-play matches could also be sued as a measure for program strength.
So the lower the difference the stronger the base program is.
But i could also be wrong here, as a random playing program will yield no improvement at all, which is not a measure for perfect play.

In the mean time I'm doing a test with Flits ( so 1 Min - 1 Min, 10 Min Damage - 1 Min Flits, and 10 Min Flits - 1 Min Damage) to verify if the Damage-Flits system behaves in a similar way compared with Damage - Kingsrow.

Bert

MichelG · Post by **MichelG** » Fri Dec 07, 2012 13:28

Krzychumag wrote:Michel and of what version you used the program.

It's the v4.1.2 + work in progress (it's currently about 20% faster)

BertTuyt wrote:
Michel, thanks for sharing.
The 100 vs 1 minute is not so surprising for me, as ELO could be added for the 2 previous matches to predict the last outcome (but I could be wrong in interpreting the ELO system).
So 108 + 48 = 156 which is rather close to your measured 166.

Also interesting is that the first ELO difference is larger than what I measured ( 60 ).
But your second test reveals a lower ELO, which is a good indication (if statistics is not playing tricks with us) for diminishing returns.

I was wondering if the 1 vs 1 Min and 10 vs 1 Min self-play matches could also be sued as a measure for program strength.
So the lower the difference the stronger the base program is.
But i could also be wrong here, as a random playing program will yield no improvement at all, which is not a measure for perfect play.
Bert

Ofcourse you could then write a program that avoids winning at all cost, and it would score very high on that measure

But in practise i still higher draw levels can be a indication of strong play.

Maybe we should define a new standard for engine-engine competitions; all games should be played at 30 seconds per game (about 0.5 seconds a move)

This would increase the number of non-draws significantly and keep the game interesting for the programmers. (At least for the next x years) And playing strong at 0.5 sec/move is relevant for people playing the engine, so that they don't have to wait for the computer to move.

Michel

BertTuyt · Post by **BertTuyt** » Fri Dec 07, 2012 23:20

I thought if it would be possible to quantify the statement of Michel.

Could it be that at 1 minute/game at one 1 core, dragon (or any other decent engine) plays optimal every move in about 50% of games?
With optimal defined as not moving into a position that is theoretically lost.

For this purpose I tried to generate some sort of model, for which i have run a quick and dirty (so it might be flawed) simulation.

Basic assumptions are (wright or wrong, but at least it is a starting point):
* In the first X-moves of the game both players will use a (well-proven) book, and it is assumed that this book will not contain non-optimal moves (according to the definition of Michel, so no moves moving into a theoretical lost position).
* From move Y onwards , the search will mostly/always reach Data-Base nodes, so play from here does not yield errors.
* In the current simulation model the probability to make an error is equal for both players (but this can be easily changed), ans the probability does not change between move X (book) and move Y( endgame end-nodes) (think this is also not true, as i expect that the error-rate will peak somewhere in the middle game).
* An error will result (at that point of the game) in a lost position if the previous position was a draw, and a draw position if the previous position was a win (from the perspective of the player making the last error). I did not include the error which transfers a won game into a lost one.

In the calculation i assumed that X=10 and Y=40. So basically there are (only) 30 moves possible for each player to make an error.
Output for the error probability from 0 o/oo to 10 o/oo (pro-mille) is summarized in next table (actual output simulation).
For better statistics i run the 30-move simulation 1.000.000 times (so basically a 1.000.000 games match).

Code: Select all

0. Win = 0 (  0.0), Draw = 1000000 (100.0), Lost = 0 (  0.0)
1. Win = 14775 (  1.5), Draw = 970477 ( 97.0), Lost = 14748 (  1.5)
2. Win = 28878 (  2.9), Draw = 942417 ( 94.2), Lost = 28705 (  2.9)
3. Win = 42489 (  4.2), Draw = 914882 ( 91.5), Lost = 42629 (  4.3)
4. Win = 55453 (  5.5), Draw = 889193 ( 88.9), Lost = 55354 (  5.5)
5. Win = 67959 (  6.8), Draw = 864236 ( 86.4), Lost = 67805 (  6.8)
6. Win = 79783 (  8.0), Draw = 840790 ( 84.1), Lost = 79427 (  7.9)
7. Win = 90141 (  9.0), Draw = 818342 ( 81.8), Lost = 91517 (  9.2)
8. Win = 101686 ( 10.2), Draw = 796631 ( 79.7), Lost = 101683 ( 10.2)
9. Win = 111920 ( 11.2), Draw = 776201 ( 77.6), Lost = 111879 ( 11.2)
10. Win = 122363 ( 12.2), Draw = 755859 ( 75.6), Lost = 121778 ( 12.2)

If we would take the 3 matches Damage against Kingsrow at 1 Min - 1Min, we found the result 41 Win Damage (8.6%), 43 Win Kingsrow (9.0%) and 390 Draws (82.3%).
So if we assume that both players have the same ELO_rating (and also the same error-probability) then according to the table the error-rate is around 6-7 o/oo.
So this would indicate that both damage as Kingsrow play optimal moves in 99.993% - 99.994% ....

Actual program code below, so others can also experiment with it. I appreciate feedback, and maybe correct some errors..
I added the loop for probability later so therefore the initial initialization of variables has no function anymore...

Code: Select all

// DSimulation.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include "stdlib.h"

#define XGAME			1000000
#define XMOVE			30

#define GAMESTATUS_WIN	1
#define GAMESTATUS_DRAW	0
#define GAMESTATUS_LOST	-1

#define PROBABILITY_RANGE 	1000

int _tmain(int argc, _TCHAR* argv[])
{
	int iGame, iMove ;
	int iGameStatus ;
	int iPErrorMove0, iPErrorMove1, iRandomMove ;
	int iXWin, iXDraw, iXLost ;
	float fWin, fDraw, fLost ;

	iXWin	= 0 ;						// Init WDL Counters
	iXDraw	= 0 ;
	iXLost	= 0 ;

	iPErrorMove0 = 50 ;					// Error probability in %
	iPErrorMove1 = 50 ;


	for ( int j=0; j<=100; j++ ) {

	iPErrorMove0 = j * 1 ;					// Error probability in o/oo (pro-mille)
	iPErrorMove1 = j * 1 ;

	iXWin	= 0 ;							// Init WDL Counters
	iXDraw	= 0 ;
	iXLost	= 0 ;

	for ( iGame = 0; iGame < XGAME; iGame++ ) {

		iGameStatus = GAMESTATUS_DRAW ;		// Perspective Player0

		for ( iMove = 0; iMove < XMOVE; iMove++ ) {

			iRandomMove = ( rand() % PROBABILITY_RANGE ) + 1 ;

			if ( ( iGame + iMove ) % 2 ) {	// Player 1

				if ( iRandomMove <= iPErrorMove1 ) {

					if ( iGameStatus == GAMESTATUS_DRAW )
						iGameStatus = GAMESTATUS_WIN ;
					else if ( iGameStatus == GAMESTATUS_LOST )
						iGameStatus = GAMESTATUS_DRAW ;
								// No need to address GAMESTATUS_WIN
				}

			} else {				// Player 0

				if ( iRandomMove <= iPErrorMove0 ) {

					if ( iGameStatus == GAMESTATUS_DRAW )
						iGameStatus = GAMESTATUS_LOST ;
					else if ( iGameStatus == GAMESTATUS_WIN )
						iGameStatus = GAMESTATUS_DRAW ;
								// No need to address GAMESTATUS_LOST
				}

			}
		}

		if ( iGameStatus == GAMESTATUS_WIN )	// Update WDL Counters
			++iXWin ;
		else if ( iGameStatus == GAMESTATUS_DRAW )
			++iXDraw;
		else if ( iGameStatus == GAMESTATUS_LOST )
			++iXLost ;
	}

	fWin	= (float)( 100. * iXWin ) / XGAME ;
	fDraw	= (float)( 100. * iXDraw ) / XGAME ;
	fLost	= (float)( 100. * iXLost ) / XGAME ;

	printf("%d. Win = %d (%5.1f), Draw = %d (%5.1f), Lost = %d (%5.1f)\n", j, iXWin, fWin, iXDraw, fDraw, iXLost, fLost) ;

	}

	return 0;
}

Bert

MichelG · Post by **MichelG** » Sat Dec 08, 2012 13:00

BertTuyt wrote: ...
If we would take the 3 matches Damage against Kingsrow at 1 Min - 1Min, we found the result 41 Win Damage (8.6%), 43 Win Kingsrow (9.0%) and 390 Draws (82.3%).
So if we assume that both players have the same ELO_rating (and also the same error-probability) then according to the table the error-rate is around 6-7 o/oo.
So this would indicate that both damage as Kingsrow play optimal moves in 99.993% - 99.994% ....
...

That's the same idea i had

There are some minor flaws in the program though. XMOVE seems to indicate the total number of moves; that should be set to 60 instead of 30. The program will then give 3-3.5 o/oo error-rate.
This in turn leads to a 99.7% probability op optimal play per move. (your calculation of percentages has too many 9's in it) Note that for small error rates, you can approximate by stating winRate=loseRate=(y-x)*perMoveErrorrate

I got a observation/hypothesis on the error rate; every 10-fold increase in search-time, decreases the per move error probability by a factor 2. So errorrate=c1/(time^c2),
with c1 a constant depending on hardware and program abilities, and c2 about 0.3 - 0.33.

Or in words, the error rate scales inversely with the cube root of thinking time.

Michel

BertTuyt · Post by **BertTuyt** » Sat Dec 08, 2012 13:20

Michel, thanks for posting.
Thats the advantage of sharing code, faster feedback.
I run some small tests now, and later today I hope to provide some new thoughts.

Bert

BertTuyt · Post by **BertTuyt** » Sat Dec 08, 2012 14:19

So here the next thoughts.....

In this post I want to built on the previous ideas, and derive a formula for the ELO-difference when a perfect program/player plays an apponent which generates on a consistent/constant basis Moves with an error probability pError (between book moves and database moves)

As Michel also pointed out for small move error probabilities (pError), the mathematical approximation for win/draw/lost probabilities van be written (it is relativity straightforward to prove this)..

pWin = pError * M (with M = Y- X = MoveNumber where all searches derive theoretical DB-results, and X out of book MoveNumber)
pDraw = pError * M
pLost = 0 (as the perfect player / program will never make a mistake).

For the pMatch result the following expression is valid: pMatch = ( 1 + pError * M ) / 2.

For the ELO-difference ( dELO ) I used next formula (but as im not an ELO expert, things can go wrong form here..) : dELO = - 400 * log ( 1 / pMatch - 1 )

So including pMatch in the formula yields dELO = - 400 * log ( 1 / ( 1 + pError * M ) / 2 ) - 1 ) = -400 * log ( 2 / ( 1 + pError * M ) - 1 )

As pError * M is small one can change this into : dELO = -400 * log ( 2 * ( 1 - PError * M ) - 1 ) = - 400 log ( 1 - 2 * pError * M )

And the next simplification is based on the approximation of the log function : dELO = 800 * M * PError

For M = 30 and PError = 0.001 ( is 1 o/oo ) this yields around 24 ELO points.

Based on the Damage - Kingsrow results , both programs seems to play with an PError of 0.003 , which yields the first order approximation of 72, and the actual calculation is 69.9 .

Again this is an oversimplified approach/model, but it might hint that we have not so much ELO points to gain to reach (asymptotic) perfect play.
If one would include the relation pError and speed (as suggested by Michel), we even can prove the magnitude of diminishing returns...

Bert

World Draughts Forum

Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches

Re: Internet engine matches