ENGINE SPEED AND STRONG

Ed Gilbert · Post by **Ed Gilbert** » Sat Jan 14, 2017 16:17

Yes... that is a possibility, in that case I have to modify DXP a little to have it accept msec. or usec. timings (maybe your modification already does this?). Lets say that I want to run at least 1 complete game each second, in that case a move will take ~8 msec. for each side, this is still doable over a local socket.

The incremental time message has 6-character fields for both the initial time and the increment, so sub-msec resolutions are possible.

I don't have any measurements on DXP overhead, but since it is based on sockets, it should be pretty fast. In my case I think there is a lot more overhead in the kingsrow UI than in DXP messaging.

-- Ed

Joost Buijs · Post by **Joost Buijs** » Sun Jan 15, 2017 08:47

Ed Gilbert wrote:
Yes... that is a possibility, in that case I have to modify DXP a little to have it accept msec. or usec. timings (maybe your modification already does this?). Lets say that I want to run at least 1 complete game each second, in that case a move will take ~8 msec. for each side, this is still doable over a local socket.
The incremental time message has 6-character fields for both the initial time and the increment, so sub-msec resolutions are possible.

I don't have any measurements on DXP overhead, but since it is based on sockets, it should be pretty fast. In my case I think there is a lot more overhead in the kingsrow UI than in DXP messaging.

-- Ed

Latencies in a local network are usually below 1 msec., at least this is what I'm seeing here. This is fast enough to run a decent amount of games within a short period of time.

My idea is now to make a simple tourney manager (for 2 DXP engines to begin with) which handles the starting positions and all the other administrative things. It will take some time but with MFC/ATL it should not be very difficult (.NET I don't like. I'm old school).

Joost

Joost Buijs · Post by **Joost Buijs** » Mon Jan 23, 2017 19:33

Out of curiosity (because I didn't have a clue how strong Scan 2.0 is) I ran a match of 100 games between Kingsrow 1.58 and Scan 2.0. Kingsrow on 8 cores 4GHz. (unfortunately it does not support more than 8 cores) Scan on 10 cores 4GHz. Conditions were 512 MB. hash, no opening-book and no-egdb, 90 moves for the whole game in 1 minute. The results from Kingsrow's perspective are 1 win, 2 losses, 95 draws and 2 unknowns. This is what I call on par.

Now the question remains if this is almost the highest level possible and when the programs get a little bit stronger it always will be drawn or are there holes in the evaluation-functions that both programs are missing, I don't have a clue but time will tell I guess.

MichelG · Post by **MichelG** » Mon Jan 23, 2017 22:14

Yes... that is a possibility, in that case I have to modify DXP a little to have it accept msec. or usec. timings (maybe your modification already does this?). Lets say that I want to run at least 1 complete game each second, in that case a move will take ~8 msec. for each side, this is still doable over a local socket.

I've only implemented a socket server and DXP in follower mode, I can extend this of course, another option is to build a tourney manager for 2 (or more) engines with a simple GUI (like Little Blitzer for chess), this means extra work but in the end it will make life a lot easier.

Running these games ultra fast is not really needed for the ML part, but I want to have the possibility to check each modification to the engine in a short period of time, lets say within 1 hour or so.

Joost

Dragon supports 30 second games by send ".50" as time per game.

One problem you run into when playing very fast dxp games, is that the overhead of showing the interface, initializations of the engine, and communication between the engines quickly becomes the dominant time-consuming factor.

I usually let dragon self-play within the engine, running 2 versions in the same executable. With no overhead and using 12-20 separate singlethreaded processes, i have it play anywhere between 5000 and 130000 games per hour, depending on what you want to test.

Michel

Joost Buijs · Post by **Joost Buijs** » Tue Jan 24, 2017 08:21

MichelG wrote:
Yes... that is a possibility, in that case I have to modify DXP a little to have it accept msec. or usec. timings (maybe your modification already does this?). Lets say that I want to run at least 1 complete game each second, in that case a move will take ~8 msec. for each side, this is still doable over a local socket.

I've only implemented a socket server and DXP in follower mode, I can extend this of course, another option is to build a tourney manager for 2 (or more) engines with a simple GUI (like Little Blitzer for chess), this means extra work but in the end it will make life a lot easier.

Running these games ultra fast is not really needed for the ML part, but I want to have the possibility to check each modification to the engine in a short period of time, lets say within 1 hour or so.

Joost
Dragon supports 30 second games by send ".50" as time per game.

One problem you run into when playing very fast dxp games, is that the overhead of showing the interface, initializations of the engine, and communication between the engines quickly becomes the dominant time-consuming factor.

I usually let dragon self-play within the engine, running 2 versions in the same executable. With no overhead and using 12-20 separate singlethreaded processes, i have it play anywhere between 5000 and 130000 games per hour, depending on what you want to test.

Michel

Hi Michel,

Of course, running the two different engines within the same process or executable gives the lowest overhead, this is also what I do in my chess-program. The only thing that I don't like about this is that I have to keep both engine sources in the same project, I would rather like to keep these completely separated. It depends upon what I want to test, when it is evaluation only I just need 2 different evaluation functions, when I want to test the SMP-search I have to duplicate almost everything.

When testing the SMP-search (with pondering) on all cores - running them in the same process is out of the question, in that case I need two separate machines and it would be nice to have them talking to each other very fast. DXP like most other protocols is text based, a binary protocol will give less overhead. In the past (for totally different things) I often used the parallel port for communications, this runs sub-microsecond, maybe 2 cheap parallel interfaces and a parallel cable will do the job.

My goal is to run a game each second, 3600 games per hour, with my chess-program I have the experience that running the games too fast also alters the result, especially when I tune the search parameters, when tuning the evaluation-function this has less influence. Maybe this doesn't hold for draughts, chess is a combination of tactics and evaluation, draughts seems to be less dependent on search-depth and more dependent on evaluation.

Joost

Catherine · Post by **Catherine** » Wed Jan 25, 2017 13:31

Joost wrote:
My goal is to run a game each second, 3600 games per hour, with my chess-program I have the experience that running the games too fast also alters the result, especially when I tune the search parameters, when tuning the evaluation-function this has less influence. Maybe this doesn't hold for draughts, chess is a combination of tactics and evaluation, draughts seems to be less dependent on search-depth and more dependent on evaluation.

Hi,
This is an important point, draughts seem depending on search-depth, Dam 2.2 have a good level but his average search depth is around 12 or 13 and Scan's and Kingsrow's are around 20. This only difference show the strength difference.
I was trainning yersterday on center play and one question came into my mind: " when a program can do an error ?" lake of search depth or bad evaluation algorithm.
All that to ask, "How can a program can play without error?"
I saw that the 100 games played by Scan vs Kingsrow finished by the Score of 1 win, 2 losses, 95 draws and 2 unknowns.

TAILLE · Post by **TAILLE** » Wed Jan 25, 2017 16:48

Catherine wrote:Joost wrote:
My goal is to run a game each second, 3600 games per hour, with my chess-program I have the experience that running the games too fast also alters the result, especially when I tune the search parameters, when tuning the evaluation-function this has less influence. Maybe this doesn't hold for draughts, chess is a combination of tactics and evaluation, draughts seems to be less dependent on search-depth and more dependent on evaluation.

Hi,
This is an important point, draughts seem depending on search-depth, Dam 2.2 have a good level but his average search depth is around 12 or 13 and Scan's and Kingsrow's are around 20. This only difference show the strength difference.
I was trainning yersterday on center play and one question came into my mind: " when a program can do an error ?" lake of search depth or bad evaluation algorithm.
All that to ask, "How can a program can play without error?"
I saw that the 100 games played by Scan vs Kingsrow finished by the Score of 1 win, 2 losses, 95 draws and 2 unknowns.

Hi,

You are right Catherine, average search depth and evaluation are important but I would like to mention here that they are closely related: you cannot use an agressive pruning if you do not trust your eval function. It is exactly the same for humans: the GMI have a very good eval function and as a consequence they can prune very agressively in order to analyse only the most promising sequences.
For a programmer it is trivial to put in place a very agressive pruning by extending only the best move and maybe the second best move if it is quite close to the best one but it is not trivial to build a good eval function is it?

Joost Buijs · Post by **Joost Buijs** » Thu Jan 26, 2017 08:02

Catherine wrote:Joost wrote:
My goal is to run a game each second, 3600 games per hour, with my chess-program I have the experience that running the games too fast also alters the result, especially when I tune the search parameters, when tuning the evaluation-function this has less influence. Maybe this doesn't hold for draughts, chess is a combination of tactics and evaluation, draughts seems to be less dependent on search-depth and more dependent on evaluation.

Hi,
This is an important point, draughts seem depending on search-depth, Dam 2.2 have a good level but his average search depth is around 12 or 13 and Scan's and Kingsrow's are around 20. This only difference show the strength difference.
I was trainning yersterday on center play and one question came into my mind: " when a program can do an error ?" lake of search depth or bad evaluation algorithm.
All that to ask, "How can a program can play without error?"
I saw that the 100 games played by Scan vs Kingsrow finished by the Score of 1 win, 2 losses, 95 draws and 2 unknowns.

Hi Catherine,

It is not only search-depth and evaluation but also the number of evaluated positions per second that is important. I don't know Dam 2.2 very well but I guess it does maybe 2.5 million nodes per second during the opening phase while Kingsrow on my machine does 40 and Scan does 60 mnps. Usually higher speed means greater depth and playing-strength, but you can also use the extra speed to widen the search which will increase playing-strength as well. There exists an optimal ratio between depth and width (tree-shape) which of course depends upon the accuracy of the evaluation function.

When two different programs always draw their games it doesn't necessarily mean that they play without error, it can also mean that they don't recognize the errors made by the opponent and consequently don't take advantage of these errors, especially when the two programs have very similar evaluation-functions this might be the case.

With draughts tactics don't seem very important, at least that is what I'm seeing here, usually the PV hardly changes no matter how deep you look, I guess there has to be some minimal tactical depth to resolve direct combinations and when you get past this depth (like current top programs do) it doesn't seem to matter much anymore.

Joost

Joost Buijs · Post by **Joost Buijs** » Thu Jan 26, 2017 20:15

After fixing several small issues, improving the transposition-table update algorithm, adding somewhat smarter time-controls and disabling what I call nice-SMP, my engine now seems to play at the same level as the top dogs. The last result from Kingsrows perspective is: 1 win, 4 losses, 92 draws and 3 unknowns. I know 100 games are not enough statistically but at least it gives me some indication about playing strength.

I'm thinking about releasing executables of the engine (only if there is any interest in this), I still have to add pondering and at the moment it can only run as a socket server in DXP follower mode, it has no book, but as an option it can use ED's EGDB. I have no idea what I need to add to make it usable for other people, maybe something like hub or guide?

The evaluation function is written with BMI2 in mind (for CPU's >= Intel Haswell) using these the engine runs very fast, recently I've added BMI2 emulation routines for older CPU's, it works but it is at least 2 times as slow. In emulation mode it runs about 20 mnps during the opening phase on a fast quad-core i7, during the endgame the speed drops to half that value, this is what I see with other engines too, it is difficult to keep the processors busy when there are just a few moves available.

Joost

Catherine · Post by **Catherine** » Sat Jan 28, 2017 15:01

Joost Buijs wrote:After fixing several small issues, improving the transposition-table update algorithm, adding somewhat smarter time-controls and disabling what I call nice-SMP, my engine now seems to play at the same level as the top dogs. The last result from Kingsrows perspective is: 1 win, 4 losses, 92 draws and 3 unknowns. I know 100 games are not enough statistically but at least it gives me some indication about playing strength.

I'm thinking about releasing executables of the engine (only if there is any interest in this), I still have to add pondering and at the moment it can only run as a socket server in DXP follower mode, it has no book, but as an option it can use ED's EGDB. I have no idea what I need to add to make it usable for other people, maybe something like hub or guide?

The evaluation function is written with BMI2 in mind (for CPU's >= Intel Haswell) using these the engine runs very fast, recently I've added BMI2 emulation routines for older CPU's, it works but it is at least 2 times as slow. In emulation mode it runs about 20 mnps during the opening phase on a fast quad-core i7, during the endgame the speed drops to half that value, this is what I see with other engines too, it is difficult to keep the processors busy when there are just a few moves available.

Joost

Hi Joost,

Congratulation for this perfomance,
If think that it's a good "war" to create engines more strongest, this will give to Draughts Computer Community's engines more strongest as at Chess.
It seem the speed or number of mnps are very importants in the strenght of an engine.
So, i suppose that Argus calculate more nodes by seconde in others words, it calculate more positions.
To finish, if i understood, the best evaluation will be this one wich will take into account the maximum number of positions?

Joost Buijs · Post by **Joost Buijs** » Sat Jan 28, 2017 16:03

Catherine wrote:
Joost Buijs wrote:After fixing several small issues, improving the transposition-table update algorithm, adding somewhat smarter time-controls and disabling what I call nice-SMP, my engine now seems to play at the same level as the top dogs. The last result from Kingsrows perspective is: 1 win, 4 losses, 92 draws and 3 unknowns. I know 100 games are not enough statistically but at least it gives me some indication about playing strength.

I'm thinking about releasing executables of the engine (only if there is any interest in this), I still have to add pondering and at the moment it can only run as a socket server in DXP follower mode, it has no book, but as an option it can use ED's EGDB. I have no idea what I need to add to make it usable for other people, maybe something like hub or guide?

The evaluation function is written with BMI2 in mind (for CPU's >= Intel Haswell) using these the engine runs very fast, recently I've added BMI2 emulation routines for older CPU's, it works but it is at least 2 times as slow. In emulation mode it runs about 20 mnps during the opening phase on a fast quad-core i7, during the endgame the speed drops to half that value, this is what I see with other engines too, it is difficult to keep the processors busy when there are just a few moves available.

Joost
Hi Joost,

Congratulation for this perfomance,
If think that it's a good "war" to create engines more strongest, this will give to Draughts Computer Community's engines more strongest as at Chess.
It seem the speed or number of mnps are very importants in the strenght of an engine.
So, i suppose that Argus calculate more nodes by seconde in others words, it calculate more positions.
To finish, if i understood, the best evaluation will be this one wich will take into account the maximum number of positions?

Hi Catherine,

When you have two identical evaluation functions (with the same algorithm) the one that is programmed in a more efficient way will usually give you a higher playing strength because you can look at more position per second (assuming the search algorithm is identical).

When given enough time on my 10 core machine my current program does >120 mnps, with bullet time-controls it is somewhat less because my SMP doesn't split at low depths, with bullet I usually see 80 mnps during the opening phase and during the endgame it is more like 40 mnps. My search algorithm is not tuned very well yet and I'm confident that the playing-strength will increase when the search is better tuned. Although programmed in a different way my program uses the same 4x4 patterns as Scan and Kingsrow use, in the nearby future I want to try larger patterns with overlap because I have the feeling this will perform better.

Joost

Fabien Letouzey · Post by **Fabien Letouzey** » Sun Jan 29, 2017 09:17

Hi Catherine,

Catherine wrote:This is an important point, draughts seem depending on search-depth, Dam 2.2 have a good level but his average search depth is around 12 or 13 and Scan's and Kingsrow's are around 20. This only difference show the strength difference.

Engines don't always use comparable measures for "depth". In particular, engines from the 90's were usually based on extensions. That means when "12" is displayed it actually means "12 + extensions". By contrast, modern engines focus on pruning and "20" means more like "20 - reductions". That's why Scan also displays an "average depth" which is supposed to be more meaningful for the user and engine comparison, but even that is not clearly defined among programmers (yet).

It seem the speed or number of mnps are very importants in the strenght of an engine.

That's Joost's point of view and domain of interest. I'm not sure that's the case, and in fact my experience is that search is not critical as long as one has implemented some form of pruning (which kind does not really matter either).

Evaluation is by far the most important feature in draughts which I call a "positional game" like Othello. If that's any indication, all 4 top engines now use patterns.

Fabien.

Joost Buijs · Post by **Joost Buijs** » Sun Jan 29, 2017 10:27

Catherine wrote: It seem the speed or number of mnps are very importants in the strenght of an engine.

Fabien Letouzey wrote: That's Joost's point of view and domain of interest. I'm not sure that's the case, and in fact my experience is that search is not critical as long as one has implemented some form of pruning (which kind does not really matter either).

Hi Fabien,

It is possible that I put to much emphasis on speed, this probably is because I started with game programming a very long time ago when speed was still very important due to slow hardware. I don't think you'll disagree with the fact that an engine evaluating twice the number of positions within the same time frame (assuming everything else is the same) will always be stronger. With chess this will make a difference of ~70 Elo points, you stated somewhere that with draughts you have to divide this number by ~3 and than it still makes a difference of ~23 ELo, this looks significant enough to me.

I don't know which engines are using patterns, your engine of course, Kingsrow and I assume Dragon, however I'm not sure about Dragon. To be honest I know very little about the computer-draughts community, what engines there are and how strong they are, I guess all this information can be found here on the forum, I never took the time to browse through all these threads though.

Joost

Fabien Letouzey · Post by **Fabien Letouzey** » Sun Jan 29, 2017 11:11

Joost Buijs wrote:It is possible that I put to much emphasis on speed, this probably is because I started with game programming a very long time ago when speed was still very important due to slow hardware. I don't think you'll disagree with the fact that an engine evaluating twice the number of positions within the same time frame (assuming everything else is the same) will always be stronger. With chess this will make a difference of ~70 Elo points, you stated somewhere that with draughts you have to divide this number by ~3 and than it still makes a difference of ~23 ELo, this looks significant enough to me.

We do the things that we like.

The ~3 is only a theory; I don't have any figure for speed doubling in draughts (which even goes down with increasing time I guess). But for me even 30 Elo is a small number: that's what I gained from game phase in eval for instance, which is a tiny piece. By contrast, full eval vs. material + PST was 300 at fast TC.

My reasoning is more about Elo per unit of effort. IMO improving speed is not only more work than anything else (except maybe patterns for someone who doesn't know them) but it also affects what you can do everywhere, lest you negate that speed gain.

I don't know which engines are using patterns, your engine of course, Kingsrow and I assume Dragon, however I'm not sure about Dragon. To be honest I know very little about the computer-draughts community, what engines there are and how strong they are, I guess all this information can be found here on the forum, I never took the time to browse through all these threads though.

Yours

IIUC, Dragon has a full eval + 6x6 patterns (that's a lot of work!). I think Michel was seeing patterns as a "knowledge-hole filler" rather than a replacement for classical evaluation. I'm also guessing that Ed has kept features from his old eval such as outposts.

BertTuyt · Post by **BertTuyt** » Sun Jan 29, 2017 12:22

Interesting discussion.
So herewith my 5 cents..

We lack so far a theoretical model for the strength of a draughts program in relation to the depth and strength of the evaluation.
Within this context, we also don't have a metric how to quantify the evaluation function of a program.

What we assume, and correct me when I'm wrong.
1) The game in itself is always a draw when 2 program play perfectly.
2) Rating (for now in line with chess, lets call it ELO), cannot grow to infinite, so I guess there is a maximum rating E.max
3) When a program reaches infinite depth it will play perfectly (even without evaluation knowledge), so for d --> infinite, E --> E.max
4) When a program has perfect knowledge it will play perfectly (even without searching), so for e ---> infinite , E --> E.max

As Joost pointed out, a more effective implementation of a program with a given evolution will play better.
On the other hand there might be a break even point if one adds more knowledge which reduced search depth, than the program overall might become weaker.

I don't know if the increase in performance (measured in ELO) is linear, and that the value (as we tested) is independent on evaluation implementation.
What we have seen in some tests, that definitely there seems to be some diminishing returns with increasing search depth.
It would be interesting to test this with the new programs like Argus, who are able to reach search depth into a domain we have not tested before.

We could post all kinds of functions , like E = E.max ( 1 - C*exp(-d/d0)*exp(-e/e0)), but as of now I have no clue.

Also still in the twilight zone, the maximum delta in ELO, compared with programs like Argus, Kingsrow, Scan, the depth when evaluation starts to become non-relevant (so the search sees the effects of locks, can calculate the final outcome of good/bad outposts), and to what extend (based upon an unknown metric), we are able to increase the evaluation strength with (for example) a factor of 2...

Bert

World Draughts Forum

ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG

Re: ENGINE SPEED AND STRONG