Flits Dxp Server

Ed Gilbert · Post by **Ed Gilbert** » Thu Aug 26, 2010 00:24

Hi Jan-Jaap,

Nice summary of your tournament results. It seems to more or less confirm the results that I get between these programs. I like that you gave dam2.2 another 25% more time because it does not ponder during DamExchange matches. Pondering is a feature that I wish all programs had controls to turn on and off. To simply have it stuck on all the time is very intrusive, especially if you're using a single-core machine! Another problem with it, at least as it is implemented in truus and flits, is that the ponder search is displayed in the same window as the normal search, so you never really get a chance to look at (for example) the truus search results after it moves, because it immediately starts pondering which causes the search score and pv for the last move to be erased! This is the reason that the truus dxp server has a hard time adjudicating games, because it can only poll the search score until it sees that truus has moved, and by then it is too late. In kingsrow I created a separate window to display ponder searches so they don't interfere with the main search stats.

jj wrote:I noticed a difference with the Flits vs. Truus match played by Ed, more draws on my side.

This is explained by the fact that 1) I played the games at 5 mins per game instead of 9, and 2) my computer is 2.4GHz compared to your 3.0GHz, so your games were effectively 2.25 times longer than mine. Longer searches make for fewer mistakes and fewer decisive games.
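The 2.25 figure can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, under the crude assumption that search effort scales linearly with both clock time and CPU clock speed; the function name is made up for illustration.

```python
def effective_time_ratio(minutes_a: float, ghz_a: float,
                         minutes_b: float, ghz_b: float) -> float:
    """Ratio of search effort per game, assuming effort ~ minutes * GHz."""
    return (minutes_a * ghz_a) / (minutes_b * ghz_b)


# Jan-Jaap: 9 minutes per game at 3.0 GHz; Ed: 5 minutes per game at 2.4 GHz.
ratio = effective_time_ratio(9, 3.0, 5, 2.4)
print(f"{ratio:.2f}")  # prints 2.25
```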

-- Ed

Rein Halbersma · Post by **Rein Halbersma** » Thu Aug 26, 2010 07:47

jj wrote: Maybe we can even make a benchmark out of this: see how many points a program scores on the 474 game FTD benchmark (under some standard conditions) gives a good indication of its playing strength. In addition to internet engine matches, this can help in determining more reliable and up to date estimates for program ratings.

Hi Jan-Jaap,

The chess guys already have a standard strength measurement called BayesElo http://remi.coulom.free.fr/Bayesian-Elo/
It is extremely simple to get it to work. I once ran it on a PDN with all 350,000 TurboDambase games and I got 13,000 ratings for all the players in the database. It will not only compute ratings, but also a Likelihood-of-Superiority (LOS) matrix. It is especially helpful to compare different versions of your program against each other to find whether you have made a statistically significant improvement.

Your FTD proposal could be rephrased like this: let's put Truus/Flits/Dam 2.2 as an average on a rating of 2000 (this is a parameter in BayesElo), and simply report the rating that your own program would get in a round-robin tournament against these 3 programs. Or for those who only have access to Dam 2.2, report the rating (and LOS) of your program against Dam 2.2.

What do you guys think: is this a good benchmark?

Rein

BertTuyt · Post by **BertTuyt** » Thu Aug 26, 2010 14:10

Rein/Jan-Jaap/Ed, basically I also use the same 3 programs to benchmark Damage, next to my matches against Kingsrow.

I think I also have the same observation as Jan-Jaap and Ed:

- Flits is slightly stronger than Truus.
- Flits sometimes makes completely weird moves near/in the endgame.

I also agree that it is a pity that it is not possible to switch off pondering in Truus and Flits.
Although I guess for Flits this would not be impossible to do.
For Truus (at least this is what I think) it is more difficult, as Truus does not use a separate thread for the search and therefore user input and search are closely linked to each other.
Last but not least, I'm not sure if the commercial version of Truus is as strong as the version used to play tournaments. I suspect that the programs are the same, but that the settings of some internal (and hidden) parameters are different. I can't prove this (yet), but it is just a thought... This could be different for the older DOS and newer Windows versions.

I'm now in the midst of implementing/debugging the parallel YBWC search (see other posts in this forum), so I might not have sufficient time for these tests at short notice.

Bert

jj · Post by jj » Thu Aug 26, 2010 19:21

Hi all,

Ed Gilbert wrote:This is explained by the fact that 1) I played the games at 5 mins per game instead of 9, and 2) my computer is 2.4GHz compared to your 3.0GHz, so your games were effectively 2.25 times longer than mine. Longer searches makes for fewer mistakes and fewer decisive games.

Yes, that makes sense. So with infinite time all games would be draws.

Ed, can you tell us the score for Kingsrow vs. Dam2.2?

Rein Halbersma wrote:The chess guys already have a standard strength measurement called BayesElo http://remi.coulom.free.fr/Bayesian-Elo/

Great tool and indeed very simple to use! I collected some more games, including games by Maximus, both DamExchange and manual, played at various time schedules (5 to 20 minutes for 75 moves).
Here's my output (slightly edited), just following the example on the website.

>readpgn all.pdn

695 game(s) loaded, 0 game(s) with unknown result ignored.

>elo
>>offset 2000
>>mm
>>exactdist
>>ratings

Rank Name      Elo    +    - games score oppo. draws 
   1 Flits    2058   19   19   421   63%  1983   66% 
   2 Truus    2041   19   19   391   59%  1991   73% 
   3 Maximus  1969   27   27   210   40%  2026   67% 
   4 Dam2.2   1932   21   21   368   31%  2040   58%

>los
         Fl Tr Ma Da
Flits       87 99 99
Truus    12    99 99
Maximus   0  0    95
Dam2.2    0  0  4

At this point Maximus is a single thread program without an opening book or any endgame databases, and a rather basic evaluation function.

I'm not sure if the chess ELO formulas apply to draughts just like that. It seems to me at draughts it's easier for a weaker player to draw against a stronger opponent.
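For reference, here is what the plain chess-style logistic Elo formula predicts for the rating gap between the top and bottom entries in the table above. This is a minimal sketch of the textbook formula only: it has no draw model at all, and it is not claimed to be the exact model BayesElo fits (which, as far as I understand, also estimates draw and colour parameters).

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (0..1) of player A under the plain logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / 400.0))


# Flits (2058) against Dam2.2 (1932), ratings taken from the table above.
print(f"{expected_score(2058, 1932):.3f}")
```

If weaker programs draw more easily in draughts than this model assumes, the observed scores would sit closer to 50% than the formula predicts for a given rating gap.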

As Rein suggested I used an offset of 2000. It would be nice if we could somehow link this to human ratings (then 2000 is too low), so we have "absolute" instead of relative ratings. Any ideas?

Jan-Jaap

Ed Gilbert · Post by **Ed Gilbert** » Thu Aug 26, 2010 21:09

Rein, the bayeselo program looks very useful. There are a few things about it that are not clear to me yet. Does it assume that the first player has an advantage as in chess? This does not apply in draughts. Also, how do I input match results other than giving it pgn files of games? It seems like the addplayer and addresult commands could do that, but it is not clear to me how to add a match result of say +20 -15 =123 using it. Do you know?

jj wrote:It seems to me at draughts it's easier for a weaker player to draw against a stronger opponent.

I agree.

jj wrote:Ed, can you tell us the score for Kingsrow vs. Dam2.2?

Except for a blitz match that I ran this past March to test a rewrite of my DamExchange handling, I have not run matches against Dam for quite a while. The blitz match was more than 100 wins and 0 losses at 1 min/75 moves. That was probably using 4 search threads against Dam's single thread. It brings up an interesting question about comparing parallel search engines against singles. Should you compare apples to apples or let the parallel searchers use as many threads as they can? I have an 8-core machine and if I use 8 threads against a single threaded program, in some ways that is not a fair test.

-- Ed

jj · Post by jj » Thu Aug 26, 2010 22:14

Ed Gilbert wrote:It brings up an interesting question about comparing parallel search engines against singles. Should you compare apples to apples or let the parallel searchers use as many threads as they can? I have an 8-core machine and if I use 8 threads against a single threaded program, in some ways that is not a fair test.

I would say let each program play in its strongest configuration, as long as it does not interfere with the other program when it is not its move. If a program has an advantage over another (older) program because it utilizes 64-bit architecture, uses parallel search and has larger databases, this shows the overall progress in playing strength (over the last decade). Maybe we can agree on some standard conditions like time settings (5 min? 10 min?) so everybody can use those for rating list runs if they want to. You can always run shorter matches or disable multithreading for testing purposes.

Jan-Jaap

Rein Halbersma · Post by **Rein Halbersma** » Thu Aug 26, 2010 22:59

Ed Gilbert wrote:Rein, the bayeselo program looks very useful. There are a few things about it that are not clear to me yet. Does it assume that the first player has an advantage as in chess? This does not apply in draughts. Also, how do I input match results other than giving it pgn files of games? It seems like the addplayer and addresult commands could do that, but it is not clear to me how to add a match result of say +20 -15 =123 using it. Do you know?
It seems to me at draughts it's easier for a weaker player to draw against a stronger opponent.
I agree.
Ed, can you tell us the score for Kingsrow vs. Dam2.2?
Except for a blitz match that I ran this past March to test a rewrite of my DamExchange handling, I have not run matches against Dam for quite a while. The blitz match was more than 100 wins and 0 losses at 1 min/75 moves. That was probably using 4 search threads against Dam's single thread. It brings up an interesting question about comparing parallel search engines against singles. Should you compare apples to apples or let the parallel searchers use as many threads as they can? I have an 8-core machine and if I use 8 threads against a single threaded program, in some ways that is not a fair test.

-- Ed

Hi Ed,

I don't know how to just add a results summary other than to make a mock PDN file with only result tags. You could just look at the source (it's at Remi's website).
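A mock file like that could be generated with a short script. The sketch below is untested against BayesElo itself and rests on several assumptions: that chess-style result strings ("1-0", "0-1", "1/2-1/2") are what the reader expects, that games without moves are accepted, and that alternating colours is a sensible way to avoid biasing the colour-advantage estimate. The function name and the player names are made up.

```python
from typing import List


def mock_games(player_a: str, player_b: str,
               wins: int, losses: int, draws: int) -> str:
    """Build header-only game records for a match result seen from player A."""
    games: List[str] = []
    for outcome, count in (("win", wins), ("loss", losses), ("draw", draws)):
        for i in range(count):
            a_is_white = (i % 2 == 0)  # alternate colours within each outcome
            white, black = (player_a, player_b) if a_is_white else (player_b, player_a)
            if outcome == "draw":
                result = "1/2-1/2"
            elif (outcome == "win") == a_is_white:
                result = "1-0"   # White won: A as White winning, or A as Black losing
            else:
                result = "0-1"   # Black won
            games.append(
                f'[White "{white}"]\n[Black "{black}"]\n[Result "{result}"]\n\n{result}\n'
            )
    return "\n".join(games)


# Ed's example of +20 -15 =123, written to a file that could then be loaded.
with open("match.pdn", "w", encoding="utf-8") as handle:
    handle.write(mock_games("ProgramA", "ProgramB", 20, 15, 123))
```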

The program does not make any black/white advantage assumptions, but it estimates this as a parameter from the data (or you can set it to zero if you want). From what I remember from my analysis on TurboDambase, there was less than 5 ELO difference with an error bar of 3 ELO. The program can also estimate a parameter for the drawing margin in ELO points. From what I remember, GM games indeed had a larger drawing margin than lower ranked games.

For 2-player engine matches, a formula was given by Remi Coulom on talkchess to quickly compute LOS for the winning program. The funny thing was that this formula did not depend on the number of draws. So LOS for 1+ 99= 0- is the same as LOS for 1+ (although the rating difference is much larger in the latter case!).
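One version of such a formula that is often quoted in computer chess circles is a normal approximation that uses only wins and losses. The sketch below implements that approximation; whether it is exactly the formula from the talkchess post is an assumption on my part.

```python
import math


def los(wins: int, losses: int) -> float:
    """Approximate likelihood of superiority; draws do not enter the formula."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, so no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# The match result +20 -15 =123 mentioned earlier; the 123 draws are ignored.
print(f"{los(20, 15):.2f}")
```

Because draws never appear in the function, 1+ 99= 0- and 1+ 0= 0- give the same value, which is the property described above.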

If we could collect PDNs from all the old tournaments (or even just the last 3 years), we could estimate ratings for all current programs. They would be much more sensible than the rating list that now circulates, in which Truus/Flits top the list because old results are discounted much too slowly. The nice thing about BayesElo is that you can also simulate the pre-tournament predictions. Makes for a nice betting pool.

Rein

BertTuyt · Post by **BertTuyt** » Fri Aug 27, 2010 01:39

With infinite time all games would (most likely) result in a draw.
I tend to agree with this statement, as I believe that Draughts is (by its nature) a draw (although it will take some time before we are able to theoretically prove this).

I also agree that when the amount of time or the computer power increases, the number of mistakes/errors will in general go down.

What I don't know for sure is whether every program will benefit in the same way from such an "improvement". Take the 2 extremes: a program without any evaluation function, which only has a deep search to rely on, and on the other hand a program with a moderate search but with huge know-how (which will therefore spend 99% of its time doing evaluations).

In some studies it was mentioned/observed that for every additional ply, there was a constant increase in ELO-points (with diminishing returns for very deep checkers searches). But I'm not sure if this is also tested for Draughts?

Some tests (but at really limited search depths) which I did some years ago (when computers were slow) indicate that programs with a more complex evaluation function seem (initially?) to benefit more from increased time and/or computing power.

Bert

BertTuyt · Post by **BertTuyt** » Fri Aug 27, 2010 01:55

For me when I do comparisons (and future matches), this will always be based on the maximum capability of my program, using all the opportunities my system can deliver.

So in my case:
- 64bit OS
- Parallel search (YBWC), multicore (today 4)
- 7p DB's

One can doubt if it is fair to use huge clusters (like 128 processors), and/or huge memory.
On the other hand a 4 GByte system with a quad-core processor is more and more a commodity.

Also, generating the 7P DBs or converting a sequential search to a parallel search is not an easy job.

I guess that only programmers who update their programs based on the new technology available will stay competitive.

For this reason it is good to inform people that there are better alternatives available.
And for this reason testing should enable the new programs (which have invested in all the options available) to use these options!
But I'm not against some restrictions which we should agree upon (such as memory usage and number of cores), as this also gives people a more balanced indication of what they can expect on their home machines.
So basically I'm not against tests which restrict the memory usage of a program to 4 GB and the number of cores to 4.

But matches based on single-cores and 640 KByte, that period is over and gone.....

Bert

chrisadam12 · Post by **chrisadam12** » Tue Sep 07, 2010 13:26

After installing Windows 2000 (Datacenter Server or Advanced Server) or Windows 2003, System Properties shows that only around 3.37 GB of physical memory (RAM) is available for application and system use, although 4 GB or more of RAM has been installed and the BIOS correctly identifies the full installed size of physical memory, which means that the motherboard and the x86 or x64 processor can support 4 GB or more of physical memory.

A 32-bit Windows operating system depends on the PAE (Physical Address Extension) feature to use more than 4 GB of physical memory. On most Windows 2000 and Windows Server 2003 systems, especially those run in NUMA mode on a NUMA-capable computer, PAE is disabled by default. PAE is enabled by default only if DEP (Data Execution Prevention) is enabled on a computer that supports hardware-enabled DEP, or if the computer is configured for hot-add memory devices in memory ranges beyond 4 GB.

Thus, if PAE is not enabled in Windows 2000 and Windows Server 2003 (for example, if DEP is turned off by the administrator), the system may not be able to detect and use more than 4 GB of memory, and will only be able to allocate slightly more than 3 GB of memory to the system and applications, as some memory address space has to be reserved and mapped for system devices and peripherals.

To enable PAE in Windows Server 2003 and Windows 2000 (and Windows XP), append the /PAE switch to the end of the operating system line in the Boot.ini file. To disable PAE, use the /NOPAE switch. The Boot.ini file is normally located in the root folder (i.e. C:\) with the Read-Only and Hidden attributes set, which have to be removed before the file can be edited.

BertTuyt · Post by **BertTuyt** » Fri Apr 22, 2011 21:19

For 2011 I have planned a Damage sabbatical, mainly focusing on a new Damage GUI (Damage 2011) and an engine based on Horizon with a GUIDE wrapper so it can communicate with the Damage GUI.
As also mentioned before, my ambition is to provide a better engine than Flits and Truus, completely free of charge.
It is evident that only a 158-game match will be decisive in judging whether this was successful.

Within this context, I was still puzzled about how to turn off pondering in Flits, as I want to play matches where one program is quiet while the other is searching.
I have found a method that is simple (in execution) to switch off pondering; at least it worked in the version of Flits I have!
Before I explain it in more detail, I will test it with Ed (Ed, if you have time), to make sure it also works for him.

If so, I will explain in more detail how it works (so the underlying architecture). As a next step I will discuss with Ed how to make this public: either as an add-on in the Flits_server (which I think we should not do), or as a separate patch program which also tests whether the right Flits version is used (as I don't know how many Flits versions there are, and this requires specific patches in Flits itself).
Maybe there is another way, not even requiring patches, but I haven't found one (so far).

The advantage of the patch:
- When you start up Flits it will behave as a normal Flits with pondering.
- You can switch pondering off/on via a new command in Flits.

So it is also possible to provide a patch and have the Flits server send the new pondering-off command.

Will keep you all posted ...

Bert

BertTuyt · Post by **BertTuyt** » Sat Apr 23, 2011 11:42

Here is a start at explaining the no-pondering solution for Flits.
Flits has several command-line commands which are not documented.
One of them is ml, which sets the maximum level for the search (both the computer move and pondering).
By default this value is set to 20 ply at startup.
I guess (but I'm not sure) that this is related to the bookkeeping and the arrays involved (which may only have 20 entries?).
The computer-to-move search and the pondering use different routines, but both use the maximum level variable.

So if you set the max level to 2, then Flits will only perform a 1-ply search when pondering, and then stop.
As the Flits thread is programmed quite well, the subsequent polling for user input does not consume time (or at least only an insignificant amount).
Unfortunately the main search (computer to move) will also stop after 1 ply.

The patches I introduced bypass this dilemma.

However, I think there is also another solution: implementing the ml command in the Flits server, but Ed has to test that.
This will most likely also work with different versions of Flits (whereas patches are a problem across different versions).

So let's assume the scenario where White is another computer and Flits is Black.

The DXP server sends ml 2 to Flits.
Flits will do a 1-ply search, then stop and wait for user input.
So basically pondering is switched off.
Before Flits receives the DXP move command, ml 20 is issued, restoring the maximum level to its original state.
Flits then restarts pondering, but only very briefly, as the move will be received right after that, and Flits will then start its own search.
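The sequence above could be sketched on the server side roughly as follows. This is not the actual Flits server code: the class, its method names, and the way commands reach Flits (a plain callable here) are all hypothetical, and only the ml 2 / ml 20 commands come from the description above.

```python
from typing import Callable, List

DEFAULT_MAX_LEVEL = 20  # start-up value of ml, as described above
PONDER_OFF_LEVEL = 2    # makes the ponder search stop after 1 ply


class PonderGate:
    """Wraps a hypothetical 'send a command line to Flits' callable."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send

    def on_flits_moved(self) -> None:
        # Flits has just moved and would now ponder: cap the search depth.
        self._send(f"ml {PONDER_OFF_LEVEL}")

    def on_opponent_move(self, move: str) -> None:
        # Restore the depth just before handing over the opponent's move,
        # so the main search runs at full depth again.
        self._send(f"ml {DEFAULT_MAX_LEVEL}")
        self._send(move)


# Usage with a list standing in for the real channel to Flits.
sent: List[str] = []
gate = PonderGate(sent.append)
gate.on_flits_moved()
gate.on_opponent_move("32-28")
print(sent)  # prints ['ml 2', 'ml 20', '32-28']
```

The short window between ml 20 and the move is where Flits would ponder briefly, as noted above.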

Maybe there are flaws in this reasoning and/or approach, which might lead to the necessity of a patch, but maybe.....

Will keep you all posted...

Bert

Ed Gilbert · Post by **Ed Gilbert** » Sat Apr 23, 2011 16:41

Hi Bert,

That is an interesting idea and sounds like it might work. Usually the pondering is not a problem on a multi-core computer, but it would be nice to have the flexibility to control it. For example, dam2.2 automatically turns pondering off in dxp mode so you might like to turn it off in flits also for a match between these programs.

I'm going to be away on a business trip for a few days, but I might be able to test your idea next weekend.

-- Ed

BertTuyt · Post by **BertTuyt** » Sat Apr 23, 2011 23:29

Ed, thanks.

In a later stage (if this does not work), I can also share with you the patch I already implemented in my Flits version (and which works !).
In the meantime, can other readers here on the forum (who also have a Flits program) confirm that the ml 2 and ml 20 commands work (another value for ml is also OK) with their Flits version...

Bert

jj · Post by jj » Sun Apr 24, 2011 11:49

Hi Bert,

Nice work. In my Flits version (3.02) the 'ml' command works as you describe.
Good luck!

Jan-Jaap
