compiler comparison

Discussion about development of draughts in the time of computer and Internet.
Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

compiler comparison

Post by Rein Halbersma » Sun May 30, 2010 11:02

I have been trying out new development environments. My current build system is 64-bit Ubuntu Linux 10.04 with the Eclipse/CDT environment. Using the Wubi installer from Windows XP, everything was up and running in less than an hour and worked out of the box. My code compiled under both the free GNU compiler (g++ 4.4.3) and the free Intel compiler (icpc 11.1). The Intel build, however, was much faster than the GNU build.

Perft for the initial position on my puny 5-year-old P4 took 28.8 seconds with the 64-bit Intel build (with bulk counting), compared to 59.9 seconds for the 64-bit GNU build and 62.2 seconds for the 32-bit MSVC++ build. An old 64-bit MSVC++ build of my perft routine ran in about 30 seconds on Ed's Q6600, but that machine should be a lot faster (30%?). Of course, YMMV for different machines and your own programs. But I'm convinced the Intel compiler is the superior choice among the free compilers; MSVC++ is a decent second, but the GNU compiler is not up to the job.
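For readers unfamiliar with the term, here is a minimal sketch of what "bulk counting" means in a perft routine. It is not the code that was timed: ToyPosition and successors() are a made-up stand-in game (not draughts) so the example is self-contained, and the function names are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy game, NOT draughts: a position is just a counter, and a
// position with value v has (v % 3) + 1 legal moves. It exists only so the
// perft drivers below have a tree to walk.
struct ToyPosition {
    int value;
};

// Returns all successor positions, one per legal move.
std::vector<ToyPosition> successors(const ToyPosition& p) {
    std::vector<ToyPosition> out;
    const int n = p.value % 3 + 1;
    for (int i = 0; i < n; ++i) {
        out.push_back(ToyPosition{p.value * 2 + i + 1});
    }
    return out;
}

// Plain perft: make every move down to depth 0 and count each leaf as 1.
std::uint64_t perft(const ToyPosition& p, int depth) {
    assert(depth >= 0);
    if (depth == 0) return 1;
    std::uint64_t nodes = 0;
    for (const ToyPosition& next : successors(p)) {
        nodes += perft(next, depth - 1);
    }
    return nodes;
}

// Bulk counting: at depth 1 the leaf count equals the number of legal moves,
// so the last ply is generated but never recursed into.
std::uint64_t perft_bulk(const ToyPosition& p, int depth) {
    assert(depth >= 1);
    const std::vector<ToyPosition> next = successors(p);
    if (depth == 1) return next.size();
    std::uint64_t nodes = 0;
    for (const ToyPosition& n : next) {
        nodes += perft_bulk(n, depth - 1);
    }
    return nodes;
}
```

Both functions return the same node count for any depth of at least 1; the bulk version just skips one level of recursion, which is why timings with and without bulk counting are not comparable.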

@Harm: I read that your new program is developed under Linux with gcc. It would be interesting to see what speedup you can get with the Intel compiler!

Harm Jetten
Posts: 43
Joined: Thu Sep 24, 2009 18:17

Re: compiler comparison

Post by Harm Jetten » Tue Jun 01, 2010 23:12

Rein, I installed the Intel compiler just now on my 64-bit Ubuntu 10.04 machine (E7200 at 2.53GHz).
It appears to generate slightly slower(!) code for my move generator compared to gcc.
I used -O2 for both icpc and gcc.
The three well-known positions give these results:

The Intel icpc 11.1 timings
perft(11) 1665861398 nodes, 19.12 sec, 87139 kN/s, bulk
perft(9) 1216917193 nodes, 10.30 sec, 118190 kN/s, bulk
perft(15) 346184885 nodes, 5.56 sec, 62223 kN/s, bulk

The gcc 4.4.3 timings
perft(11) 1665861398 nodes, 18.42 sec, 90444 kN/s, bulk
perft(9) 1216917193 nodes, 9.42 sec, 129118 kN/s, bulk
perft(15) 346184885 nodes, 5.21 sec, 66472 kN/s, bulk
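
The kN/s column and the relative difference between the two compilers follow directly from the node counts and timings above. A minimal sketch of that arithmetic; the function names are made up for illustration.

```cpp
#include <cassert>

// Throughput in thousands of nodes per second (the "kN/s" column).
double knodes_per_second(double nodes, double seconds) {
    assert(seconds > 0.0);
    return nodes / seconds / 1000.0;
}

// How many percent longer the slow run takes compared to the fast run.
double percent_slower(double slow_seconds, double fast_seconds) {
    assert(fast_seconds > 0.0);
    return (slow_seconds / fast_seconds - 1.0) * 100.0;
}
```

With the timings above, percent_slower(19.12, 18.42), percent_slower(10.30, 9.42) and percent_slower(5.56, 5.21) come out at roughly 4%, 9% and 7%, so at -O2 the icpc build takes that much longer than the gcc build on these three positions. Recomputing kN/s from the printed seconds only matches the printed column to within rounding, since the timings are shown with two decimals.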

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: compiler comparison

Post by Rein Halbersma » Wed Jun 02, 2010 09:11

Harm,

For Intel, -O2 doesn't give me a big boost over gcc either. That's why I use full optimization -O3, combined with the most aggressive inlining level (n=2), CPU-specific optimizations and interprocedural optimizations. Another big boost (~10-20%) was due to profiling. For gcc I also use -O3, but didn't do profiling. I will post my precise command line options later today.

I have taken the concept of small and simple functions to extremes in my code. The move generator for man captures, for example, has 8 layers of functions, each smaller than 8 lines of code. This much indirection makes it easy to use template tag dispatching to select the appropriate game-dependent algorithms. However, it can be hard for a compiler to fully inline this into a single function with only one level of recursion. That's why I have a macro _FORCE_INLINE_ in a lot of places. Under MSVC++ this maps to __forceinline, and under Intel and gcc it becomes __attribute__((always_inline)).
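
A minimal sketch of how such a macro and tag dispatching can fit together. This is an illustration, not the actual code: the macro is spelled FORCE_INLINE here (names starting with an underscore and a capital letter are reserved in C++), the `inline` keyword is added for gcc, which otherwise warns that the function might not be inlinable, and the tag types and the function man_capture_directions are hypothetical.

```cpp
#include <cassert>

// Portable force-inline macro (illustrative name, see the note above).
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__INTEL_COMPILER)
  #define FORCE_INLINE inline __attribute__((always_inline))
#else
  #define FORCE_INLINE inline
#endif

// Hypothetical rule tags: empty types that only exist to select an overload.
struct international_tag {};
struct checkers_tag {};

namespace detail {

// One tiny overload per rule set. In international draughts men capture
// forwards and backwards (4 diagonals); in English checkers only forwards (2).
FORCE_INLINE int man_capture_directions(international_tag) { return 4; }
FORCE_INLINE int man_capture_directions(checkers_tag) { return 2; }

}  // namespace detail

// Public entry point: the Rules template parameter picks the overload at
// compile time, so no run-time branch on the game variant is needed.
template <typename Rules>
FORCE_INLINE int man_capture_directions() {
    return detail::man_capture_directions(Rules());
}
```

A call such as man_capture_directions<international_tag>() resolves to the matching overload at compile time; stacking several such thin layers is what makes forced inlining attractive.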

Rein

Update: I use the flags

icpc -O3 -ip -ipo -inline-level=2 -use-intel-optimized-headers -fno-alias -xHost -MMD -MP

The results below were obtained on an AMD X2 5000:

perft(11)   1665861398 nodes,  20.80s,  80.09 Mnps 
perft( 9)   1216917193 nodes,  12.87s,  94.55 Mnps 
perft(15)    346184885 nodes,   6.11s,  56.66 Mnps 

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: compiler comparison

Post by Rein Halbersma » Sun Feb 27, 2011 15:02

I have now upgraded to Intel C++ Composer XE 12.0 for Linux. It includes the entire suite of Parallel Studio tools: VTune, the Cilk parallel framework and much more. Support for C++0x is a bit lacking for those who use those features (I do!). For the moment I keep using VC++ as my daily build system, but it's nice to know that whenever I go multicore I have a free platform around the corner.

gwiesenekker
Posts: 21
Joined: Sun Feb 20, 2011 21:04
Real name: Gijsbert Wiesenekker

Re: compiler comparison

Post by gwiesenekker » Sun Feb 27, 2011 15:37

For GWD, the code generated by the Intel compiler is 20-25% faster than the code generated by gcc (running on Fedora Core).

Gijsbert

Rein Halbersma
Posts: 1722
Joined: Wed Apr 14, 2004 16:04

Re: compiler comparison

Post by Rein Halbersma » Sun Feb 27, 2011 18:50

Hi Gijsbert,

Which GUI toolkit are you using on Linux? The pattern editor looks a lot like Windows to me.

Rein

gwiesenekker
Posts: 21
Joined: Sun Feb 20, 2011 21:04
Real name: Gijsbert Wiesenekker

Re: compiler comparison

Post by gwiesenekker » Sun Feb 27, 2011 20:22

GWD uses a slightly modified version of Turbo Dambase as the GUI frontend. Turbo Dambase runs on Fedora Core within a Windows virtual machine, and communicates with the Fedora Core backend by reading files from and writing files to a virtual machine shared folder that resides on a tmpfs filesystem within Fedora Core.
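
A minimal sketch of this kind of file-based exchange, purely as an illustration: it is not GWD's code, the function names and the message format are made up, and it assumes the shared folder behaves like a POSIX filesystem, where rename replaces the target in one step (on Windows, std::rename fails if the target already exists).

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Writes `text` to `path` via a temporary file plus rename, so that a reader
// polling `path` does not pick up a half-written message.
bool post_message(const std::string& path, const std::string& text) {
    const std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp.c_str(), std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << text;
        if (!out.flush()) return false;
    }  // the stream is closed here, before the rename
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Reads the message at `path` into `text` and deletes the file.
// Returns false if no message is waiting.
bool take_message(const std::string& path, std::string& text) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    std::ostringstream buffer;
    buffer << in.rdbuf();
    in.close();
    text = buffer.str();
    return std::remove(path.c_str()) == 0;
}
```

One side calls post_message() with a file name in the shared folder and the other side polls take_message() on the same name; a second file name would carry the replies in the opposite direction.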

Gijsbert
