Nice results for Intel's compiler. How machine dependent is this result?
This was on my Athlon XP 1900+ (1.6 GHz), so it seems that at least the Athlon XP benefits.
These are the compilation flags used:
icc: -Ob2 -ipo -ipo_obj -axKW -O2 -g
gcc: -O3 -pipe -fomit-frame-pointer -march=athlon-xp -mcpu=athlon-xp -g
"-ax<codes> Generate code specialized for processor extensions specified by <codes> while also generating generic IA-32 code. <codes> includes one or more of the following characters:
  i -- Pentium Pro and Pentium II processor instructions
  M -- MMX(TM) instructions
  K -- Streaming SIMD Extensions
  W -- Pentium(R) 4 New Instructions"
I tried to get it to run machine code, but I got obscure linking errors (compilation went fine). It seems that some objects were missing eval_instruction().
/ Mirar
Previous text:
2003-02-07 14:29: Subject: Re: gcc/icc
On Fri, 7 Feb 2003, Mirar @ Pike developers forum wrote:
I downloaded icc-7.0 and ran a comparison.
test                      gcc                        icc
Pike start overhead.....  0.001s                     0.001s
Ackermann...............  0.660s                     0.646s
Append array............  0.470s (1063830/s)         0.534s (935551)
Append mapping..........  2.890s (3460/s)            2.785s (3591)
Append multiset.........  0.476s (20997/s)           0.432s (23148)
Array & String Juggling.  0.687s                     0.604s
Read binary INT16.......  0.283s (3537736/s)         0.257s (3892944)
Read binary INT32.......  0.177s (2820513/s)         0.150s (3342618)
Read binary INT128......  0.895s (11173/s)           0.857s (11673)
Clone null-object.......  0.228s (1313869/s)         0.177s (1691176)
Clone object............  0.429s (699153/s)          0.369s (812641)
Compile.................  1.080s (22352 lines/s)     0.893s (27022)
Compile & Exec..........  0.904s (665265 lines/s)    0.738s (814537)
GC......................  0.587s                     0.578s
Insert in mapping.......  0.443s (1128668/s)         0.446s (1121076)
Insert in multiset......  0.880s (568182/s)          0.777s (643777)
Matrix multiplication...  0.410s                     0.402s
Loops Nested (local)....  0.332s (50561473 iters/s)  0.487s (34473732)
Loops Nested (global)...  0.528s (31788409 iters/s)  0.723s (23209587)
Loops Recursed..........  1.383s (758464 iters/s)    0.504s (2078675)
Note that icc function calls seem to be much faster (Loops Recursed), and that the speed is comparable even though the icc version is compiled without machine code. (Both are with 64-bit float and int.)
Nice results for Intel's compiler. How machine dependent is this result? E.g. is this Pentium 4 specific, or can we expect AMD Athlon or Pentium III (II) to benefit likewise? What type of CPU did you run your tests on, btw?
--- Ludger
Has anyone looked into how difficult it would be to get icc to use the machine code stuff?
/ Brevbäraren