Now with corresponding optimizations (-O3 -ipp7). Run on my Athlon XP. (Not a Pentium 4!)
test                      gcc (machine code)         icc (no machine code)
Pike start overhead.....  0.001s                     0.001s
Ackermann...............  0.660s                     0.653s
Append array............  0.470s (1063830/s)         0.490s (1020408)
Append mapping..........  2.890s (3460/s)            2.670s (3745)
Append multiset.........  0.476s (20997/s)           0.416s (24017)
Array & String Juggling.  0.687s                     0.591s
Read binary INT16.......  0.283s (3537736/s)         0.247s (4047619)
Read binary INT32.......  0.177s (2820513/s)         0.140s (3571429)
Read binary INT128......  0.895s (11173/s)           0.767s (13043)
Clone null-object.......  0.228s (1313869/s)         0.188s (1598985)
Clone object............  0.429s (699153/s)          0.367s (816327)
Compile.................  1.080s (22352 lines/s)     0.906s (26645)
Compile & Exec..........  0.904s (665265 lines/s)    0.689s (873402)
GC......................  0.587s                     0.579s
Insert in mapping.......  0.443s (1128668/s)         0.457s (1094092)
Insert in multiset......  0.880s (568182/s)          0.770s (649351)
Matrix multiplication...  0.410s                     0.386s
Loops Nested (local)....  0.332s (50561473 iters/s)  0.457s (36711632)
Loops Nested (global)...  0.528s (31788409 iters/s)  0.694s (24164714)
Loops Recursed..........  1.383s (758464 iters/s)    0.475s (2207528)
It's clearly impressive. (Or, gcc lost any impressiveness it had.)
/ Mirar
Previous text:
2003-02-07 14:29: Subject: Re: gcc/icc
On Fri, 7 Feb 2003, Mirar @ Pike developers forum wrote:
I downloaded icc-7.0 and ran a comparison.
test                      gcc                        icc
Pike start overhead.....  0.001s                     0.001s
Ackermann...............  0.660s                     0.646s
Append array............  0.470s (1063830/s)         0.534s (935551)
Append mapping..........  2.890s (3460/s)            2.785s (3591)
Append multiset.........  0.476s (20997/s)           0.432s (23148)
Array & String Juggling.  0.687s                     0.604s
Read binary INT16.......  0.283s (3537736/s)         0.257s (3892944)
Read binary INT32.......  0.177s (2820513/s)         0.150s (3342618)
Read binary INT128......  0.895s (11173/s)           0.857s (11673)
Clone null-object.......  0.228s (1313869/s)         0.177s (1691176)
Clone object............  0.429s (699153/s)          0.369s (812641)
Compile.................  1.080s (22352 lines/s)     0.893s (27022)
Compile & Exec..........  0.904s (665265 lines/s)    0.738s (814537)
GC......................  0.587s                     0.578s
Insert in mapping.......  0.443s (1128668/s)         0.446s (1121076)
Insert in multiset......  0.880s (568182/s)          0.777s (643777)
Matrix multiplication...  0.410s                     0.402s
Loops Nested (local)....  0.332s (50561473 iters/s)  0.487s (34473732)
Loops Nested (global)...  0.528s (31788409 iters/s)  0.723s (23209587)
Loops Recursed..........  1.383s (758464 iters/s)    0.504s (2078675)
Note that icc function calls seem to be much faster (Loops Recursed), and that the speed is comparable even though the icc version is compiled without machine code. (Both are built with 64-bit float and int.)
Nice results for Intel's compiler. How machine-dependent is this result? E.g., is it Pentium 4 specific, or can we expect AMD Athlon or Pentium III (II) to benefit likewise? What type of CPU did you run your tests on, by the way?
--- Ludger
Has anyone looked into how difficult it would be to get icc to use the machine code stuff?
/ Brevbäraren