I downloaded icc-7.0 and ran a comparison.
test                     gcc                        icc
Pike start overhead..... 0.001s                     0.001s
Ackermann............... 0.660s                     0.646s
Append array............ 0.470s (1063830/s)         0.534s (935551/s)
Append mapping.......... 2.890s (3460/s)            2.785s (3591/s)
Append multiset......... 0.476s (20997/s)           0.432s (23148/s)
Array & String Juggling. 0.687s                     0.604s
Read binary INT16....... 0.283s (3537736/s)         0.257s (3892944/s)
Read binary INT32....... 0.177s (2820513/s)         0.150s (3342618/s)
Read binary INT128...... 0.895s (11173/s)           0.857s (11673/s)
Clone null-object....... 0.228s (1313869/s)         0.177s (1691176/s)
Clone object............ 0.429s (699153/s)          0.369s (812641/s)
Compile................. 1.080s (22352 lines/s)     0.893s (27022 lines/s)
Compile & Exec.......... 0.904s (665265 lines/s)    0.738s (814537 lines/s)
GC...................... 0.587s                     0.578s
Insert in mapping....... 0.443s (1128668/s)         0.446s (1121076/s)
Insert in multiset...... 0.880s (568182/s)          0.777s (643777/s)
Matrix multiplication... 0.410s                     0.402s
Loops Nested (local).... 0.332s (50561473 iters/s)  0.487s (34473732 iters/s)
Loops Nested (global)... 0.528s (31788409 iters/s)  0.723s (23209587 iters/s)
Loops Recursed.......... 1.383s (758464 iters/s)    0.504s (2078675 iters/s)
Note that function calls seem to be much faster with icc (Loops Recursed: 0.504s vs. 1.383s, roughly 2.7 times faster), and that overall speed is comparable even though the icc build is compiled without machine code support. (Both builds use 64-bit floats and ints.)
Has anyone looked into how difficult it would be to get the machine code stuff working when building with icc?