Thought these results were interesting. Note that compiling with gcc-3.2 would probably generate better code, especially with arch-dependent data. The icc binary is compiled with arch-specific code for multiple platforms (i.e. like -march, but for several platforms at once - rather nice). I'm also using multi-file interprocedural optimization (which makes linking Pike take quite a long time). The machine is an Athlon - a Pentium would probably get even better results.
gcc version 2.95.4 20011002 (Debian prerelease):
test                        total  user   mem    (runs)
Pike start overhead........ 0.228s 0.001s 3352kb (22)
Ackermann.................. 1.669s 1.453s 3532kb (3)
Array & String Juggling.... 1.026s 0.808s 3660kb (5)
Clone null-object.......... 0.488s 0.273s 3340kb (11) (12100000/s)
Clone object............... 0.909s 0.692s 3340kb (6) (2602410/s)
Compile.................... 1.975s 1.760s 3504kb (3) (41148 lines/s)
Compile & Exec............. 1.790s 1.577s 3520kb (3) (1144313 lines/s)
GC......................... 1.269s 0.925s 3468kb (4)
Matrix multiplication...... 0.862s 0.643s 5144kb (6)
Loops Nested (local)....... 0.578s 0.362s 3324kb (9) (416857184 iters/s)
Loops Nested (global)...... 0.899s 0.642s 3324kb (6) (156877872 iters/s)
Loops Recursed............. 1.442s 1.225s 3324kb (4) (3423922 iters/s)
Intel(R) C++ Compiler for 32-bit applications, Version 7.0 Build 20021021Z:
Pike start overhead........ 0.191s 0.000s 3760kb (25)
Ackermann.................. 1.068s 0.864s 3956kb (5)
Array & String Juggling.... 1.007s 0.804s 3968kb (5)
Clone null-object.......... 0.426s 0.237s 3728kb (12) (15157895/s)
Clone object............... 0.816s 0.626s 3728kb (7) (3356164/s)
Compile.................... 1.594s 1.405s 3916kb (4) (68726 lines/s)
Compile & Exec............. 1.667s 1.397s 3884kb (3) (1291790 lines/s)
GC......................... 1.068s 0.880s 3876kb (5)
Matrix multiplication...... 0.746s 0.556s 5680kb (7)
Loops Nested (local)....... 0.727s 0.534s 3760kb (7) (219808464 iters/s)
Loops Nested (global)...... 1.083s 0.894s 3760kb (5) (93832312 iters/s)
Loops Recursed............. 0.784s 0.594s 3760kb (7) (12351014 iters/s)
Note the slowdowns in the two nested loop tests though.
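For anyone who wants to put numbers on that, here is a quick Python sketch. It is not part of the Pike benchmark suite: the two dictionaries are just the "total" column from the tables above typed in by hand, and the helper name `slowdowns` is made up for illustration. It prints the icc/gcc time ratio per test (above 1.0 means icc was slower) and picks out the tests where icc lost.

```python
# "total" column in seconds, copied by hand from the two result tables.
GCC = {
    "Pike start overhead": 0.228,
    "Ackermann": 1.669,
    "Array & String Juggling": 1.026,
    "Clone null-object": 0.488,
    "Clone object": 0.909,
    "Compile": 1.975,
    "Compile & Exec": 1.790,
    "GC": 1.269,
    "Matrix multiplication": 0.862,
    "Loops Nested (local)": 0.578,
    "Loops Nested (global)": 0.899,
    "Loops Recursed": 1.442,
}
ICC = {
    "Pike start overhead": 0.191,
    "Ackermann": 1.068,
    "Array & String Juggling": 1.007,
    "Clone null-object": 0.426,
    "Clone object": 0.816,
    "Compile": 1.594,
    "Compile & Exec": 1.667,
    "GC": 1.068,
    "Matrix multiplication": 0.746,
    "Loops Nested (local)": 0.727,
    "Loops Nested (global)": 1.083,
    "Loops Recursed": 0.784,
}


def slowdowns(baseline, candidate):
    """Return {test: ratio} for tests where candidate took longer than baseline."""
    return {
        name: candidate[name] / baseline[name]
        for name in baseline
        if candidate[name] > baseline[name]
    }


if __name__ == "__main__":
    # Ratio of icc total time to gcc total time for every test.
    for name in GCC:
        print(f"{name:26s} {ICC[name] / GCC[name]:5.2f}x")
    print("icc slower on:", ", ".join(slowdowns(GCC, ICC)))
```

These are single-machine totals with only a handful of runs each, so the ratios are a rough guide rather than precise measurements.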