I thought these results were interesting. Note that compiling with gcc 3.2 would probably generate better code, especially where architecture-specific code generation is concerned. The icc binary is compiled with architecture-specific code for multiple platforms (i.e. like -march, but for several platforms at once, which is rather nice). I'm also using multi-file interprocedural optimization (which makes linking Pike take quite a long time). The machine is an Athlon; a Pentium would probably get even better results.
gcc version 2.95.4 20011002 (Debian prerelease):
test                        total  user   mem    (runs)
Pike start overhead........ 0.228s 0.001s 3352kb (22)
Ackermann.................. 1.669s 1.453s 3532kb (3)
Array & String Juggling.... 1.026s 0.808s 3660kb (5)
Clone null-object.......... 0.488s 0.273s 3340kb (11) (12100000/s)
Clone object............... 0.909s 0.692s 3340kb (6) (2602410/s)
Compile.................... 1.975s 1.760s 3504kb (3) (41148 lines/s)
Compile & Exec............. 1.790s 1.577s 3520kb (3) (1144313 lines/s)
GC......................... 1.269s 0.925s 3468kb (4)
Matrix multiplication...... 0.862s 0.643s 5144kb (6)
Loops Nested (local)....... 0.578s 0.362s 3324kb (9) (416857184 iters/s)
Loops Nested (global)...... 0.899s 0.642s 3324kb (6) (156877872 iters/s)
Loops Recursed............. 1.442s 1.225s 3324kb (4) (3423922 iters/s)
Intel(R) C++ Compiler for 32-bit applications, Version 7.0 Build 20021021Z:
test                        total  user   mem    (runs)
Pike start overhead........ 0.191s 0.000s 3760kb (25)
Ackermann.................. 1.068s 0.864s 3956kb (5)
Array & String Juggling.... 1.007s 0.804s 3968kb (5)
Clone null-object.......... 0.426s 0.237s 3728kb (12) (15157895/s)
Clone object............... 0.816s 0.626s 3728kb (7) (3356164/s)
Compile.................... 1.594s 1.405s 3916kb (4) (68726 lines/s)
Compile & Exec............. 1.667s 1.397s 3884kb (3) (1291790 lines/s)
GC......................... 1.068s 0.880s 3876kb (5)
Matrix multiplication...... 0.746s 0.556s 5680kb (7)
Loops Nested (local)....... 0.727s 0.534s 3760kb (7) (219808464 iters/s)
Loops Nested (global)...... 1.083s 0.894s 3760kb (5) (93832312 iters/s)
Loops Recursed............. 0.784s 0.594s 3760kb (7) (12351014 iters/s)
Note the slowdowns in the two nested loop tests though.
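To put a number on each test, here is a small Python sketch that divides the gcc 2.95.4 user times by the non-dumped icc 7.0 user times from the two tables above. A ratio above 1.0 means icc was faster. It uses the user column rather than total so the interpreter load time is left out; the dictionary names and the `speedups` helper are illustrative only, not anything from Pike.

```python
# Sketch: per-test speedup of the icc 7.0 build (non-dumped) over the
# gcc 2.95.4 build, using the "user" column posted above (seconds).
# Names are illustrative only.

GCC_USER = {
    "Ackermann": 1.453,
    "Array & String Juggling": 0.808,
    "Clone null-object": 0.273,
    "Clone object": 0.692,
    "Compile": 1.760,
    "Compile & Exec": 1.577,
    "GC": 0.925,
    "Matrix multiplication": 0.643,
    "Loops Nested (local)": 0.362,
    "Loops Nested (global)": 0.642,
    "Loops Recursed": 1.225,
}

ICC_USER = {
    "Ackermann": 0.864,
    "Array & String Juggling": 0.804,
    "Clone null-object": 0.237,
    "Clone object": 0.626,
    "Compile": 1.405,
    "Compile & Exec": 1.397,
    "GC": 0.880,
    "Matrix multiplication": 0.556,
    "Loops Nested (local)": 0.534,
    "Loops Nested (global)": 0.894,
    "Loops Recursed": 0.594,
}


def speedups(baseline, candidate):
    """Return baseline/candidate user-time ratios; above 1.0 means candidate is faster."""
    return {test: baseline[test] / candidate[test] for test in baseline}


if __name__ == "__main__":
    for test, ratio in speedups(GCC_USER, ICC_USER).items():
        verdict = "faster" if ratio > 1.0 else "slower"
        print(f"{test:<25} {ratio:5.2f}x  icc {verdict}")
```

With these numbers only the two nested-loop tests come out below 1.0 (about 0.68x and 0.72x), while Loops Recursed and Ackermann come out about 2.06x and 1.68x in icc's favour.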
Hmm. Actually, the comparison was between an installed, dumped Pike 7.5.1 built with gcc and a non-installed, non-dumped one built with icc. Here are the results from icc when dumped:
test                        total  user   mem    (runs)
Pike start overhead........ 0.097s 0.001s 3560kb (25)
Ackermann.................. 0.965s 0.863s 3788kb (6)
Array & String Juggling.... 0.937s 0.828s 3956kb (6)
Clone null-object.......... 0.347s 0.230s 3500kb (15) (19565218/s)
Clone object............... 0.722s 0.624s 3480kb (7) (3363844/s)
Compile.................... 1.508s 1.395s 5280kb (4) (69219 lines/s)
Compile & Exec............. 1.536s 1.433s 3832kb (4) (1679302 lines/s)
GC......................... 1.014s 0.898s 3688kb (5)
Matrix multiplication...... 0.657s 0.560s 5568kb (8)
Loops Nested (local)....... 0.647s 0.543s 3572kb (8) (247405936 iters/s)
Loops Nested (global)...... 1.100s 0.902s 3560kb (5) (93000080 iters/s)
Loops Recursed............. 0.695s 0.596s 3560kb (8) (14068944 iters/s)
I especially like the "Pike start overhead" difference: HUGE gains over Pike compiled with gcc 2.95.4. Interestingly enough, the dumped icc build is significantly faster in general than the non-dumped one, even in tests where you'd think dumping wouldn't matter.
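One way to see how much of the dumped vs. non-dumped gap is startup cost is to subtract the dumped icc figures from the non-dumped icc figures, per test, for both the total and the user column. The Python sketch below does just that with the numbers posted here; the names are illustrative, and it assumes the two runs are otherwise comparable (same machine, same load).

```python
# Sketch: how much the dumped icc 7.0 build saves over the non-dumped one,
# per test, using the (total, user) seconds posted above.
# Names are illustrative only.

NON_DUMPED = {
    "Pike start overhead": (0.191, 0.000),
    "Ackermann": (1.068, 0.864),
    "Array & String Juggling": (1.007, 0.804),
    "Clone null-object": (0.426, 0.237),
    "Clone object": (0.816, 0.626),
    "Compile": (1.594, 1.405),
    "Compile & Exec": (1.667, 1.397),
    "GC": (1.068, 0.880),
    "Matrix multiplication": (0.746, 0.556),
    "Loops Nested (local)": (0.727, 0.534),
    "Loops Nested (global)": (1.083, 0.894),
    "Loops Recursed": (0.784, 0.594),
}

DUMPED = {
    "Pike start overhead": (0.097, 0.001),
    "Ackermann": (0.965, 0.863),
    "Array & String Juggling": (0.937, 0.828),
    "Clone null-object": (0.347, 0.230),
    "Clone object": (0.722, 0.624),
    "Compile": (1.508, 1.395),
    "Compile & Exec": (1.536, 1.433),
    "GC": (1.014, 0.898),
    "Matrix multiplication": (0.657, 0.560),
    "Loops Nested (local)": (0.647, 0.543),
    "Loops Nested (global)": (1.100, 0.902),
    "Loops Recursed": (0.695, 0.596),
}


def deltas(before, after):
    """Per test: (total saving, user saving) in seconds, rounded to the report's 1 ms."""
    return {
        test: (round(before[test][0] - after[test][0], 3),
               round(before[test][1] - after[test][1], 3))
        for test in before
    }


if __name__ == "__main__":
    for test, (total, user) in deltas(NON_DUMPED, DUMPED).items():
        print(f"{test:<25} total {total:+.3f}s  user {user:+.3f}s")
```

For these runs every user-time difference stays within about 0.04 s, while the total-time saving for most tests falls between roughly 0.05 and 0.13 s, around the 0.094 s start-overhead saving (Loops Nested (global) is the exception and gets slightly slower). That pattern would be consistent with most of the gain coming from faster loading, though one run of each is not much to go on.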
/ David Hedbor
Previous text:
2003-01-16 02:19: Subject: Intel C++ / gcc 2.95.4 comparisons
To me it just looks like the numbers flutter a bit; the tests ought to run a little longer to give more significant digits. Note that the "total" column includes the load time in every case, so the "user" column is the more interesting one, except for the first test.
/ Martin Stjernholm, Roxen IS
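The point about the "total" column can be checked against the gcc 2.95.4 table: if total is roughly user time plus a fixed load cost, then total minus user should be about the same for every test. The Python sketch below computes that difference per test. It is only a rough proxy, since total minus user also lumps in system time and any waiting; the names are illustrative.

```python
# Sketch: "total" minus "user" per test for the gcc 2.95.4 run above,
# as a rough proxy for the load/startup cost folded into "total".
# Names are illustrative only.

GCC = {
    "Pike start overhead": (0.228, 0.001),
    "Ackermann": (1.669, 1.453),
    "Array & String Juggling": (1.026, 0.808),
    "Clone null-object": (0.488, 0.273),
    "Clone object": (0.909, 0.692),
    "Compile": (1.975, 1.760),
    "Compile & Exec": (1.790, 1.577),
    "GC": (1.269, 0.925),
    "Matrix multiplication": (0.862, 0.643),
    "Loops Nested (local)": (0.578, 0.362),
    "Loops Nested (global)": (0.899, 0.642),
    "Loops Recursed": (1.442, 1.225),
}


def non_user_time(results):
    """Per test: total minus user seconds, rounded to the report's 1 ms resolution.

    This includes interpreter startup, system time and any waiting, so it
    only approximates the load cost included in "total".
    """
    return {test: round(total - user, 3) for test, (total, user) in results.items()}


if __name__ == "__main__":
    for test, gap in non_user_time(GCC).items():
        print(f"{test:<25} {gap:.3f}s")
```

For this run, nine of the eleven benchmark tests show 0.213 to 0.219 s of non-user time, close to the 0.228 s "Pike start overhead" total; GC (0.344 s) and Loops Nested (global) (0.257 s) stand out.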
Previous text:
2003-01-16 04:16: Subject: Intel C++ / gcc 2.95.4 comparisons
pike-devel@lists.lysator.liu.se