I've now tested on a few more architectures, and it does indeed seem to be Athlon-specific. The following measurements were made with machine code enabled. The last column is the difference in user time relative to the same test run without machine code.
sparc (Ultra-80):
test                          total    user     mem  (runs)                        diff
Compile....................  6.588s  4.096s    -1kb    (76)  (5894 lines/s)          2%
Compile & Exec.............  7.060s  4.488s    -1kb    (71)  (134009 lines/s)        0%
Ackermann..................  5.311s  2.841s    -1kb    (95)                        -10%
Loops Nested (local).......  4.426s  1.981s    -1kb   (100)  (8469060 iters/s)       9%
Loops Nested (global)......  4.992s  2.532s    -1kb   (100)  (6625550 iters/s)     -18%
Loops Recursed.............  4.747s  2.270s    -1kb   (100)  (461968 iters/s)      -16%
Seems like local variable accesses could be improved on sparc.
ia32 (PIII 700 MHz):
test                          total    user     mem  (runs)                        diff
Compile....................  3.462s  2.945s  5260kb   (100)  (8198 lines/s)         11%
Compile & Exec.............  3.408s  2.760s  3628kb   (100)  (217922 lines/s)        9%
Ackermann..................  1.789s  1.323s  3716kb   (100)                         -8%
Loops Nested (local).......  1.217s  0.740s  3504kb   (100)  (22678034 iters/s)    -27%
Loops Nested (global)......  1.857s  1.359s  3504kb   (100)  (12342540 iters/s)    -23%
Loops Recursed.............  1.512s  1.031s  3504kb   (100)  (1017443 iters/s)      -9%
ia32 (Athlon XP 1535 MHz):
test                          total    user     mem  (runs)                        diff
Compile....................  1.464s  1.251s  5244kb   (100)  (19300 lines/s)         7%
Compile & Exec.............  1.389s  1.197s  3656kb   (100)  (502423 lines/s)        6%
Ackermann..................  1.387s  1.198s  3724kb   (100)                         84% !!
Loops Nested (local).......  0.450s  0.261s  3508kb   (100)  (64182112 iters/s)    -39%
Loops Nested (global)......  0.787s  0.598s  3508kb   (100)  (28036812 iters/s)    -23%
Loops Recursed.............  1.518s  1.329s  3508kb   (100)  (789115 iters/s)      172% !!
I also tried running a binary copied from the PIII system on my Athlon, in case there's some kind of compiler difference, but that didn't change much. It's amazing that a CPU difference can have such a dramatic effect on function call performance.
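For anyone who wants to compare their own runs against the diff column, here is a minimal Python sketch of the arithmetic. It assumes the column is the relative change in user time, with a positive value meaning more user time with machine code; the function names are made up for illustration and are not part of the benchmark tool. Since the percentages in the tables are rounded, the baseline it backs out is only approximate.

```python
def percent_diff(user_with, user_without):
    """Relative change in user time, in percent.

    Assumption: positive means the run with machine code used
    more user time than the run without it.
    """
    return (user_with - user_without) / user_without * 100.0


def implied_baseline(user_with, diff_percent):
    """Back out the approximate user time without machine code
    from a user time and its reported diff percentage."""
    return user_with / (1.0 + diff_percent / 100.0)


# Athlon XP, "Loops Recursed": 1.329s user time, reported diff 172%.
baseline = implied_baseline(1.329, 172)
print(f"implied user time without machine code: {baseline:.3f}s")
```

Under that assumption, the Athlon "Loops Recursed" figure corresponds to a run without machine code of a bit under half a second of user time.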
It'd be interesting to see this on more systems.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-07-31 02:18: Subject: Machine code efficiency
It's on an Athlon XP. But if that matters appreciably then it's a problem in itself. I doubt it, though. Anyway, please try it yourself and see if there's a difference.
/ Martin Stjernholm, Roxen IS