I did away with the assignments to the frame return addresses (for ia32 only). Now it's much better on Athlon, and there's some improvement on Intel too:
The dead cat in me has to ask: how? What did you replace it with?
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-08-06 21:21: Subject: Machine code efficiency
I did away with the assignments to the frame return addresses (for ia32 only). Now it's much better on Athlon, and there's some improvement on Intel too:
PIII:
test                        total   user    mem     (runs)                      diff
Compile.................... 3.389s  2.906s  5316kb  (100)   (8306 lines/s)        9%
Compile & Exec............. 3.262s  2.710s  3760kb  (100)   (221952 lines/s)      7%
Ackermann.................. 1.622s  1.109s  3752kb  (100)                       -23%
Loops Nested (local)....... 1.137s  0.649s  3540kb  (100)   (25870792 iters/s)  -36%
Loops Nested (global)...... 2.059s  1.327s  3540kb  (100)   (12643922 iters/s)  -25%
Loops Recursed............. 1.486s  0.928s  3540kb  (100)   (1129566 iters/s)   -18%
Athlon XP:
test                        total   user    mem     (runs)                      diff
Compile.................... 1.529s  1.317s  3668kb  (100)   (18332 lines/s)      13%
Compile & Exec............. 1.474s  1.264s  3672kb  (100)   (475716 lines/s)     11%
Ackermann.................. 0.732s  0.534s  3728kb  (100)                       -18%
Loops Nested (local)....... 0.450s  0.248s  3520kb  (100)   (67677360 iters/s)  -42%
Loops Nested (global)...... 0.780s  0.587s  3520kb  (100)   (28561814 iters/s)  -25%
Loops Recursed............. 0.571s  0.367s  3520kb  (100)   (2857154 iters/s)   -25%
/ Martin Stjernholm, Roxen IS