Nice results on the P4. Also nice to see that the low numbers for the non-recursive loops are (mostly) due to -asm. It would indeed be interesting to see the result for icc + asm. :)
/ David Hedbor
Previous text:
2003-02-08 00:15: Subject: Re: gcc/icc
I'm back!
I sadly enough don't have a modern P4, but my Celeron might work as an indication. However, it has only 128KB of cache, while a modern P4 has 512KB. My older 'normal' P4 has 256KB of cache and is, generally speaking, somewhat slower per GHz than a modern one.
The 'gain' column is the percentage difference between gcc (without assembly) and icc; positive numbers mean icc was faster. gcc-asm is the default compile; gcc is gcc without the assembly optimizations.
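Judging from the tables, the gain appears to be computed relative to the plain gcc column as (gcc - icc) / gcc * 100. That formula is inferred from the numbers, not stated in the post, and the function name `gain` below is hypothetical. A minimal Python sketch under that assumption, checked against a few rows from the lain table:

```python
def gain(gcc_time: float, icc_time: float) -> float:
    """Assumed gain formula: percent by which icc's time is below gcc's.

    gcc_time is the gcc column (no asm), not gcc-asm.
    Positive result: icc faster; negative result: icc slower.
    """
    return (gcc_time - icc_time) / gcc_time * 100.0


# (gcc, icc) times copied from the lain table.
rows = {
    "Ackermann": (2.31, 1.99),        # table lists 14
    "Append array": (1.73, 1.90),     # table lists -10
    "Append mapping": (11.65, 9.24),  # table lists 21
}

for name, (gcc_t, icc_t) in rows.items():
    # Round to whole percent, as in the tables.
    print(f"{name}: {gain(gcc_t, icc_t):.0f}")
```

Working from the rounded two-decimal times can be off by a point in some rows (Loops Nested (global) on lain comes out at about 4 rather than the listed 5), presumably because the benchmark computed the gain from unrounded timings. The same effect would explain the nonzero gains on the 0.00 "Pike start overhead" rows.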
Enough preamble, here are the tests. :-)
lain: dual 560MHz P3; 1024MB PC112 SDRAM
test                          gcc-asm    gcc    icc  gain
Ackermann . . . . . . . . . .    1.96   2.31   1.99    14
Append array. . . . . . . . .    1.78   1.73   1.90   -10  (262697/s)
Append mapping. . . . . . . .   12.44  11.65   9.24    21  (1082/s)
Append multiset . . . . . . .    1.82   1.87   1.81     4  (5515/s)
Array & String Juggling . . .    3.84   4.01   4.68   -16
Clone null-object . . . . . .    0.74   0.75   0.65    14  (458015/s)
Clone object. . . . . . . . .    1.73   1.69   1.91   -13  (156794/s)
Compile . . . . . . . . . . .    4.30   4.07   3.42    16  (7048 lines/s)
Compile & Exec. . . . . . . .    4.48   4.11   3.57    14  (168696 lines/s)
GC. . . . . . . . . . . . . .    1.67   1.72   1.36    22
Insert in mapping . . . . . .    0.99   1.06   0.96    10  (518672/s)
Insert in multiset. . . . . .    2.71   3.09   2.41    22  (207469/s)
Loops Nested (global) . . . .    1.45   2.06   1.98     5  (8473341 iters/s)
Loops Nested (local). . . . .    0.90   1.48   1.31    12  (12807036 iters/s)
Loops Recursed. . . . . . . .    1.37   1.62   1.48     9  (708497 iters/s)
Matrix multiplication . . . .    1.62   1.78   1.38    23
Pike start overhead . . . . .    0.00   0.00   0.00     0
Read binary INT128. . . . . .    4.01   3.71   3.13    16  (3195/s)
Read binary INT16 . . . . . .    0.68   0.66   0.64     4  (1570680/s)
Read binary INT32 . . . . . .   10.28  10.04   9.15     9  (54645/s)
eiri: 450MHz P2; 768MB PC100 SDRAM
test                          gcc-asm    gcc    icc  gain
Ackermann . . . . . . . . . .    2.19   2.55   2.17    15
Append array. . . . . . . . .    2.11   2.12   2.32    -9  (215517/s)
Append mapping. . . . . . . .   14.26  12.67  10.47    18  (955/s)
Append multiset . . . . . . .    2.02   2.11   1.98     7  (5059/s)
Array & String Juggling . . .    4.56   4.78   5.20    -8
Clone null-object . . . . . .    0.77   0.84   0.72    15  (416667/s)
Clone object. . . . . . . . .    1.82   2.20   2.30    -4  (130435/s)
Compile . . . . . . . . . . .    4.82   4.39   3.75    15  (6437 lines/s)
Compile & Exec. . . . . . . .    5.02   4.78   4.30    10  (139860 lines/s)
GC. . . . . . . . . . . . . .    1.86   1.89   1.62    15
Insert in mapping . . . . . .    1.09   1.16   1.01    13  (493827/s)
Insert in multiset. . . . . .    3.25   3.53   2.75    23  (181818/s)
Loops Nested (global) . . . .    1.60   2.29   2.38    -4  (7049250 iters/s)
Loops Nested (local). . . . .    1.00   1.63   1.73    -6  (9679163 iters/s)
Loops Recursed. . . . . . . .    1.53   1.77   1.60    10  (655360 iters/s)
Matrix multiplication . . . .    1.83   1.86   1.54    18
Pike start overhead . . . . .    0.00   0.00   0.00    25
Read binary INT128. . . . . .    4.11   4.09   3.97     4  (2519/s)
Read binary INT16 . . . . . .    0.75   0.72   0.72     1  (1388889/s)
Read binary INT32 . . . . . .   10.74  11.42  10.27    11  (48685/s)
ayumu: 2.1GHz P4 Celeron; 512MB DDR333
test                          gcc-asm    gcc    icc  gain
Ackermann . . . . . . . . . .    0.66   0.69   0.55    21
Append array. . . . . . . . .    0.55   0.60   0.51    16  (985222/s)
Append mapping. . . . . . . .    2.94   3.06   2.52    18  (3968/s)
Append multiset . . . . . . .    0.44   0.45   0.44     3  (22843/s)
Array & String Juggling . . .    1.29   1.29   1.35    -4
Clone null-object . . . . . .    0.28   0.26   0.22    16  (1359516/s)
Clone object. . . . . . . . .    0.75   0.55   0.45    19  (663391/s)
Compile . . . . . . . . . . .    1.47   1.26   1.07    15  (22508 lines/s)
Compile & Exec. . . . . . . .    1.37   1.33   1.09    19  (552757 lines/s)
GC. . . . . . . . . . . . . .    0.57   0.57   0.47    17
Insert in mapping . . . . . .    0.27   0.30   0.25    19  (2034884/s)
Insert in multiset. . . . . .    0.81   0.85   0.73    14  (681818/s)
Loops Nested (global) . . . .    0.41   0.68   0.47    32  (36036980 iters/s)
Loops Nested (local). . . . .    0.26   0.44   0.33    25  (50423328 iters/s)
Loops Recursed. . . . . . . .    0.42   0.55   0.39    30  (2674939 iters/s)
Matrix multiplication . . . .    0.51   0.48   0.47     4
Pike start overhead . . . . .    0.00   0.00   0.00     0
Read binary INT128. . . . . .    1.13   1.07   0.94    13  (10638/s)
Read binary INT16 . . . . . .    0.20   0.19   0.17     9  (5802047/s)
Read binary INT32 . . . . . .    2.86   2.70   2.35    14  (212766/s)
sakura: 1.65GHz P4; 512MB PC800 RDRAM
test                          gcc-asm    gcc    icc  gain
Ackermann . . . . . . . . . .    0.76   0.81   0.62    25
Append array. . . . . . . . .    0.62   0.65   0.60     8  (829876/s)
Append mapping. . . . . . . .    3.42   3.76   2.84    25  (3521/s)
Append multiset . . . . . . .    0.52   0.57   0.49    14  (20270/s)
Array & String Juggling . . .    1.17   1.18   0.88    26
Clone null-object . . . . . .    0.32   0.31   0.27    15  (1111111/s)
Clone object. . . . . . . . .    0.62   0.63   0.55    14  (549199/s)
Compile . . . . . . . . . . .    1.42   1.29   1.05    19  (22990 lines/s)
Compile & Exec. . . . . . . .    1.41   1.33   1.16    13  (518448 lines/s)
GC. . . . . . . . . . . . . .    0.58   0.58   0.47    19
Insert in mapping . . . . . .    0.31   0.34   0.27    21  (1871658/s)
Insert in multiset. . . . . .    0.87   0.91   0.76    17  (657895/s)
Loops Nested (global) . . . .    0.44   0.71   0.58    19  (28802088 iters/s)
Loops Nested (local). . . . .    0.32   0.50   0.43    15  (39107724 iters/s)
Loops Recursed. . . . . . . .    0.52   0.64   0.46    29  (2279513 iters/s)
Matrix multiplication . . . .    0.47   0.47   0.41    13
Pike start overhead . . . . .    0.00   0.00   0.00    75
Read binary INT128. . . . . .    1.22   1.21   1.08    11  (9225/s)
Read binary INT16 . . . . . .    0.22   0.22   0.19    13  (5278592/s)
Read binary INT32 . . . . . .    3.54   3.46   3.03    13  (165289/s)
/ Per Hedbor