Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
All -O3 does is add -finline-functions and -frename-registers. -finline-functions tends to make the final code bigger than -O2, which could be an issue on machines with small caches, and the docs say that -frename-registers can confuse gdb since a variable may exist in different registers over its lifetime. I usually use -O2 and -march or -mcpu (depending on the platform).
I'm specifically asking because the Intel compiler has a -O3 which might be worth enabling (even though to be honest it doesn't say that it will help - it might make things slower). From the docs:
-O3 (IA-32 only): Enables -O2 option with more aggressive optimization. Optimizes for maximum speed, but does not guarantee higher performance unless loop and memory access transformation take place. In conjunction with -axK and -xK options (IA-32 only), this option causes the compiler to perform more aggressive data dependency analysis than for -O2. This may result in longer compilation times.
-O3 (Both IA-32 and 64): Enables -O2 option with more aggressive optimization, for example, prefetching, scalar replacement, and loop transformations. Optimizes for maximum speed, but does not guarantee higher performance unless loop and memory access transformation take place.
Now mind you I don't know if that's a (noticeable) benefit or not. :-)
/ David Hedbor
Previous text:
2003-01-16 03:48: Subject: Re: -O3
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
All -O3 does is add -finline-functions and -frename-registers. -finline-functions tends to make the final code bigger than -O2, which could be an issue on machines with small caches, and the docs say that -frename-registers can confuse gdb since a variable may exist in different registers over its lifetime. I usually use -O2 and -march or -mcpu (depending on the platform).
-- Dan Nelson dnelson@allantgroup.com
/ Brevbäraren
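To make the inlining trade-off above concrete, here is a minimal sketch (the file and function names are made up, not taken from the Pike sources): a small static helper in a hot loop that gcc 3.x would leave as a real call at -O2 but, via -finline-functions, copy into its caller at -O3, trading code size for call overhead.

    /* inline_demo.c -- illustrative only, not from the Pike sources.
     *
     *   gcc -O2 -S inline_demo.c   # the call to add_sat remains in blend()
     *   gcc -O3 -S inline_demo.c   # -finline-functions folds add_sat into blend()
     */
    static int add_sat(int a, int b)
    {
        int s = a + b;
        return s > 255 ? 255 : s;             /* clamp the sum to 255 */
    }

    void blend(int *dst, const int *src, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = add_sat(dst[i], src[i]); /* called at -O2, inlined at -O3 */
    }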
Test it? What do they mean by "loop and memory access transformation"?
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-16 04:00: Subject: Re: -O3
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
All -O3 does is add -finline-functions and -frename-registers. -finline-functions tends to make the final code bigger than -O2, which could be an issue on machines with small caches, and the docs say that -frename-registers can confuse gdb since a variable may exist in different registers over its lifetime. I usually use -O2 and -march or -mcpu (depending on the platform).
I'm specifically asking because the Intel compiler has a -O3 which might be worth enabling (even though to be honest it doesn't say that it will help - it might make things slower). From the docs:
-O3 (IA-32 only): Enables -O2 option with more aggressive optimization. Optimizes for maximum speed, but does not guarantee higher performance unless loop and memory access transformation take place. In conjunction with -axK and -xK options (IA-32 only), this option causes the compiler to perform more aggressive data dependency analysis than for -O2. This may result in longer compilation times.
-O3 (Both IA-32 and 64): Enables -O2 option with more aggressive optimization, for example, prefetching, scalar replacement, and loop transformations. Optimizes for maximum speed, but does not guarantee higher performance unless loop and memory access transformation take place.
Now mind you I don't know if that's a (noticeable) benefit or not. :-)
/ David Hedbor
I will test it. It takes a whole heck of a long time to compile Pike with icc though, mainly because of the multi-file IPO linking stage for the main Pike binary (it's done quite a few times and takes quite a long time :-). Examples of output from that process, in case anyone is interested:
/home/neotron/Pike/7.5/src/main.c(188) : (col. 1) remark: main has been targeted for automatic cpu dispatch.
/home/neotron/Pike/7.5/src/operators.c(440) : (col. 14) remark: LOOP WAS VECTORIZED.
/home/neotron/Pike/7.5/src/operators.c(228) : (col. 1) remark: f_add has been targeted for automatic cpu dispatch.
/home/neotron/Pike/7.5/src/multiset.c(1039) : (col. 1) remark: multiset_set_cmp_less has been targeted for automatic cpu dispatch.
/ David Hedbor
Previous text:
2003-01-16 04:10: Subject: Re: -O3
Test it? What do they mean by "loop and memory access transformation"?
/ Martin Stjernholm, Roxen IS
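For reference, the "LOOP WAS VECTORIZED" remark above refers to loops roughly like the following made-up fragment (not the actual code at operators.c:440): an element-wise loop with no dependencies between iterations, which icc with -xK or -axK can rewrite to use SSE instructions that process several floats per instruction.

    /* vec_demo.c -- illustrative only.  icc -O2 -xK (or -axK) typically
     * reports "LOOP WAS VECTORIZED" for a loop of this shape, since each
     * iteration is independent of the others.
     */
    void vadd(float *dst, const float *a, const float *b, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }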
I'm pretty sure they mean that they transform the memory layout in order to change loops like
for(k=0; k<large_number; k++)
  for(i=0; i<large_number2; i++)
    for(j=0; j<large_number3; j++)
      foo[i][k] = bar[i][k] * gonk[k][j];
which use cache locality very poorly, into the same loop but with the matrix gonk transposed, so that you get
for(k=0; k<large_number; k++)
  for(i=0; i<large_number2; i++)
    for(j=0; j<large_number3; j++)
      foo[i][k] = bar[i][k] * gonk[j][k];
which is much better. Prefetching is also something you would want to use, especially in modules that operate on a lot of continuous data, like the image module. What it does, if you for some reason don't know, is that it "preheats" the cache with data, thus increasing cache performance. You can actually see a performance increase even with processors that don't support prefetching.
/ Peter Lundqvist (disjunkt)
Previous text:
2003-01-16 04:10: Subject: Re: -O3
Test it? What do they mean by "loop and memory access transformation"?
/ Martin Stjernholm, Roxen IS
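To make the "preheating" concrete, here is a rough sketch of manual prefetching using gcc's __builtin_prefetch (icc's -prefetch option inserts the equivalent hints automatically; the function name, the distance, and the brightening step are only illustrative guesses, the kind of thing that has to be tuned empirically, as the icc docs say).

    /* prefetch_demo.c -- illustrative only.  While the current bytes are
     * being processed we hint to the CPU that data a few cache lines ahead
     * will be needed soon, so it is already in cache when the loop gets
     * there.  On targets without a prefetch instruction the hint is dropped.
     */
    void brighten(unsigned char *img, long n)
    {
        long i;
        for (i = 0; i < n; i++) {
            if ((i & 63) == 0 && i + 256 < n)
                __builtin_prefetch(&img[i + 256], 1, 1); /* ~4 cache lines ahead */
            if (img[i] < 255 - 16)
                img[i] += 16;                            /* brighten the pixel */
        }
    }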
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
All -O3 does is add -finline-functions and -frename-registers. -finline-functions tends to make the final code bigger than -O2, which could be an issue on machines with small caches, and the docs say that -frename-registers can confuse gdb since a variable may exist in different registers over its lifetime. I usually use -O2 and -march or -mcpu (depending on the platform).
I'm specifically asking because the Intel compiler has a -O3 which might be worth enabling (even though to be honest it doesn't say that it will help - it might make things slower). From the docs:
At this point you might as well check the compiler vendor and use customized optimization flags for each one. Here's another interesting icc flag:
IA-32 Applications Only -prefetch[-]
Enables or disables prefetch insertion (requires -O3). Reduces wait time; optimum use is determined empirically.
Also, Compaq's CC for Alpha goes up to -O4, for what it's worth.
pike-devel@lists.lysator.liu.se
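As a footnote to the per-vendor idea: the standard predefined macros make it easy to tell the compilers apart at build time, roughly as in this hypothetical fragment (the flag suggestions in the comments are just the ones mentioned in this thread, not tested recommendations).

    /* compiler_detect.h -- hypothetical sketch, not part of Pike's build.
     * Note: icc also defines __GNUC__, so it must be checked first. */
    #if defined(__INTEL_COMPILER)
    #  define COMPILER_VENDOR "Intel icc"   /* candidates: -O3 -xK/-axK -prefetch */
    #elif defined(__DECC)
    #  define COMPILER_VENDOR "Compaq C"    /* its scale goes up to -O4 on Alpha */
    #elif defined(__GNUC__)
    #  define COMPILER_VENDOR "GNU gcc"     /* -O2 plus -march=/-mcpu= per platform */
    #else
    #  define COMPILER_VENDOR "unknown"     /* fall back to plain -O */
    #endif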