Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
All -O3 does is add -finline-functions and -frename-registers. -finline-functions tends to make the final code bigger than -O2, which could be an issue on machines with small caches, and the docs say that -frename-registers can confuse gdb since a variable may exist in different registers over its lifetime. I usually use -O2 and -march or -mcpu (depending on the platform).
I'm specifically asking because the Intel compiler has a -O3 which might be worth enabling (even though to be honest it doesn't say that it will help - it might make things slower). From the docs:
-O3 (IA-32 only): Enables -O2 option with more aggressive optimization. Optimizes for maximum speed, but does not guarantee higher performance unless loop and memory access transformation take place. In conjunction with -axK and -xK options (IA-32 only), this option causes the compiler to perform more aggressive data dependency analysis than for -O2. This may result in longer compilation times.
-O3 (Both IA-32 and 64): Enables -O2 option with more aggressive optimization, for example, prefetching, scalar replacement, and loop transformations. Optimizes for maximum speed, but does not guarantee higher performance unless loop and memory access transformation take place.
Now mind you I don't know if that's a (noticeable) benefit or not. :-)
/ David Hedbor
Previous text:
2003-01-16 03:48: Subject: Re: -O3
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
All -O3 does is add -finline-functions and -frename-registers. -finline-functions tends to make the final code bigger than -O2, which could be an issue on machines with small caches, and the docs say that -frename-registers can confuse gdb since a variable may exist in different registers over its lifetime. I usually use -O2 and -march or -mcpu (depending on the platform).
-- Dan Nelson dnelson@allantgroup.com
/ Brevbäraren
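To make the inlining trade-off above concrete, here is a minimal sketch (the file and function names are made up, not taken from the Pike sources): a small static helper in a hot loop that gcc 3.x would leave as a real call at -O2 but, via -finline-functions, copy into its caller at -O3, trading code size for call overhead.

    /* inline_demo.c -- illustrative only, not from the Pike sources.
     *
     *   gcc -O2 -S inline_demo.c   # the call to add_sat remains in blend()
     *   gcc -O3 -S inline_demo.c   # -finline-functions folds add_sat into blend()
     */
    static int add_sat(int a, int b)
    {
        int s = a + b;
        return s > 255 ? 255 : s;             /* clamp the sum to 255 */
    }

    void blend(int *dst, const int *src, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = add_sat(dst[i], src[i]); /* called at -O2, inlined at -O3 */
    }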
Test it? What do they mean by "loop and memory access transformation"?
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-16 04:00: Subject: Re: -O3
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
All -O3 does is add -finline-functions and -frename-registers. -finline-functions tends to make the final code bigger than -O2, which could be an issue on machines with small caches, and the docs say that -frename-registers can confuse gdb since a variable may exist in different registers over its lifetime. I usually use -O2 and -march or -mcpu (depending on the platform).
I'm specifically asking because the Intel compiler has a -O3 which might be worth enabling (even though to be honest it doesn't say that it will help - it might make things slower). From the docs:
-O3 (IA-32 only): Enables -O2 option with more aggressive optimization. Optimizes for maximum speed, but does not guarantee higher performance unless loop and memory access transformation take place. In conjunction with -axK and -xK options (IA-32 only), this option causes the compiler to perform more aggressive data dependency analysis than for -O2. This may result in longer compilation times.
-O3 (Both IA-32 and 64): Enables -O2 option with more aggressive optimization, for example, prefetching, scalar replacement, and loop transformations. Optimizes for maximum speed, but does not guarantee higher performance unless loop and memory access transformation take place.
Now mind you I don't know if that's a (noticeable) benefit or not. :-)
/ David Hedbor
I will test it. It takes a whole heck of a long time to compile Pike with icc though, mainly because of the multi-file IPO linking stage for the main Pike binary (it's done quite a few times and takes quite a long time :-). Examples of output from that process, in case anyone is interested:
/home/neotron/Pike/7.5/src/main.c(188) : (col. 1) remark: main has been targeted for automatic cpu dispatch.
/home/neotron/Pike/7.5/src/operators.c(440) : (col. 14) remark: LOOP WAS VECTORIZED.
/home/neotron/Pike/7.5/src/operators.c(228) : (col. 1) remark: f_add has been targeted for automatic cpu dispatch.
/home/neotron/Pike/7.5/src/multiset.c(1039) : (col. 1) remark: multiset_set_cmp_less has been targeted for automatic cpu dispatch.
/ David Hedbor
Previous text:
2003-01-16 04:10: Subject: Re: -O3
Test it? What do they mean by "loop and memory access transformation"?
/ Martin Stjernholm, Roxen IS
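For reference, the "LOOP WAS VECTORIZED" remark above refers to loops roughly like the following made-up fragment (not the actual code at operators.c:440): an element-wise loop with no dependencies between iterations, which icc with -xK or -axK can rewrite to use SSE instructions that process several floats per instruction.

    /* vec_demo.c -- illustrative only.  icc -O2 -xK (or -axK) typically
     * reports "LOOP WAS VECTORIZED" for a loop of this shape, since each
     * iteration is independent of the others.
     */
    void vadd(float *dst, const float *a, const float *b, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }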
I'm pretty sure they mean that they transform the memory layout in order to change loops like
for(k=0; k<large_number; k++)
  for(i=0; i<large_number2; i++)
    for(j=0; j<large_number3; j++)
      foo[i][k] = bar[i][k] * gonk[k][j];
which use cache locality very poorly, into the same loop but with the matrix gonk transposed, so that you get
for(k=0; k<large_number; k++)
  for(i=0; i<large_number2; i++)
    for(j=0; j<large_number3; j++)
      foo[i][k] = bar[i][k] * gonk[j][k];
which is much better. Prefetching is also something you would want to use, especially in modules that operate on a lot of continuous data, like the image module. What it does, if you for some reason don't know, is that it "preheats" the cache with data, thus increasing cache performance. You can actually see a performance increase even with processors that don't support prefetching.
/ Peter Lundqvist (disjunkt)
Previous text:
2003-01-16 04:10: Subject: Re: -O3
Test it? What do they mean by "loop and memory access transformation"?
/ Martin Stjernholm, Roxen IS
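To make the "preheating" concrete, here is a rough sketch of manual prefetching using gcc's __builtin_prefetch (icc's -prefetch option inserts the equivalent hints automatically; the function name, the distance, and the brightening step are only illustrative guesses, the kind of thing that has to be tuned empirically, as the icc docs say).

    /* prefetch_demo.c -- illustrative only.  While the current bytes are
     * being processed we hint to the CPU that data a few cache lines ahead
     * will be needed soon, so it is already in cache when the loop gets
     * there.  On targets without a prefetch instruction the hint is dropped.
     */
    void brighten(unsigned char *img, long n)
    {
        long i;
        for (i = 0; i < n; i++) {
            if ((i & 63) == 0 && i + 256 < n)
                __builtin_prefetch(&img[i + 256], 1, 1); /* ~4 cache lines ahead */
            if (img[i] < 255 - 16)
                img[i] += 16;                            /* brighten the pixel */
        }
    }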
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
In the last episode (Jan 16), David Hedbor @ Pike developers forum said:
Is there a reason why '-O2' is the "max" -Ox flag and not -O3? I.e. have there been any specific problems, or are there other reasons?
All -O3 does is add -finline-functions and -frename-registers. -finline-functions tends to make the final code bigger than -O2, which could be an issue on machines with small caches, and the docs say that -frename-registers can confuse gdb since a variable may exist in different registers over its lifetime. I usually use -O2 and -march or -mcpu (depending on the platform).
I'm specifically asking because the Intel compiler has a -O3 which might be worth enabling (even though to be honest it doesn't say that it will help - it might make things slower). From the docs:
At this point you might as well check the compiler vendor and use customized optimization flags for each one. Here's another interesting icc flag:
IA-32 Applications Only -prefetch[-]
Enables or disables prefetch insertion (requires -O3). Reduces wait time; optimum use is determined empirically.
Also, Compaq's CC for Alpha goes up to -O4, for what it's worth.
pike-devel@lists.lysator.liu.se
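As a footnote to the per-vendor idea: the standard predefined macros make it easy to tell the compilers apart at build time, roughly as in this hypothetical fragment (the flag suggestions in the comments are just the ones mentioned in this thread, not tested recommendations).

    /* compiler_detect.h -- hypothetical sketch, not part of Pike's build.
     * Note: icc also defines __GNUC__, so it must be checked first. */
    #if defined(__INTEL_COMPILER)
    #  define COMPILER_VENDOR "Intel icc"   /* candidates: -O3 -xK/-axK -prefetch */
    #elif defined(__DECC)
    #  define COMPILER_VENDOR "Compaq C"    /* its scale goes up to -O4 on Alpha */
    #elif defined(__GNUC__)
    #  define COMPILER_VENDOR "GNU gcc"     /* -O2 plus -march=/-mcpu= per platform */
    #else
    #  define COMPILER_VENDOR "unknown"     /* fall back to plain -O */
    #endif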