pike_frames vs. clone_object/destruct. (7.5)

Per Hedbor () ＠ Pike (-) developers forum

18 Dec 2002

7:15 a.m.

I have added two new tests to pike -x benchmark, added a --tests='Glob here' argument to that program so that I can run only the tests I want, and then optimized cloning of the null pike-class by about 20%.

Cloning a non-null pike class (one with a create method that actually does something and a local variable) is also faster.

Before:

test                        total    user     mem      (runs)
---------------------------------------------------------
Clone null-object.......... 0.502s   0.421s   2936kb   (10)
Clone object............... 0.800s   0.761s   2936kb   (7)

After:

test                        total    user     mem      (runs)
---------------------------------------------------------
Clone null-object.......... 0.360s   0.326s   2948kb   (14)
Clone object............... 0.688s   0.651s   2948kb   (8)

On a related note, the same optimizations could rather easily be done to gc_check_object, gc_mark_object_as_referenced and real_gc_cycle_check_object.

The key to the whole thing is that a lot of struct pike_frame instances were created, initialized, linked and deinitialized unnecessarily, and this took quite a lot of time.

David Hedbor ＠ Pike developers forum

18 Dec

7:15 a.m.

BTW, for the benchmark it would be nice to get an "iterations per second" measurement or perhaps "ms per iteration". That way it's easier to compare. Or perhaps the number shown is one iteration already? Hmm. That might make sense I guess. :-P

/ David Hedbor

Per Hedbor () ＠ Pike (-) developers forum

7:15 a.m.

It is.

/ Per Hedbor ()

Mirar ＠ Pike developers forum

7:55 a.m.

I've thought of adding reporting of "n", so you can get the number of operations per second. This is most useful for the operations with a clear inner loop, for instance in this case and the loop test cases.

/ Mirar
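The "n" reporting suggested above could be sketched roughly as follows. This is only an illustration in C, not code from pike -x benchmark: the helper names ops_per_second and ms_per_op are hypothetical, and it assumes each test can report how many inner-loop operations it performed in the measured time.

```c
#include <assert.h>

/* Hypothetical helper: operations per second, given that a test
 * performed n operations in `seconds` of measured time. */
static double ops_per_second(unsigned long n, double seconds)
{
    assert(seconds > 0.0);   /* a zero time would make the rate meaningless */
    return (double)n / seconds;
}

/* Hypothetical helper: milliseconds spent per operation. */
static double ms_per_op(unsigned long n, double seconds)
{
    assert(n > 0);           /* at least one operation must have run */
    return (seconds * 1000.0) / (double)n;
}
```

For example, a test that reports 1000 operations in 0.5 seconds would come out as 2000 operations per second, or 0.5 ms per operation. Either figure stays comparable between runs even when the number of runs differs.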

Martin Stjernholm, Roxen IS ＠ Pike developers forum

7:20 a.m.

What kept you from doing them in the gc functions?

I wonder if one can be a little naughty and allocate the pike_frames on the stack in those functions. One thing I don't understand, though, is why most functions carefully avoid having the extra ref during most of the frame's lifetime.

/ Martin Stjernholm, Roxen IS

Per Hedbor () ＠ Pike (-) developers forum

7:20 a.m.

Mostly I'm lazy. I also thought that someone more involved in the GC-code should do that.

That might be possible. It's not the actual allocation that is most expensive, though; it's the initialization of the frames. Also, accessing prog, storage and similar members through pike_frame instead of a local variable generated extra memory operations; gcc did not really optimize that code all that well.

/ Per Hedbor ()

Martin Stjernholm, Roxen IS ＠ Pike developers forum

7:20 a.m.

If it doesn't optimize common subexpressions well then there's a whole lot of trivial optimizations we can do; long access chains are very common in the pike core.

/ Martin Stjernholm, Roxen IS

Per Hedbor () ＠ Pike (-) developers forum

7:20 a.m.

It seems to fail to do that when there are function calls of some kind between the accesses. I would say that that is a feature, not a misfeature in gcc.

However, code like

  if(pike_frame->context.prog->event_handler)
    pike_frame->context.prog->event_handler(PROG_EVENT_GC_RECURSE);

  for(q=0;q<(int)pike_frame->context.prog->num_variable_index;q++)
  {
    int d=pike_frame->context.prog->variable_index[q];
    if(IDENTIFIER_IS_ALIAS(pike_frame->context.prog->identifiers[d].
                           identifier_flags))
    {
      ...
      gc_mark_svalues( s, 1 );
      ...
    }
  }

is not really all that optimal, since the memory accesses for the for-loop condition seem to be redone on every iteration.

  2e09: 8b 73 4c        mov    0x4c(%ebx),%esi
  2e0c: 83 c4 10        add    $0x10,%esp
  2e0f: 89 f1           mov    %esi,%ecx
  2e11: 47              inc    %edi
  2e12: 0f b7 41 6e     movzwl 0x6e(%ecx),%eax
  2e16: 39 c7           cmp    %eax,%edi
  2e18: 7c a6           jl     2dc0 <gc_mark_object_as_referenced+0x1c0>

/ Per Hedbor ()
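The reload pattern described above, and the manual fix of copying the pointer chain into locals, can be sketched like this. It is a simplified illustration, not the actual gc code: demo_program, demo_frame, visit_fn and both walk functions are made-up stand-ins, and the hoisted version is only correct under the assumption that the callback never changes the program pointer or its tables.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct demo_program {
    unsigned short num_variable_index;
    const int *variable_index;
};
struct demo_frame {
    struct demo_program *prog;
};

/* Stand-in for a call such as gc_mark_svalues(); because it is called
 * through a pointer, the compiler cannot see what memory it modifies. */
typedef void (*visit_fn)(int index, void *ctx);

/* Reads f->prog and the count through the frame on every iteration,
 * much like the loop quoted above. Returns the number of visits. */
static int walk_reloading(struct demo_frame *f, visit_fn visit, void *ctx)
{
    int q, visited = 0;
    for (q = 0; q < (int)f->prog->num_variable_index; q++) {
        visit(f->prog->variable_index[q], ctx);
        visited++;
    }
    return visited;
}

/* Loads the chain once into locals. ASSUMPTION: visit() never changes
 * f->prog, the index table or the count while the loop runs. */
static int walk_hoisted(struct demo_frame *f, visit_fn visit, void *ctx)
{
    const struct demo_program *p = f->prog;
    const int *idx = p->variable_index;
    const int count = (int)p->num_variable_index;
    int q;
    for (q = 0; q < count; q++)
        visit(idx[q], ctx);
    return count;
}

/* Example callback: adds each visited index to a long accumulator. */
static void sum_visit(int index, void *ctx)
{
    *(long *)ctx += index;
}
```

Both functions visit the same entries as long as that assumption holds. The compiler cannot prove it across an opaque call, which is why it keeps the reloads unless the source does the hoisting itself.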

Martin Stjernholm, Roxen IS ＠ Pike developers forum

7:20 a.m.

gcc would have to do global optimizations to be able to do better, I guess. Still, there are lots of places which are spoiled by function calls.

/ Martin Stjernholm, Roxen IS

pike-devel@lists.lysator.liu.se
