Why did I have to do "e 30" to see that message? (Writing this comment so that others who might have missed your message can read it.)
/ Martin Nilsson (hehe Torgny)
Previous text:
2002-11-28 23:33: Subject: kcachegrind
I have been doing some benchmarks to see where the time is going in the clone-a-program and call-a-method kind of tests on the shootout.
My current testcase is this one:
    class A {
      static int count;
      void create() { count = 1; }
      void tick() { count = !count; }
    }

    void main() {
      for( int i = 0; i<100000; i++ )
        A()->tick();
    }
It's relatively close to the test in the language shootout.
My Pike is not compiled with assembly support, since that makes it nearly impossible to run under cachegrind. It's not compiled with valgrind support either, since that causes the timings to be somewhat different (a lot of marking of memory is done in the block allocator, which makes it seem to use more CPU than it really does).
My current findings, with functions sorted by the time consumed in them, not including siblings:
Time  Function
19%   low_mega_apply
16%   eval_instruction
 8%   really_free_pike_frame
 5%   destruct (SET_FRAME_CONTEXT: 50%)
 5%   alloc_pike_frame
 3%   call_c_initializers (SET_FRAME_CONTEXT: 90%)
 3%   schedule_free_object (add_to_callback: 80%)
 2%   low_return_pop (pop_n_elems: 99%)
 1%   free + malloc (from valgrind)
Is it only me, or aren't the numbers for really_free_pike_frame and alloc_pike_frame rather high? Or are there perhaps just too many calls to them?
In this simple test there are 500 000 calls to them. 200k from the create() and tick() calls, 100k from the call_c_initializers() calls and 100k from destruct().
I question the necessity of creating frames for call_c_initializers and destruct when there are no callbacks to call.
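To make the idea concrete, here is a minimal toy model in C of what skipping those frames could look like. It is only a sketch, not Pike's actual code: every name in it (toy_program, toy_alloc_frame, toy_run and so on) is hypothetical, and it simply counts simulated frame allocations for the loop in the test case, assuming one frame each for create(), tick(), the C initializers and destruct.

```c
#include <stddef.h>

/* Hypothetical toy model; none of these names are Pike's real internals. */
struct toy_program {
    size_t num_c_initializers;  /* C-level init callbacks registered */
    size_t num_c_destructors;   /* C-level exit callbacks registered */
};

static unsigned long frames_allocated;  /* simulated frame allocations */

static void toy_alloc_frame(void) { frames_allocated++; }

/* A Pike-level method call always needs a frame in this model. */
static void toy_call_method(void) { toy_alloc_frame(); }

/* With fast_path set, skip the frame when there is nothing to call. */
static void toy_call_c_initializers(const struct toy_program *p, int fast_path)
{
    if (fast_path && p->num_c_initializers == 0)
        return;
    toy_alloc_frame();
    /* ... a real implementation would invoke each initializer here ... */
}

/* Same fast path for destruction. */
static void toy_destruct(const struct toy_program *p, int fast_path)
{
    if (fast_path && p->num_c_destructors == 0)
        return;
    toy_alloc_frame();
    /* ... a real implementation would invoke each exit callback here ... */
}

/* Simulate the test loop: clone, create(), tick(), destruct. */
unsigned long toy_run(unsigned long iterations, int fast_path)
{
    struct toy_program a = { 0, 0 };  /* class A has no C callbacks */
    unsigned long i;

    frames_allocated = 0;
    for (i = 0; i < iterations; i++) {
        toy_call_c_initializers(&a, fast_path);
        toy_call_method();            /* create() */
        toy_call_method();            /* tick() */
        toy_destruct(&a, fast_path);
    }
    return frames_allocated;
}
```

In this model toy_run(100000, 0) counts 400000 frames (200k + 100k + 100k, as in the breakdown above), while toy_run(100000, 1) counts 200000, since only the create() and tick() calls still need a frame.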
/ Per Hedbor ()