Hi Marty,
Thanks!
Yes, low_mega_apply still needs to be refactored. It is slightly more
"complicated" because of APPLY_STACK, where the return value will
overwrite the function on the stack. I want to fix the last crash in the
testsuite before refactoring that. If you are interested in working on
it, just let me know so we don't both do it ;)
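To make the APPLY_STACK point concrete, here is a toy model of the stack behaviour. It is only a sketch: none of these names (toy_stack, toy_apply_stack, toy_add) exist in Pike, and the real interpreter stack holds svalues rather than this simplified slot type. The only thing it is meant to show is that the callee sits below its arguments and the return value ends up in the callee's slot.

```c
#include <stddef.h>

#define TOY_STACK_MAX 64

/* Hypothetical callable type: takes its arguments as a plain int array. */
typedef int (*toy_fn)(const int *args, size_t nargs);

/* Simplified stack slot: either a callable (fn != NULL) or an integer. */
typedef struct {
    toy_fn fn;
    int    num;
} slot;

typedef struct {
    slot   items[TOY_STACK_MAX];
    size_t sp;                    /* index of the first free slot */
} toy_stack;

int push_fn(toy_stack *s, toy_fn f)
{
    if (s->sp >= TOY_STACK_MAX) return -1;
    s->items[s->sp].fn = f;
    s->items[s->sp].num = 0;
    s->sp++;
    return 0;
}

int push_int(toy_stack *s, int v)
{
    if (s->sp >= TOY_STACK_MAX) return -1;
    s->items[s->sp].fn = NULL;
    s->items[s->sp].num = v;
    s->sp++;
    return 0;
}

/* Call the function stored just below the top nargs slots.
 * Before: [..., fn, arg1, ..., argN]   After: [..., ret]
 * The return value overwrites the slot that held the function. */
int toy_apply_stack(toy_stack *s, size_t nargs)
{
    int args[TOY_STACK_MAX];
    slot *callee;
    size_t i;

    if (s->sp < nargs + 1) return -1;          /* not enough slots */
    callee = &s->items[s->sp - nargs - 1];
    if (callee->fn == NULL) return -1;         /* slot is not callable */

    for (i = 0; i < nargs; i++)
        args[i] = callee[1 + i].num;

    callee->num = callee->fn(args, nargs);     /* result replaces the callee */
    callee->fn = NULL;
    s->sp -= nargs;                            /* pop the arguments */
    return 0;
}

/* Example callee: sums its arguments. */
int toy_add(const int *args, size_t nargs)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < nargs; i++) sum += args[i];
    return sum;
}
```

Pushing toy_add, 2 and 3 and then calling toy_apply_stack(&s, 2) leaves a single slot holding 5. A refactored call path has to keep that in-place overwrite working, which is the extra wrinkle compared to a plain apply.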
Adding more perf support would be great. Do you have your code in a
branch somewhere? I would be interested to have a look at it.
Arne
On 02/20/17 23:47, Martin Karlgren wrote:
Hi Arne,
That’s awesome!
I’d love to help (with the limited spare time I have). I guess low_mega_apply should be refactored to make use of the new API too?
Speaking of faster calls, I’ve incidentally been poking around a bit with machine code function calling conventions lately. For profiling purposes (i.e. Linux perf) I’ve added minimal call frame information to Pike functions in the amd64 machine code generator. I’ve gotten to the point where I can start Roxen and get proper stack traces from perf, but the testsuite still fails – it seems related to decoding of dumped bytecode, and I haven’t been able to sort out why.
Anyway, the good thing is that ready-made visualisation tools built on perf output can be used to profile Pike code, and the interaction between Pike code and C functions is more apparent.
Examples from a very simple Roxen site being hit by apachebench:
http://marty.se/dotgraph.png (nodes with a “perf-17628.map” header represent Pike functions)
http://marty.se/flamegraph.svg (time on horizontal axis, stack depth on vertical axis).
Hopefully this can eventually be used to find out where we should start looking for optimisation candidates.
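For anyone wondering about the “perf-17628.map” header: perf resolves symbols for runtime-generated code through a text file named /tmp/perf-<pid>.map, with one line per function giving the start address, the size (both in hex, without a 0x prefix) and the symbol name. A minimal sketch of emitting such lines follows. The function names and the "Foo.bar" symbol are made up for illustration, and this is not how the Pike code generator actually does it.

```c
#include <stdio.h>
#include <stddef.h>

/* Format one perf map line: "<start-hex> <size-hex> <symbol>\n".
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small or formatting fails. */
int format_perf_map_entry(char *buf, size_t buflen,
                          unsigned long long start,
                          unsigned long long size,
                          const char *symbol)
{
    int n = snprintf(buf, buflen, "%llx %llx %s\n", start, size, symbol);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}

/* Append one entry to an already opened map file.
 * The caller is assumed to have opened /tmp/perf-<pid>.map in append mode.
 * Returns 0 on success, -1 on failure. */
int append_perf_map_entry(FILE *map,
                          unsigned long long start,
                          unsigned long long size,
                          const char *symbol)
{
    char line[512];
    if (format_perf_map_entry(line, sizeof line, start, size, symbol) < 0)
        return -1;
    return fputs(line, map) == EOF ? -1 : 0;
}
```

A code generator would call something like append_perf_map_entry(map, (unsigned long long)code_start, code_size, "Foo.bar") each time it emits a function, so that perf can attribute samples in that address range to the right name.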
/Marty