I have pushed a new version of the faster_calls branch. I re-added support for profiling and fixed one more open bug; of course there may be more. Performance-wise the situation has not changed since the conference, but I have done some experimentation to improve it. One follow-up change will probably be a rework of the function call API used from within the interpreter. I also want to add more tail call optimizations where they are missing. Current comparison against 8.1 on my Intel machine:
benchmark                            | 8.1           | faster_calls  |
interpreter/automap.pike#efun        | 27.8 M 0.4 %  | 41.1 M 0.6 %  |
interpreter/automap.pike#private     | 17.3 M 0.7 %  | 25.8 M 0.3 %  |
interpreter/automap.pike#public      | 17.6 M 0.2 %  | 26.1 M 0.1 %  |
interpreter/call.pike#efun           | 173.9 M 0.6 % | 171.6 M 0.8 % |
interpreter/call.pike#index          | 15.7 M 0.4 %  | 14.3 M 0.4 %  |
interpreter/call.pike#private        | 20.9 M 0.5 %  | 18.7 M 0.2 %  |
interpreter/call.pike#public         | 21.1 M 0.2 %  | 18.6 M 0.4 %  |
interpreter/call_array.pike#efun     | 23.7 M 0.1 %  | 34.1 M 0.3 %  |
interpreter/call_array.pike#private  | 15.6 M 0.3 %  | 14.1 M 0.1 %  |
interpreter/call_array.pike#public   | 15.3 M 0.3 %  | 14.0 M 0.1 %  |
interpreter/map.pike#efun            | 41.8 M 0.4 %  | 37.0 M 0.5 %  |
interpreter/map.pike#private         | 15.7 M 0.5 %  | 22.9 M 0.4 %  |
interpreter/map.pike#public          | 16.0 M 0.1 %  | 22.9 M 0.2 %  |
interpreter/recurse.pike#private     | 27.9 M 0.4 %  | 38.1 M 0.3 %  |
interpreter/recurse.pike#public      | 27.7 M 0.2 %  | 37.2 M 0.7 %  |
interpreter/tailcall.pike#private    | 20.6 M 0.4 %  | 31.4 M 0.4 %  |
interpreter/tailcall.pike#public     | 21.0 M 0.2 %  | 30.7 M 0.5 %  |
As you can see, the only slow-downs currently are in standard lfun calls. I believe this can be fixed by refactoring the corresponding opcodes. I also believe we can come up with a simpler API which can more easily be called directly from machine code.
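For anyone who does not have the benchmark sources at hand: the suffixes refer to the kind of call being measured. Below is a rough, hypothetical sketch of how I read the categories (efun = builtin implemented in C, private/public = calls to Pike functions with those modifiers, index = indirect call through a function value); it is not the actual code from the interpreter/*.pike benchmarks, and the #index reading in particular is my assumption:

  // Hypothetical illustration of the call types, not the benchmark code.
  private int priv(int x) { return x + 1; } // "#private" call target
  public int pub(int x)   { return x + 1; } // "#public" call target

  int main(int argc, array(string) argv)
  {
    int n = max(1, 2);   // "#efun": call to a builtin implemented in C
    n = priv(n);         // "#private": call to a private Pike function
    n = pub(n);          // "#public": call to a public Pike function
    function f = pub;
    n = f(n);            // "#index" (my guess): indirect call through a function value
    return 0;
  }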
I think the branch is now in a state in which it could be merged into 8.1. I expect that some issues will still come up when running more complex code bases. I have written a test suite which tries to cover all the different function call types, and I will integrate it into the standard testsuite soon.
Unless anyone sees an issue with the code as it stands, I will follow the decision reached during the conference and merge this within the next week or so.
Arne
Pretty much the same as 8.1. All three branches (8.0, 8.1, faster_calls). Relative standard deviation removed for readability.
benchmark                            | 8.0    | 8.1    | faster_calls |
interpreter/automap.pike#efun        | 26.3M  | 25.5M  | 39.0M  |
interpreter/automap.pike#private     | 17.7M  | 16.0M  | 24.3M  |
interpreter/automap.pike#public      | 17.7M  | 16.2M  | 24.7M  |
interpreter/call.pike#efun           | 169.9M | 165.0M | 166.2M |
interpreter/call.pike#index          | 15.0M  | 15.2M  | 13.8M  |
interpreter/call.pike#private        | 20.8M  | 20.3M  | 17.5M  |
interpreter/call.pike#public         | 20.8M  | 20.0M  | 18.2M  |
interpreter/call_array.pike#efun     | 23.0M  | 22.3M  | 32.8M  |
interpreter/call_array.pike#private  | 15.9M  | 15.8M  | 13.6M  |
interpreter/call_array.pike#public   | 16.2M  | 15.4M  | 13.4M  |
interpreter/map.pike#efun            | 35.8M  | 39.7M  | 35.8M  |
interpreter/map.pike#private         | 15.5M  | 15.1M  | 22.0M  |
interpreter/map.pike#public          | 15.6M  | 15.1M  | 21.9M  |
interpreter/recurse.pike#private     | 26.7M  | 26.7M  | 36.4M  |
interpreter/recurse.pike#public      | 26.7M  | 26.4M  | 35.8M  |
interpreter/tailcall.pike#private    | 21.1M  | 20.4M  | 29.7M  |
interpreter/tailcall.pike#public     | 21.2M  | 20.3M  | 29.5M  |
Arne
On 11/22/17 22:49, Martin Nilsson (Coppermist) @ Pike (-) developers forum wrote:
How does it compare against 8.0?
Arne Goedeke wrote:
Pretty much the same as 8.1. All three branches (8.0, 8.1, faster_calls). Relative standard deviation removed for readability.
interpreter/automap.pike#efun | 26.3M | 25.5M | 39.0M |
Any explanation why, in general, 8.1 is consistently slightly slower than 8.0? Are we doing more work in 8.1?
My first guess would be that it is related to the recent addition of the save_locals bitmask, which is used to free locals from the outer scope earlier when using lambdas. It makes the call frame struct slightly bigger and also adds some additional work that needs to be done on every function call. Of course, this change has other important benefits.
There were also some other changes to the call frames, but I would hope that most of those are beneficial performance-wise.
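To make the scenario concrete, here is a minimal sketch of the kind of code this is about, under my assumption that the bitmask records which locals a lambda still references so that the remaining ones can be freed as soon as the frame is popped:

  // Sketch only; assumes save_locals marks the locals a lambda still needs.
  function make_counter()
  {
    array big = allocate(1000000); // large local, not referenced by the lambda
    int count = 0;                 // referenced by the lambda below
    return lambda() { return ++count; };
    // With per-local tracking, `big` can be freed when make_counter()
    // returns; only `count` has to be kept alive for the lambda.
  }

(Usage: function next = make_counter(); next() then returns 1, 2, ... on successive calls.)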
On 11/23/17 10:14, Stephen R. van den Berg wrote:
Arne Goedeke wrote:
Pretty much the same as 8.1. All three branches (8.0, 8.1, faster_calls). Relative standard deviation removed for readability.
interpreter/automap.pike#efun | 26.3M | 25.5M | 39.0M |
Any explanation why, in general, 8.1 is consistently slightly slower than 8.0? Are we doing more work in 8.1?