I have looked into this exact problem a while ago. One thing I noticed, which might be relevant for your code (which I have not looked at too closely, yet), is that in case of tail recursion frames can be replaced. This might be something that your code needs to handle (for instance when calling a recursive function with map).
When I attempted to improve the performance of function calls (and in particular map/automap/filter/...) I started by refactoring the API for handling frames and doing function calls. I came up with something like frame_prepare(), frame_execute(), frame_return(), frame_pop(), etc.
I kept all the apply variants as fallbacks, which simply call the right combination of these new functions. I then implemented the map optimization similar to the one that you did. At some point I mentioned this work on the list (or maybe also just in person) but I never pushed it anywhere. I also never got around to finish this, but it is something I would still like to find some spare time for. I will push my branch now, maybe it is of interest.
Arne
On 04/06/16 00:12, Martin Karlgren wrote:
Hi,
Having realised the benefits of functional programming, I’ve been quite annoyed by the rumour of how expensive function calls are in Pike. I decided to look into f_map and could see how much seemingly unnecessary work it does when it calls apply_svalue once for each entry in the array – it should be possible to reuse the pike_frame if it’s about to be thrown away after the function call (if it has other refs on the other hand, it can’t be reused – it’s probably used as a scope in some other frame).
I’ve pushed my optimised variant in marty/optimised_map – it seems to work quite well and provides a major speedup. In fact, it’s a bit faster than the corresponding foreach variant. I haven’t verified correctness in various corner cases, and some input on whether it’s correct to do the things init_frame_reuse_context does only once before multiple function calls would be nice too. The *_reuse_context stuff in interpret.c should be applicable wherever the same svalue is applied repeatedly with the same number of arguments (I haven’t looked for it outside of f_map really).
What do you all think? Good idea or did I overlook something?
Without optimisation: map: 1.660797 array index: 1.335115 array append: 1.17917
With optimisation: map: 0.877659 array index: 1.351158 array append: 1.189812
Test program:
int main() { array base = allocate(10000000, 1);
float gmap = gauge { array res = map (base, lambda(int i) { return i + 2; }); };
float garrayindex = gauge { array res = allocate(sizeof(base)); foreach (base; int idx; int i) { res[i] = i + 2; } };
float garrayappend = gauge { array res = ({}); foreach (base, int i) { res += ({ i + 2 }); } };
werror ("map: %O\n", gmap); werror ("array index: %O\n", garrayindex); werror ("array append: %O\n", garrayappend); }
/Marty