Hi,
Having realised the benefits of functional programming, I’ve been quite annoyed by the rumour of how expensive function calls are in Pike. I decided to look into f_map and could see how much seemingly unnecessary work it does when it calls apply_svalue once for each entry in the array – it should be possible to reuse the pike_frame if it’s about to be thrown away after the function call (if it has other refs on the other hand, it can’t be reused – it’s probably used as a scope in some other frame).
I’ve pushed my optimised variant in marty/optimised_map – it seems to work quite well and provides a major speedup. In fact, it’s a bit faster than the corresponding foreach variant. I haven’t verified correctness in various corner cases, and some input on whether it’s correct to do the things init_frame_reuse_context does only once before multiple function calls would be nice too. The *_reuse_context stuff in interpret.c should be applicable wherever the same svalue is applied repeatedly with the same number of arguments (I haven’t looked for it outside of f_map really).
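To make the reuse condition concrete, here is a minimal standalone C model of the idea. It uses simplified stand-in types (not the real pike_frame or svalue structures, and the branch's actual init_frame_reuse_context API may look quite different); the point is only the pattern: set up once, call repeatedly, and fall back to a fresh frame when somebody else still holds a reference.

/* Minimal standalone model of the frame-reuse idea above.  The types and
 * names are simplified stand-ins, not Pike's real structures. */
#include <stdio.h>
#include <stdlib.h>

struct frame {
  int refs;        /* refs > 1 means someone kept it, e.g. as a scope */
  int num_args;
  int *locals;     /* stand-in for argument/local storage */
};

static struct frame *frame_alloc(int num_args) {
  struct frame *f = malloc(sizeof *f);
  f->refs = 1;
  f->num_args = num_args;
  f->locals = calloc(num_args, sizeof *f->locals);
  return f;
}

static void frame_free(struct frame *f) {
  free(f->locals);
  free(f);
}

static struct frame *kept_frame;   /* stand-in for "some other holder" */

/* The "callee": doubles its argument; for one input it pretends to keep
 * its frame around (think: the frame is used as a scope elsewhere). */
static int call_with_frame(struct frame *f, int arg) {
  f->locals[0] = arg;
  if (arg == 3) {
    f->refs++;
    kept_frame = f;
  }
  return f->locals[0] * 2;
}

int main(void) {
  enum { N = 5 };
  int in[N] = { 1, 2, 3, 4, 5 }, out[N];

  struct frame *f = frame_alloc(1);  /* set up once, before the loop */
  for (int i = 0; i < N; i++) {
    out[i] = call_with_frame(f, in[i]);
    if (f->refs != 1) {              /* frame escaped: give up our ref... */
      f->refs--;
      f = frame_alloc(1);            /* ...and continue with a fresh one */
    }
    /* otherwise refs == 1 and the same frame is reused next iteration */
  }
  frame_free(f);
  if (kept_frame) frame_free(kept_frame);

  for (int i = 0; i < N; i++)
    printf("%d ", out[i]);
  printf("\n");
  return 0;
}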
What do you all think? Good idea or did I overlook something?
Without optimisation:

map: 1.660797
array index: 1.335115
array append: 1.17917

With optimisation:

map: 0.877659
array index: 1.351158
array append: 1.189812
Test program:
int main()
{
  array base = allocate(10000000, 1);

  float gmap = gauge {
    array res = map (base, lambda(int i) { return i + 2; });
  };

  float garrayindex = gauge {
    array res = allocate(sizeof(base));
    foreach (base; int idx; int i) {
      res[idx] = i + 2;
    }
  };

  float garrayappend = gauge {
    array res = ({});
    foreach (base, int i) {
      res += ({ i + 2 });
    }
  };

  werror ("map: %O\n", gmap);
  werror ("array index: %O\n", garrayindex);
  werror ("array append: %O\n", garrayappend);
}
/Marty
On Wed, Apr 6, 2016 at 8:12 AM, Martin Karlgren marty@roxen.com wrote:
Having realised the benefits of functional programming, I’ve been quite annoyed by the rumour of how expensive function calls are in Pike. I decided to look into f_map and could see how much seemingly unnecessary work it does when it calls apply_svalue once for each entry in the array – it should be possible to reuse the pike_frame if it’s about to be thrown away after the function call (if it has other refs on the other hand, it can’t be reused – it’s probably used as a scope in some other frame).
Crucially, you're looking at optimizing this code:
float gmap = gauge { array res = map (base, lambda(int i) { return i + 2; }); };
For comparison, Python handles this with a list comprehension, replacing this:
res = list(map(lambda i: i + 2, base))
with this:
res = [i + 2 for i in base]
This compiles to a single operation that loops over the base and constructs a new list.
I think your optimization is a good one to do (although I can't speak to Pike's internals to prove correctness), but what I'd like to see is a way to not need the lambda function at all. Automap syntax is great for simple calls like this, so I added it to your timing test:
float gautomap = gauge { array res = base[*] + 2; };
Without the optimization:

$ bin/pike ../mapspeed.pike
automap: 0.230016378
map: 0.327757117
array index: 0.256829669
array append: 0.339447457

With optimization:

$ bin/pike ../mapspeed.pike
automap: 0.256595633
map: 0.189304826
array index: 0.24930893
array append: 0.314897908
Interestingly, your optimization actually comes out faster than automap. I'm not surprised it's faster than array append, and even the array index one (it's quicker to do the whole job in C), but automap would normally be my go-to answer for stuff like this.
So yeah, I'd support this 100%, unless it can be proven to be incorrect in some way.
ChrisA
I threw your branch on the Pike-experimental Coverity stream. You might want to have a look at CID 1358354:
*** CID 1358354: Null pointer dereferences (FORWARD_NULL)
/home/covscan/pike/pike-git/src/builtin.cmod: 4280 in low_automap()

4274         push_svalue(ITEM(tmpargs[e].u.array)+x);
4275       }else{
4276         push_svalue(tmpargs+e);
4277       }
4278     }
4279

CID 1358354: Null pointer dereferences (FORWARD_NULL)
Comparing "reuse_ctx" to null implies that "reuse_ctx" might be null.

4280     if(reuse_ctx != NULL)
4281       apply_svalue_reuse_context (reuse_ctx);
4282     else
4283       low_automap(d+1,depth,fun,real_args,args);
4284     stack_pop_to_no_free (ITEM(ret) + x);
4285     types |= 1 << TYPEOF(ITEM(ret)[x]);
Well, that was a stupid mistake. Thanks. /Marty
I looked into this exact problem a while ago. One thing I noticed, which might be relevant for your code (which I have not looked at too closely yet), is that in the case of tail recursion, frames can be replaced. This might be something that your code needs to handle (for instance when calling a recursive function with map).
When I attempted to improve the performance of function calls (and in particular map/automap/filter/...) I started by refactoring the API for handling frames and doing function calls. I came up with something like frame_prepare(), frame_execute(), frame_return(), frame_pop(), etc.
I kept all the apply variants as fallbacks, which simply call the right combination of these new functions. I then implemented a map optimization similar to the one that you did. At some point I mentioned this work on the list (or maybe just in person), but I never pushed it anywhere. I also never got around to finishing this, but it is something I would still like to find some spare time for. I will push my branch now; maybe it is of interest.
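As a rough illustration of that phase split (standalone C with illustrative types and signatures, not the actual branch code): the apply-style fallback runs one full prepare/execute/return/pop cycle per call, while a map-style caller drives the phases itself and pays the setup cost only once.

#include <stdio.h>

struct frame { int fun; int arg; int result; };

static void frame_prepare(struct frame *f, int fun) { f->fun = fun; }
static void frame_execute(struct frame *f)          { f->result = f->arg + 2; }
static void frame_return(struct frame *f, int *out) { *out = f->result; }
static void frame_pop(struct frame *f)              { f->fun = -1; }

/* Old-style apply: one call = one full prepare/execute/return/pop cycle. */
static int apply(int fun, int arg) {
  struct frame f;
  int out;
  frame_prepare(&f, fun);
  f.arg = arg;
  frame_execute(&f);
  frame_return(&f, &out);
  frame_pop(&f);
  return out;
}

/* Map-style caller: prepare and pop once, execute once per element. */
static void map_ints(const int *in, int *out, int n, int fun) {
  struct frame f;
  frame_prepare(&f, fun);
  for (int i = 0; i < n; i++) {
    f.arg = in[i];
    frame_execute(&f);
    frame_return(&f, &out[i]);
  }
  frame_pop(&f);
}

int main(void) {
  int in[4] = { 1, 2, 3, 4 }, out[4];
  map_ints(in, out, 4, 0);
  printf("%d %d %d %d (apply: %d)\n", out[0], out[1], out[2], out[3],
         apply(0, 10));
  return 0;
}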
Arne
Ah, very interesting. Now that you mention it, I recall seeing something about this on the list but apparently I failed to recall it before implementing something on my own… ;) Your refactoring looks much more appealing though, so I’ll definitely vote for your branch. What’s left to be sorted out?
Another idea I had, which I don’t know whether it’s feasible or not, was to populate a frame cache in the parent frame when a function is likely to be called repeatedly (e.g. inside foreach) – but let’s sort out the basic stuff first...
(I have very little spare time because of kids, but every once in a while I’ll make a “frustration hack” ;) )
/Marty
If you feel like frustration hacking on that branch, you are very welcome!
My plan would be something like this:

1) Rebase to current 8.1 (will do that now)
2) Fix the current state. I do not remember what the problem really was, but I do recall that it does not pass the testsuite.
3) Cleanup
4) Optimize
5) New features. With a better API it might be feasible to implement continuations and possibly other things. It's worth a discussion.

By the way, when is the next pike conference?
Some details:

3)
- As I mentioned in my previous email, the tail recursion optimization creates a new frame and then replaces the previous one on the stack. This could be improved by resetting the current frame, so that execution resumes at the beginning of the function. I suspect that it works the way it does currently because the previous API makes it hard to do the call _without_ setting up a new frame.
- That new branch introduces frame->type, which did not previously exist. It makes struct pike_frame even bigger than it already is. Not all members of the frame struct are used for all frame types, so it would make sense to place some of them inside of a union.
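A standalone illustration of the union idea (the field names are made up for the example, not the actual pike_frame layout): members that only matter for one frame type share storage, selected by the type tag, so they stop enlarging every frame.

#include <stdio.h>

enum frame_type { FRAME_PIKE, FRAME_CFUN };

struct frame_fat {               /* every member present in every frame */
  int refs;
  enum frame_type type;
  const unsigned char *pc;       /* only meaningful for byte-code frames */
  void *locals;                  /* only meaningful for byte-code frames */
  void (*cfun)(int args);        /* only meaningful for C-function frames */
};

struct frame_union {             /* type-specific members share storage */
  int refs;
  enum frame_type type;
  union {
    struct { const unsigned char *pc; void *locals; } pike;
    struct { void (*cfun)(int args); } cfun;
  } u;
};

int main(void) {
  printf("all members: %zu bytes, union: %zu bytes\n",
         sizeof(struct frame_fat), sizeof(struct frame_union));
  return 0;
}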
4)
- Use the new API in more places like filter and automap.
- I also really like your idea of caching the last frame. I was thinking about something similar, except that I did not think of using the frame for that, but instead caching it globally. Also, in many situations it is actually not enough to cache only one frame; instead we need 2 or more:

foreach (foo; mixed key; mixed val) bar();

During one iteration, this calls both next() in the iterator of foo and bar(). It's probably a good idea to experiment with that.
Arne
I have been able to fix a couple of bugs in the branch by now. There are some left, and I could use some help understanding why they occur. One is triggered by the DHT.test in the bittorrent library. When running that test on the 'faster_calls' branch with debug enabled, the following fatal occurs:
------
/home/el/code/rw/pike/src/gc.c:3864: GC fatal: GC destructed parent prematurely.
**Block: 0x7f0d52594180 Type: object Refs: 5
**Got gc marker at 0x753e010: flags=0x20001e refs=4 weak=2 xrefs=0 saved=4 frame=0x61b1d80 [data=0x7f0d52594180 rf_flags=0x95 prev=(nil) next=0x61b1d48 cycle_id=0x61b1d80 cycle_piece=0xffffffffffffffff link_top/last_cycle_piece=0xffffffffffffffff]
**Parent identifier: 20
**Program id: 66099
**Object variables:
** rtt: mixed name: state off: 56 value: 1
** rtt: mixed name: ping_fails off: 72 value: 0
** rtt: mixed name: last_ping off: 88 value: 1461419197
** rtt: mapping name: pings_inflight off: 104 value: mapping of size 0
** rtt: mixed name: last_response off: 112 value: 1461419362
** rtt: mixed name: check_node_callout off: 128 value: array of size 1
** === In inherit "Node", program 66100:
** rtt: string name: node_id off: 16 value: "O\366\355\207%n\277j4fI\323\231;\332|9Vu\30"
** rtt: string name: address off: 24 value: "127.0.0.1"
** rtt: string name: token off: 32 value: "\370\373S\252""5\336\322\25\312[\205\201qR3h]\235z\311"
** rtt: mixed name: port off: 40 value: 7010
**Describing program 0x20fd070 of object:
**Got gc marker at 0x76c8cb0: flags=0x0000e refs=15106 weak=0 xrefs=0 saved=15106 frame=(nil)
**Program id: 66099, flags: 308f, parent id: 66097
**Location: /home/el/local/pike/8.1.4/lib/modules/Protocols.pmod/Bittorrent.pmod/DHT.pike:601
**Object got a parent.
*******************
Pike was in GC stage 400 when this fatal occurred. Backtrace at time of fatal:

Subprocess died of signal SIGABRT.
------
So apparently the parent object was destructed before its child. Does anyone have any ideas about which parts of the function call code in interpret.c might be incorrect? Is the PARENT_CLONE case wrong?
Thanks
Arne
This fatal actually also happens in stock Pike 8.1, so I guess it's unrelated to the new branch. The fatal happens during cleanup on exit; should that not make sure that things are cleaned up in the correct order? Or should the check be disabled on the final gc run, when all objects are freed?
arne
Hi Arne,
I'm curious what you think it would take to merge the faster_calls branch into 8.1. Do you know of any outstanding issues, or is it mostly a matter of rebasing it onto 8.1 HEAD and running the testsuite?
Thanks, Marty
Hi Marty,
I have recently looked into the branch again. I am not sure whether there were any bugs remaining when we last left it. The main issue I currently see with the code (and that is my fault) is that it allocates a frame for efun calls. This does not happen in Pike currently, and I suppose we want to keep it that way. The current API in the new branch unfortunately requires this in order to keep the information about the frame type.
I recently tried to rebase that branch onto current Pike 8.1 in order to attempt another refactoring to address that particular problem. However, that did not go very well in the end, because I ran into huge merge conflicts and gave up. I am a bit undecided at the moment about how we should go forward with this.
My current feeling is that the new API is broken and should be replaced by something which:
1) optimizes lookup and frame allocation for map/filter/automap
2) does not allocate a frame for efun calls
3) allows easy integration of frame caching
...
n) continuations?
I wrote down some notes the last time I looked at this, but it's all at a very early stage right now.
On the other hand, maybe what we want to achieve for 8.1 is only the first part. Maybe we can come up with some kind of intermediate version which does that?
Arne
Hi,
Great work!
A thought on the API – could frame_init perhaps be removed from the “public” API and have frame_setup_from_* perform that internally instead (and return the new frame)? That way a frame cache could be introduced transparently later on, by having frame_setup_from_* perform cache lookups based on the object/function offset etc. instead of creating a new frame.
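A tiny standalone sketch of that API shape (illustrative names and types, not the real branch): because setup owns the allocation and returns the frame, a cache lookup can later be dropped in behind it without touching any caller.

#include <stdio.h>
#include <stdlib.h>

struct frame { int fun; int refs; };

static struct frame *cached;   /* one cached frame, purely illustrative */

/* Setup owns the allocation: today it allocates, later it could first
 * consult a cache keyed on object/function offset and only fall back to
 * allocating when nothing suitable is cached. */
static struct frame *frame_setup_from_fun(int fun) {
  struct frame *f;
  if (cached && cached->fun == fun) {
    f = cached;
    cached = NULL;
  } else {
    f = malloc(sizeof *f);
    f->fun = fun;
  }
  f->refs = 1;
  return f;
}

static void frame_done(struct frame *f) {
  if (!cached)
    cached = f;                /* keep it for the next identical call */
  else
    free(f);
}

int main(void) {
  for (int i = 0; i < 3; i++) {
    struct frame *f = frame_setup_from_fun(42);
    printf("call %d uses frame %p\n", i, (void *)f);
    frame_done(f);
  }
  free(cached);
  return 0;
}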
Yes, caching multiple frame entries sounds desirable. If the cache is placed/referenced by the parent frame, even a multi-level cache should be possible:
void fie(int i) { }

void bar(int i)
{
  fie(i + 1);
}

void foo()
{
  foreach (enumerate(1000000), int i)
    bar(i);
}
Here, the F_FOREACH handler could set a PIKE_FRAME_DO_CACHE flag in the current frame, which would enable caching of the “bar” call. But the flag could also be propagated to the child frame, so that the “fie” call is cached in the frame handling the execution of “bar”. When execution leaves the scope that initiated the caching, the caches should be cleaned up recursively. I’m not sure whether this would work from a practical standpoint, but it might be worth considering. Thoughts?
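A standalone model of that multi-level idea (simplified types; PIKE_FRAME_DO_CACHE here only echoes the flag name above, nothing is the real implementation): a flagged frame caches its callee frames, the flag is inherited by those callees, and leaving the flagged scope tears the caches down recursively.

#include <stdio.h>
#include <stdlib.h>

#define FRAME_DO_CACHE 1

struct frame {
  int flags;
  int fun;
  struct frame *cached_callee;   /* at most one cached child frame */
};

static void leave_scope(struct frame *f);

/* Set up a frame for calling `fun` from `caller`, reusing a cached one
 * if the caller has one for the same function. */
static struct frame *call(struct frame *caller, int fun) {
  struct frame *f = caller->cached_callee;
  if (f && f->fun == fun)
    return f;                          /* cache hit: reuse as-is */
  f = calloc(1, sizeof *f);
  f->fun = fun;
  if (caller->flags & FRAME_DO_CACHE) {
    f->flags |= FRAME_DO_CACHE;        /* propagate the flag downwards */
    if (caller->cached_callee)
      leave_scope(caller->cached_callee);
    caller->cached_callee = f;         /* remember it for the next call */
  }
  return f;
}

/* Leaving the scope that started the caching cleans up recursively. */
static void leave_scope(struct frame *f) {
  if (f->cached_callee)
    leave_scope(f->cached_callee);
  free(f);
}

int main(void) {
  struct frame *foo = calloc(1, sizeof *foo);
  foo->flags = FRAME_DO_CACHE;         /* e.g. set by the foreach handler */
  for (int i = 0; i < 3; i++) {
    struct frame *bar = call(foo, 1);  /* bar(i) in the loop body */
    struct frame *fie = call(bar, 2);  /* fie(i + 1) called from bar() */
    printf("i=%d: bar frame %p, fie frame %p\n", i, (void *)bar, (void *)fie);
  }
  leave_scope(foo);                    /* also frees the cached bar and fie frames */
  return 0;
}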
/Marty
Hi Martin,
sorry for the late response. Some thoughts:
1) I think the current API is not perfect. But aside from the implicit caching idea, there are some places where caching happens explicitly. In those cases it would still be necessary to have an API which uses a given frame and does the setup without allocation. Those cases also fall into different categories: some where it is known that the same function will be called (f_map, f_filter, etc.) and some where it could be a different function (tail recursion, etc.). In the latter, the current frame can only be reused when calling into pike code, which adds another complication.

2) I think the pike_frame struct should be smaller. My patch currently also adds another pointer, which is not really needed. I did this mostly for clarity and was planning to clean it up later. It might also help to separate the different frame types into several structs and place them into a union. This would also take care of places in the code where uninitialized values are used from the frames, mainly in the backtrace handling code. The different pointers to the stack could also be implemented using an offset from the save_sp.
Maybe a good idea would be to first add the caching using a new API. Also, I think the interpreter struct might be a good place to do the caching. This avoids increasing the size of frames and also makes the number of cached frames a bit more predictable.
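As a standalone illustration of that (simplified types, not the real interpreter struct): a small fixed-size cache in the per-thread interpreter state keeps the number of cached frames predictable, and two slots are enough for the foreach case above that calls both next() and bar() in every iteration.

#include <stdio.h>
#include <stdlib.h>

#define FRAME_CACHE_SIZE 2

struct frame { int fun; };

struct interpreter_state {
  struct frame *frame_cache[FRAME_CACHE_SIZE];
};

static struct frame *get_frame(struct interpreter_state *st, int fun) {
  for (int i = 0; i < FRAME_CACHE_SIZE; i++) {
    struct frame *f = st->frame_cache[i];
    if (f && f->fun == fun) {          /* cache hit: take it out and reuse it */
      st->frame_cache[i] = NULL;
      return f;
    }
  }
  struct frame *f = malloc(sizeof *f); /* cache miss: allocate a new frame */
  f->fun = fun;
  return f;
}

static void put_frame(struct interpreter_state *st, struct frame *f) {
  for (int i = 0; i < FRAME_CACHE_SIZE; i++) {
    if (!st->frame_cache[i]) {         /* free slot: keep the frame around */
      st->frame_cache[i] = f;
      return;
    }
  }
  free(f);                             /* cache full: just drop it */
}

int main(void) {
  struct interpreter_state st = { { NULL, NULL } };
  /* Simulates: foreach (foo; key; val) bar(); -- each iteration calls the
   * iterator's next() (fun 1) and then bar() (fun 2). */
  for (int i = 0; i < 4; i++) {
    struct frame *next_f = get_frame(&st, 1);
    struct frame *bar_f  = get_frame(&st, 2);
    printf("iteration %d: next() frame %p, bar() frame %p\n",
           i, (void *)next_f, (void *)bar_f);
    put_frame(&st, bar_f);
    put_frame(&st, next_f);
  }
  for (int i = 0; i < FRAME_CACHE_SIZE; i++)
    free(st.frame_cache[i]);
  return 0;
}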
I would also be glad to have some feedback from people who are more familiar with the interpreter.
Arne
On 05/02/16 22:44, Martin Karlgren wrote:
Hi,
Great work!
A thought on the API – could frame_init perhaps be removed from the “public” API and have frame_setup_from_* perform that internally instead (and return the new frame)? That way a frame cache could be introduced transparently later on, by having frame_setup_from_* perform cache lookups based on the object/function offset etc. instead of creating a new frame.
Yes, caching multiple frame entries sounds desirable. If the cache is placed/referenced by the parent frame, even a multi-level cache should be possible:
void fie(int i) { } void bar(int i) { fie(i + 1); } void foo() { foreach(enumerate(1000000), int i) bar(i); }
Here, the F_FOREACH handler could set a PIKE_FRAME_DO_CACHE flag in the current frame, which would enable caching of the “bar” call. But the flag could also be propagated to the child frame, so that the “fie” call is cached in the frame handling the execution of “bar”. When execution leaves the scope that initiated the caching, the caches should be cleaned up recursively. I’m not sure whether this would work from a practical standpoint, but it might be worth considering. Thoughts?
/Marty
On 12 Apr 2016, at 18:48 , Arne Goedeke el@laramies.com wrote:
- Use the new API in more places like filter and automap.
- I also really like your idea of caching the last frame. I was
thinking about something similar, except that I did not think of using the frame for that, but instead cache it globally. Also, in many situations it is actually not enough to cache only one frame, instead we need 2 or more:
foreach (foo; mixed key; mixed val) bar();
During one iteration, this calls both next() in the iterator of foo and bar(). Its probably a good idea to experiment with that.