Hi Arne,
Great news! I can verify that Roxen seems to run just fine on it, too.
One thing I noticed is that it doesn’t seem to compile when configured --with-debug: the PIKE_NEEDS_TRACE macro (defined in interpret.c) can’t be resolved in interpret_functions.h. I’m not sure where it should live instead.
I’d definitely vote for a merge to 8.1 – there’s no feature freeze in place yet, is there? After the merge, I’d like to rebase/merge the call_frames branch on top of this, unless someone disagrees. That should enable further profiling/optimization iterations.
Best regards, /Marty
On 8 Mar 2017, at 10:33, Arne Goedeke <el@laramies.com> wrote:
I think I managed to fix the last issue. I was somehow confusing things and removed the locals from the stack before unlinking the stack frame. This of course broke trampolines. I also ended up rebasing the branch to get rid of the reverts I did at some point.
The current state passes the testsuite (the same tests as 8.1 at least). Performance-wise it is roughly on par with 8.1, except for map/automap being significantly faster. There are currently some slowdowns, which are due to my removing some fast paths from the F_CALL_OTHER opcode. I will look into that.
I re-added most of the tracing code; however, some of it is unfinished and DTrace is probably broken. I have also not looked at PROFILING yet, so that is probably not right either.
Sidenote: Profiling unfortunately does not work properly when fork()ing because timers change. It might even crash when running with debug mode because of that. But that is probably just a bug we need to fix.
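For illustration, a minimal C sketch of one way timers can change across fork(). It assumes the profiler takes deltas of the per-process CPU-time clock, which is only an assumption here; the Pike profiling code may use a different timer. POSIX resets that clock to zero in the child, so a start timestamp taken before fork() is larger than anything the child reads right afterwards. The helper names are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* CPU time consumed by the calling process in nanoseconds, or -1 on error. */
static long long cpu_time_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: returns 1 if a forked child reads a smaller CPU time
 * than the timestamp the parent took before fork(), 0 if not, -1 on error. */
static int child_cpu_clock_restarted(void)
{
    long long start;
    pid_t pid;
    int status;

    /* Burn CPU until the parent's clock is well above zero (50 ms). */
    for (;;) {
        start = cpu_time_ns();
        if (start < 0)
            return -1;
        if (start >= 50000000LL)
            break;
    }

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: "now - start" would be negative for a profiler that
         * inherited the start timestamp from the parent. */
        long long now = cpu_time_ns();
        _exit(now >= 0 && now < start ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling child_cpu_clock_restarted() from a small main() is enough to check this on a given platform. A negative delta of this kind is the sort of value a debug-mode sanity check could trip over.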
What's currently left on my list before proposing to merge it into 8.1/8.3:
- Make sure the map/automap optimizations do not break in pathological
cases (e.g. objects being destructed or similar).
- Maybe think about the API again (e.g. callsite_execute and
callsite_return could be merged; same with callsite_init/callsite_set_args).
Other than that, I played around with adding frame caching to apply_array, which looks promising performance-wise. However, it takes some attention to make sure the stack traces are always correct. This would be a good test case for caching frames in general.
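To make the stack-trace concern concrete, here is a rough C sketch of the general idea of reusing one frame across all calls in an array apply. Everything in it (struct demo_frame, demo_apply_array, demo_double) is hypothetical and much simpler than Pike's actual frame and callsite code. The point is only that per-call state a backtrace would read has to be reset on every iteration, otherwise the next call sees stale data from the previous one.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interpreter frame; not Pike's real struct. */
struct demo_frame {
    const char *fun_name; /* set once per apply, shown in a backtrace */
    int line;             /* per-call state: current line in the callee */
    long arg;             /* per-call state: the single argument */
};

typedef long (*demo_fn)(struct demo_frame *);

/* Set to 1 if a callee ever observes a line number left over from the
 * previous call, i.e. a backtrace would have been wrong. */
static int stale_seen = 0;

/* Example callee: records stale state, "executes" up to line 7, doubles arg. */
static long demo_double(struct demo_frame *f)
{
    if (f->line != 0)
        stale_seen = 1;
    f->line = 7;
    return f->arg * 2;
}

/* Calls fn once per element, reusing a single frame for all calls. */
static void demo_apply_array(demo_fn fn, const char *name,
                             const long *in, long *out, size_t n)
{
    struct demo_frame frame;
    size_t i;

    frame.fun_name = name; /* the cached part: initialised only once */
    for (i = 0; i < n; i++) {
        frame.line = 0;    /* reset per-call state before every call */
        frame.arg = in[i];
        out[i] = fn(&frame);
    }
}
```

Dropping the frame.line = 0 reset makes stale_seen flip to 1 on the second element, which is the kind of mistake a testsuite case for cached frames would want to catch.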
Anyway, feedback welcome, as usual,
Arne