I was about to change backend.cmod to use power-of-two hash tables as well. It's using hash_svalue anyhow, and it seems a little unnecessary to use mod-prime hash tables there. Anyhow, while doing so and putting some benchmarks together, I noticed that the code in backend.cmod:990 ff. assumes that is_eq has side effects. Because of that it does a lot of extra work. It's especially bad if one has a lot of call outs to the same function, since the buckets get very large.
If a call to remove_call_out/call_out is the only thing that could change the hash tables, we might as well do some kind of recursion detection and throw an error instead (in case someone does call_outs from `==).
Any thoughts?
arne
Recursion (or rather reentrancy) detection won't do. It's not only about odd code that fiddles with call outs from operators: Since a thread switch can occur anywhere in pike code, it could cause random failures in unrelated code in other threads.
There's not much that can be done against bad hashing if the same function is added a gazillion times, I guess. But otoh the large buckets don't matter except in the specific code you mentioned, right?
Performance aside, it's not good that it can consume an unbounded amount of svalue stack either. At the very least there should be a check_stack() in there somewhere.
I don't see any easy way of getting rid of the loop that copies the whole bucket. But something that ought to mitigate most of the problem would be to only push functions that aren't is_identical to each other, because is_eq should never behave differently for them (we can safely ignore silliness like random() in `==). I reckon simply comparing with the previous entry on the stack should be good enough.
On Thu, 7 Apr 2011, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
> Recursion (or rather reentrancy) detection won't do. It's not only about odd code that fiddles with call outs from operators: Since a thread switch can occur anywhere in pike code, it could cause random failures in unrelated code in other threads.
Yes, you are right. I did not think of threads; I was only assuming that weird `== and __hash callbacks could be a problem.
> There's not much that can be done against bad hashing if the same function is added a gazillion times, I guess. But otoh the large buckets don't matter except in the specific code you mentioned, right?
Yes, I think it's the only case. I guess that having many call_outs to only a few different functions is not so uncommon. Doing a lot of find_call_out/remove_call_out calls might be rare.
> Performance aside, it's not good that it can consume an unbounded amount of svalue stack either. At the very least there should be a check_stack() in there somewhere.
Hm, since the bucket size is not known beforehand, it might be necessary to keep track of that.
> I don't see any easy way of getting rid of the loop that copies the whole bucket. But something that ought to mitigate most of the problem would be to only push functions that aren't is_identical to each other, because is_eq should never behave differently for them (we can safely ignore silliness like random() in `==). I reckon simply comparing with the previous entry on the stack should be good enough.
Yes, that sounds good. Along those lines, I would propose the following patch. It fixes the cases I have tested (1000x speedup).
diff --git a/src/backend.cmod b/src/backend.cmod
index e016b7b..7c2c308 100644
--- a/src/backend.cmod
+++ b/src/backend.cmod
@@ -966,6 +966,7 @@ PIKECLASS Backend
       return c->pos;
     }
   }
+  return -1;
 }
 /* Note: is_eq() may call Pike code (which we want),
  * but Pike code may change the hash tables...
@@ -981,6 +982,14 @@ PIKECLASS Backend
     if(CALL(c->pos) != c)
       Pike_fatal("Call_out->pos not correct!\n");
 #endif
+    /* We dont need to check any others. These two
+     * will be equal.
+     */
+    if (is_identical(fun, ITEM(c->args))) {
+      pop_n_elems(Pike_sp - save_sp);
+      return c->pos;
+    }
+
     push_int(c->pos);
     push_svalue(ITEM(c->args));
   }
As always when talking performance: I appreciate it if new benchmarks are added to the benchmark target. Preferably long before doing any optimizations.
Yes, you are right. I added two benchmark tests to cover call_out handling both with and without ids. The current benchmark produces a case with a lot of collisions, so the results are definitely biased towards the optimization proposed before.
On Thu, 7 Apr 2011, Peter Bortas @ Pike developers forum wrote:
> As always when talking performance: I appreciate it if new benchmarks are added to the benchmark target. Preferably long before doing any optimizations.
Well, you can bias the other way by adding a call out benchmark using lambdas, right?
Yes, that's right. It's the "standard" case, which would not benefit much from the proposed optimization since buckets should be small. There is another extreme case: objects with a `() lfun that are all equal and have the same __hash(). That would be the worst case, and the optimization would not help at all.
Regarding benchmarks: The last column in the output seems meaningless in many benchmarks. Since it's a float before printing, maybe some precision should be added.
Are those benchmarks run automatically somewhere, maybe xenoclient?
On Thu, 7 Apr 2011, Martin Nilsson (Opera Mini - AFK!) @ Pike (-) developers forum wrote:
> Well, you can bias the other way by adding a call out benchmark using lambdas, right?
> Regarding benchmarks: The last column in the output seems meaningless in many benchmarks. Since it's a float before printing, maybe some precision should be added.
Yes. The number of iterations should perhaps be increased in some of the tests too.
> Are those benchmarks run automatically somewhere, maybe xenoclient?
They are run automatically on all Xenofarm clients. See for example http://pikefarm.lysator.liu.se/parsed/7.9/31_4/benchmark.txt
If someone wants to write a script to parse the benchmark output and produce graphs, that would be welcome.
I'm currently rewriting the server end to take advantage of some git features, but the work in progress can be found here: http://pikefarm.lysator.liu.se/devel.html
Fair warning: The database is likely to be wiped at some point in the coming weeks.
> Yes, I think it's the only case. I guess that having many call_outs to only a few different functions is not so uncommon. Doing a lot of find_call_out/remove_call_out calls might be rare.
The reasonable thing would be to keep the buckets in sorted order, so that the search could stop at the first hit. But read on..
>> Performance aside, it's not good that it can consume an unbounded amount of svalue stack either. At the very least there should be a check_stack() in there somewhere.
> Hm, since the bucket size is not known beforehand, it might be necessary to keep track of that.
Yes, so the check_stack call must be inside the loop. One can always make it run only every 100 iterations or so, but I don't know if it's worth the bother. It's pretty quick.
> @@ -966,6 +966,7 @@ PIKECLASS Backend
>        return c->pos;
>      }
>    }
> +  return -1;
>  }
>  /* Note: is_eq() may call Pike code (which we want),
>   * but Pike code may change the hash tables...
Not safe since an array can actually work as a callable:
call_out (({lambda() {werror ("oy!\n");}}), 0);
(1) Result: ({ /* 1 element */
        ({ /* 1 element */
            HilfeInput()->__lambda_65794_0_line_1
        })
    })
> start backend
oy!
> @@ -981,6 +982,14 @@ PIKECLASS Backend
>      if(CALL(c->pos) != c)
>        Pike_fatal("Call_out->pos not correct!\n");
>  #endif
> +    /* We dont need to check any others. These two
> +     * will be equal.
> +     */
> +    if (is_identical(fun, ITEM(c->args))) {
> +      pop_n_elems(Pike_sp - save_sp);
> +      return c->pos;
> +    }
> +
>      push_int(c->pos);
>      push_svalue(ITEM(c->args));
>    }
Interesting, but it wasn't what I had in mind. That one would consume just as much stack if is_eq actually is required to match the function, or if the function really doesn't match (although an unlucky hash collision would be required for that). What I meant was something like this (untested):
@@ -981,6 +981,9 @@ PIKECLASS Backend
     if(CALL(c->pos) != c)
       Pike_fatal("Call_out->pos not correct!\n");
 #endif
+    if (Pike_sp > save_sp &&
+        is_identical (Pike_sp - 1, ITEM (c->args)))
+      pop_n_elems (2);
     push_int(c->pos);
     push_svalue(ITEM(c->args));
   }
Note that this replaces the previous entry rather than skipping the current one. That's to keep compat with the current behavior where it runs is_eq from the top of the stack, i.e. from the end of the bucket. That's also something your optimization misses.
But now when I look closer at the buckets, I notice that new call outs are added to the beginning, and this function searches from the end. So if the same function is added multiple times with call out times in reverse order then (find|remove)_call_out will find the last function in the queue rather than the first one. This test program verifies that:
int start_t;

void co_func (int id)
{
  werror ("id %d, t %d\n", id, time() - start_t);
}

int main()
{
  werror ("Adding forwards\n");
  for (int i = 0; i <= 2; i++)
    call_out (co_func, i, i);
  start_t = time();
  remove_call_out (co_func);
  while (Pike.DefaultBackend.get_stats()->num_call_outs)
    Pike.DefaultBackend (0);

  werror ("Adding backwards\n");
  for (int i = 2; i >= 0; i--)
    call_out (co_func, i, i);
  start_t = time();
  remove_call_out (co_func);
  while (Pike.DefaultBackend.get_stats()->num_call_outs)
    Pike.DefaultBackend (0);
}
I'd say that's a bug, or at least it doesn't agree with the documentation. :(
It also looks like a rehash shuffles the buckets a bit.
On Thu, 7 Apr 2011, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
>> @@ -966,6 +966,7 @@ PIKECLASS Backend
>>        return c->pos;
>>      }
>>    }
>> +  return -1;
>>  }
>>  /* Note: is_eq() may call Pike code (which we want),
>>   * but Pike code may change the hash tables...
> Not safe since an array can actually work as a callable:
>
> call_out (({lambda() {werror ("oy!\n");}}), 0);
> (1) Result: ({ /* 1 element */
>         ({ /* 1 element */
>             HilfeInput()->__lambda_65794_0_line_1
>         })
>     })
> > start backend
> oy!
It should still be found in the first loop then. In both cases the pointer is used as the hash value. It's only since my change to hash_svalue that it won't be anymore, because the extra mixing is only applied to the function pointer. I pushed my changes for power-of-two hash tables, using the same hashing for ids and functions, to the main git again in the branch arne/pow2.
> [...]
> But now when I look closer at the buckets, I notice that new call outs are added to the beginning, and this function searches from the end. So if the same function is added multiple times with call out times in reverse order then (find|remove)_call_out will find the last function in the queue rather than the first one. This test program verifies that:
> [...]
> I'd say that's a bug, or at least it doesn't agree with the documentation. :(
Yes, I noticed that, and hence assumed that the order does not matter. If it does, my optimization is bad. On the other hand, if the buckets are properly sorted, we could do both is_identical checks and either return, or push when the current one is not identical to the top of the stack.
> It also looks like a rehash shuffles the buckets a bit.
This is something that would be fixed by power-of-two hash tables, since the code only ever increases the hash table size. Then buckets are only split, never shuffled.
> It should still be found in the first loop then. In both cases the pointer is used as the hash value. It's only since my change to hash_svalue that it won't be anymore, because the extra mixing is only applied to the function pointer.
It's not only that, it's also that in the first loop it compares with c->args, while in the second it compares with c->args->item[0].
> Yes, I noticed that, and hence assumed that the order does not matter. If it does, my optimization is bad. On the other hand, if the buckets are properly sorted, we could do both is_identical checks and either return, or push when the current one is not identical to the top of the stack.
Since backend_find_call_out has this bad O(n) complexity, where n is the number of times the function has been queued, I think it could just as well look through all matches and pick the one with the lowest tv value. But it can still do both is_identical optimizations.
The alternative, to keep the buckets sorted, is a rather more meaty change. It's not just a matter of sorting them either, because that'd make new_call_out O(n) instead, which would be worse. One would have to change to a different data structure, like one binary heap per bucket.
>> It also looks like a rehash shuffles the buckets a bit.
> This is something that would be fixed by power-of-two hash tables, since the code only ever increases the hash table size. Then buckets are only split, never shuffled.
I don't think it matters in this case, since it doesn't visit the items in bucket order. It uses the binary heap afaics.
pike-devel@lists.lysator.liu.se