In the old code, it was not possible to run the callbacks without incurring deadlocks. Since the rewrite, it is deadlock-free.
I tested with and without the call_out() construct.
In my test cases, I see: (a) comparable CPU usage in both cases, and (b) a 6% wall-clock speed improvement when avoiding call_out().
I propose ripping out the call_out() method, so that the callbacks are run directly in all cases except the timeout case.
Not a good idea:
* Unlimited stack use.
* Potential deadlocks/data inconsistency.
* No serialization of calls.
* No idea in which context/thread your callback gets called.
/grubba
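
To make the stack-use and serialization points concrete, here is a minimal sketch in Python rather than Pike. All names (`Dispatcher`, `defer`, `chain_direct`, `chain_deferred`) are hypothetical, and the queue only mimics the deferral aspect of call_out(), not how it is actually implemented. It runs a chain of callbacks where each one triggers the next, and records how deep the callbacks nest.

```python
from collections import deque


class Dispatcher:
    """Hypothetical stand-in for a call_out()-style queue (not Pike's API)."""

    def __init__(self):
        self._pending = deque()

    def defer(self, fn, *args):
        # Queue the callback instead of running it in the caller's frame.
        self._pending.append((fn, args))

    def run(self):
        # Callbacks run one at a time, in order, from this single loop.
        while self._pending:
            fn, args = self._pending.popleft()
            fn(*args)


def chain_direct(n):
    """Each callback invokes the next one directly; returns the deepest nesting seen."""
    depth = 0
    deepest = 0

    def on_done(remaining):
        nonlocal depth, deepest
        depth += 1
        deepest = max(deepest, depth)
        if remaining > 0:
            on_done(remaining - 1)  # runs inside the current callback's frame
        depth -= 1

    on_done(n)
    return deepest


def chain_deferred(n):
    """Each callback queues the next one; returns the deepest nesting seen."""
    dispatcher = Dispatcher()
    depth = 0
    deepest = 0

    def on_done(remaining):
        nonlocal depth, deepest
        depth += 1
        deepest = max(deepest, depth)
        if remaining > 0:
            dispatcher.defer(on_done, remaining - 1)  # runs after this one returns
        depth -= 1

    dispatcher.defer(on_done, n)
    dispatcher.run()
    return deepest


if __name__ == "__main__":
    print("direct:  ", chain_direct(100))
    print("deferred:", chain_deferred(100))
```

With direct invocation the nesting grows with the length of the chain, so a long enough chain exhausts the stack. With the queue, every callback returns before the next one starts, which also means the calls are serialized and always run from the dispatcher's loop.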