In the old code, it was not possible to run the callbacks without incurring deadlocks. Since the rewrite, the code is deadlock-free.
I tested with and without the call_out() construct.
In my test cases, I see: a. comparable CPU usage in both cases; b. a 6% wall-clock speed improvement when avoiding call_out().
I propose ripping out the call_out() method, so that callouts are run directly in all cases except for the timeout case.
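
To make the proposal concrete, here is a rough sketch of the intended split, written in Python rather than in the project's own language. Every name in it (Backend, Connection, deliver, arm_timeout, run_due) is hypothetical and only illustrates the idea; the call_out() here is a toy stand-in, not the real scheduler. Ordinary callbacks are invoked directly, and only the timeout still goes through the scheduler.

```python
import heapq
import itertools
import time


class Backend:
    """Toy stand-in for an event loop with a call_out()-like scheduler (hypothetical)."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for equal deadlines

    def call_out(self, callback, delay, *args):
        # Defer the callback until `delay` seconds from now.
        deadline = time.monotonic() + delay
        heapq.heappush(self._queue, (deadline, next(self._counter), callback, args))

    def run_due(self, now=None):
        # Run every deferred callback whose deadline has passed; return how many ran.
        now = time.monotonic() if now is None else now
        ran = 0
        while self._queue and self._queue[0][0] <= now:
            _, _, callback, args = heapq.heappop(self._queue)
            callback(*args)
            ran += 1
        return ran


class Connection:
    """Hypothetical caller showing the proposed dispatch policy."""

    def __init__(self, backend):
        self.backend = backend

    def deliver(self, callback, *args):
        # Proposed behaviour: run the callback directly, with no scheduler round trip.
        callback(*args)

    def arm_timeout(self, callback, seconds):
        # The one remaining exception: timeouts are inherently deferred.
        self.backend.call_out(callback, seconds)
```

In this sketch, deliver() returns only after the callback has finished, which is safe only because the rewritten code no longer deadlocks when callbacks run inline.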
Any objections?