Very interesting. However, I don't see what th_yield() has to do with anything. Unlocking a mutex that someone else is waiting for should automatically yield in a correct thread implementation.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2002-09-14 03:47: Subject: Frequent context switches
I've made a couple of observations:
o clock() on Linux is about 30% slower than gethrtime() on Solaris (reasonable, since clock() measures thread-local CPU time while gethrtime() measures real time). clock() on Solaris is 50% slower than gethrtime(). My conclusion: use gethrtime() if it exists, with a fallback to clock(). Since their speeds are of the same order of magnitude, one should be equally cautious about calling either of them.
o Many of the calls to C functions are not made through mega_apply; some of them become special opcodes and others are made through the "call builtin" opcodes, which don't have any check_threads_etc calls. So it might not be so reasonable after all to assume that there is always a low ratio of slow C function calls in mega_apply.
o Even just a fast_call_threads_etc(1) gives a speed improvement compared to call_threads_etc(). That holds regardless of whether there's a clock/gethrtime check in check_threads.
o If there's no working yield then it's easy to get starvation since a context switch rarely happens by itself in the unlocked window in check_threads. That means that it could be disastrous to have the once-every-256 divisor check or a clock/gethrtime check which short-circuits the unlocked window. I've checked in a patch that disables them completely if th_yield() doesn't expand to a function.
(I discovered this by accident; there was a bug in pike_threadlib.h that blocked the fallback to thr_yield if the POSIX thread lib is chosen and there's no pthread_yield. This happened on at least Solaris 7.)
o Using clock or gethrtime to guard the yield call gives a comparatively small speed improvement. E.g. on sol7-x86 I got a test loop 42% faster by limiting the number of context switches from 41000 to 20 per second. With the simple divisor check applied first, the gap shrinks to 230 versus 20 context switches per second, which translates to only about a 10% speed increase.
I expected the clock/gethrtime calls to be a lot cheaper than a context switch, but that's apparently not the case. Perhaps some effect in my small test case causes the context switches to be uncharacteristically fast; I don't know. Anyway, the clock/gethrtime check gives some speed improvement, so it should be used unless the divisor is tuned better. But I think that'd be too risky, since there's no telling how much the call rate to check_threads can vary.
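
For illustration, the two-stage throttle discussed above (the cheap divisor check first, then a clock/gethrtime check) could be sketched roughly as follows. This is not the actual check_threads code: HAVE_GETHRTIME, CHECK_DIVISOR, MIN_INTERVAL_NS, now_ns and should_open_window are all hypothetical names, and the 50 ms interval is only chosen to match the roughly 20 switches per second mentioned above.

```c
#include <time.h>
#ifdef HAVE_GETHRTIME            /* hypothetical configure-style macro */
#include <sys/time.h>
#endif

/* Hypothetical tunables, not taken from the Pike source. */
#define CHECK_DIVISOR   256          /* stage 1: look closer on every 256th call */
#define MIN_INTERVAL_NS 50000000LL   /* stage 2: at most about 20 windows/second */

/* Current time in nanoseconds: gethrtime() if available, else clock().
 * Note that the two measure different things (real time vs. CPU time). */
static long long now_ns(void)
{
#ifdef HAVE_GETHRTIME
  return (long long) gethrtime();
#else
  return (long long) clock() * (1000000000LL / CLOCKS_PER_SEC);
#endif
}

static unsigned int call_count;
static long long last_window_ns;

/* Returns 1 if the caller should open the unlocked window and yield now. */
static int should_open_window(void)
{
  long long t;

  /* Stage 1: plain counter test, so most calls never touch the timer. */
  if (++call_count % CHECK_DIVISOR)
    return 0;

  /* Stage 2: timer test, reached only on every CHECK_DIVISOR-th call. */
  t = now_ns();
  if (t - last_window_ns < MIN_INTERVAL_NS)
    return 0;

  last_window_ns = t;
  return 1;
}
```

A caller would unlock the interpreter lock, yield and relock only when should_open_window() returns 1. As noted above, both shortcuts are only safe when a working yield exists; without one they would have to be compiled out so that every call opens the window.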
/ Martin Stjernholm, Roxen IS