I've been working on a multi-cpu design for Pike during the last couple of months, and now I think it's time to get some criticism on it.
It's not finished, and unfortunately it's a bit long already. I still hope the people around here can find the time to read it through and mull it over a bit, because this hack is going to be rather large. I've tried to explain things briefly so that no knowledge of the Pike core internals should be required - most of it is hopefully understandable to anyone who has written some C module code.
The goal is not only to make Pike run reasonably on 4-8 cores. It's rather to completely avoid inherent hotspots in the interpreter, so that the level of parallelism only depends on the Pike program. I.e. this should scale to 100+ CPUs.
Very briefly, the approach is to divide all data into thread-local things that need no locking, and an arbitrary number of "lock spaces". A "lock space" is a monitor lock that controls access to any amount of data - it is up to the programmer to divide the shared data into suitable pieces that are locked independently. Some structures (e.g. arrays and possibly mappings) are lock-free. Refcounting is selectively - or perhaps even completely - disabled to avoid hotspots.
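To give a rough idea of what a lock space boils down to conceptually, here's a toy C sketch - one monitor lock shared by an arbitrary group of data, while thread-local data needs no lock at all. This is only an illustration using pthreads; the struct and function names are made up and are not the actual design or the Pike API (see multi-cpu.txt for that):

  /* Toy illustration only: a "lock space" as one monitor lock
   * guarding an arbitrary group of shared structures. */
  #include <pthread.h>
  #include <stdio.h>

  struct lock_space {
      pthread_mutex_t monitor;      /* one lock for the whole group */
  };

  struct shared_counter {
      struct lock_space *space;     /* the lock space this data belongs to */
      long value;
  };

  static void counter_inc(struct shared_counter *c)
  {
      /* Everything in the same lock space is protected by the same
       * monitor, so the programmer picks the locking granularity by
       * picking how data is grouped into lock spaces. */
      pthread_mutex_lock(&c->space->monitor);
      c->value++;
      pthread_mutex_unlock(&c->space->monitor);
  }

  int main(void)
  {
      struct lock_space space;
      pthread_mutex_init(&space.monitor, NULL);

      struct shared_counter a = { &space, 0 };
      struct shared_counter b = { &space, 0 };  /* same space, same lock */

      counter_inc(&a);
      counter_inc(&b);
      printf("%ld %ld\n", a.value, b.value);

      pthread_mutex_destroy(&space.monitor);
      return 0;
  }

The point is just that the number of lock spaces, and hence the amount of lock contention, is entirely in the hands of the Pike programmer rather than being dictated by the interpreter.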
I'll attach the whole thing as a comment to this message. I've also checked it in as multi-cpu.txt in a new branch "multi-cpu" in srb's pikex repository (lyskom people see 16711234).
Btw, merry Christmas! ;)