As for threads in the same process, it should in principle be possible to do it with hardly any visible changes in Pike:
1. Let each memory object have a flag to indicate whether it's thread local or global.
2. Thread local things are linked together in thread local linked lists.
3. Global objects are protected by a read/write lock.
4. Whenever a thread has to change a global thing it acquires the write lock, which would essentially be like the current interpreter lock.
5. When a thread has the write lock it holds it for some time, just like the interpreter lock, so that global changes can be done without lots of locking.
6. When a reference is added from a global thing to a thread local thing, the latter becomes global. This is transitive: at this point it'd be necessary to follow all references in the thread local thing and mark all of it as global too.
7. In some cases it'd be necessary to convert global things to thread local (e.g. in a thread queue implementation that's used to dispatch work from a global queue to handler threads). That can be done implicitly if there's only one ref to the thing, but one might want an explicit function to do it. The problem with such a function would be that it has to go through all global things to ensure that there's no global ref anywhere.
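To make item 6 a bit more concrete, here is a minimal C sketch of the transitive promotion. Everything in it (struct thing, make_global(), add_ref_from(), the fixed MAX_REFS fan-out) is made up for illustration and is not Pike's actual object layout or API. It also leaves out the linked lists from item 2 and the write lock from item 4, which a real implementation would presumably have to hold while promoting anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4  /* hypothetical fixed fan-out, only to keep the sketch small */

/* Hypothetical memory object; NOT Pike's real object header. */
struct thing {
    bool is_global;               /* item 1: thread local vs. global flag */
    int refs;                     /* reference count */
    struct thing *out[MAX_REFS];  /* things this thing refers to */
    size_t n_out;
};

/* Item 6: mark a thread local thing, and everything reachable from it,
 * as global. Stopping at things that are already global also makes the
 * walk terminate on reference cycles. */
static void make_global(struct thing *t)
{
    if (t == NULL || t->is_global)
        return;
    t->is_global = true;
    for (size_t i = 0; i < t->n_out; i++)
        make_global(t->out[i]);
}

/* Hypothetical stand-in for add_ref(): record a reference from `from`
 * to `to`, and promote `to` if the referring thing is global. */
static void add_ref_from(struct thing *from, struct thing *to)
{
    assert(from->n_out < MAX_REFS);
    from->out[from->n_out++] = to;
    to->refs++;
    if (from->is_global)
        make_global(to);
}
```

Note that in this sketch the ref-adding function has to know which thing the reference comes from, which a plain add_ref(to) does not tell it.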
The main issues with this are three afaics:
o It's internally quite a big change since there will be many more linked lists, and the operation required in item 6 makes it necessary to fix almost every place add_ref() is used.
o The current string implementation makes it impossible to handle thread local strings, and requiring the write lock whenever a new string is created would probably defeat a lot of the parallelism. Thus it's necessary to change the thread implementation to fall back to strncmp when a global and a local string are compared. I guess that this was what Jonas meant with the problematic O(1) property in strings.
o Adding and subtracting refs to global things must be done without taking the write lock. On most architectures it ought to be possible to use atomic increment and decrement operations for that.
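The second and third issues can be sketched in C roughly like this, using C11 <stdatomic.h> for the ref counts. The struct str layout and the function names are hypothetical and not Pike's real string type. The sketch assumes that global strings are interned in one shared table, so two global strings with different pointers differ in content, and it uses memcmp() with explicit lengths rather than strncmp() since the lengths are stored anyway.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string object; NOT Pike's real string struct. */
struct str {
    bool is_global;     /* assumed: true if interned in the shared table */
    atomic_int refs;    /* ref count, changed without the write lock */
    size_t len;
    const char *data;
};

/* Take a reference with an atomic increment. */
static void str_add_ref(struct str *s)
{
    atomic_fetch_add(&s->refs, 1);
}

/* Drop a reference; returns true if this was the last one, in which
 * case the caller would be responsible for freeing the string. */
static bool str_sub_ref(struct str *s)
{
    int old = atomic_fetch_sub(&s->refs, 1);
    assert(old > 0);
    return old == 1;
}

/* Equality test: keep the O(1) pointer comparison when both strings are
 * global, otherwise fall back to comparing the contents. */
static bool str_equal(const struct str *a, const struct str *b)
{
    if (a == b)
        return true;
    if (a->is_global && b->is_global)
        return false;   /* assumed interned: different pointers, different contents */
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

So the O(1) comparison only survives for the global/global case here; anything involving a thread local string costs a content comparison.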
None of these is impossible to overcome, but it would be quite a big change, and it would be an incompatible API change for C modules.
At least if I were to make a new language, I'd definitely implement a scheme like the above, since it automatically makes the threads as separate as possible, and I think the transitions of data between the global and local spaces would be fairly few in a reasonably well designed application. It'd be nice to have some debug tools, e.g. to be able to declare an object as thread local, which would cause an error to be thrown if it becomes referenced from the global data set.
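Such a debug declaration could be little more than one extra flag that is checked wherever a thing would be made global. A small C sketch, where struct thing, must_stay_local and promote_to_global() are all hypothetical names, and a false return value stands in for the error Pike would throw:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical object header for the sketch; NOT Pike's real one. */
struct thing {
    const char *name;
    bool is_global;
    bool must_stay_local;   /* debug declaration: never allow promotion */
};

/* Called wherever a thing would be promoted to global. Returns false
 * (where Pike would throw an error) if the thing was declared thread
 * local, and leaves it local in that case. */
static bool promote_to_global(struct thing *t)
{
    assert(t != NULL);
    if (t->is_global)
        return true;
    if (t->must_stay_local) {
        fprintf(stderr,
                "%s: declared thread local but referenced from global data\n",
                t->name);
        return false;
    }
    t->is_global = true;
    return true;
}
```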
/ Martin Stjernholm, Roxen IS
Previous text:
2004-02-02 06:35: Subject: Re: Default backend and thread backends?
The problem with "the second option" is that it represents a radically different threading model than what Pike is currently using. That means that normal threaded programs using thread_create() would still not be multi-cpu enabled.
Essentially, we would have to design a new API for creating 'threads' and inter-thread communication. If we write this API right, it could be used to communicate between:
o threads running in the same interpreter
o threads running in separate interpreters
o threads running in separate processes
o threads running on separate machines
Personally, I don't need the headache of designing this API :)
/ Fredrik (Naranek) Hubinette (Real Build Master)