Hi,
Is there any way to _change_ the default backend? Something like Pike.DefaultBackend->replace(MyBackend) - so all subsequent calls to call_out() and all file descriptors allocated after this will use the new backend? And more... It would be nice to have dedicated backends in some threads, so (say) all calls to call_out() and fd allocations will use a thread-specific backend (if one is set - and the default otherwise). Something like Thread.Thread()->set_backend()...
All this functionality can be simulated(?) - e.g. by intercepting Stdio.File|Stdio.FILE object creation (to use Backend()->add_file()), call_out()s etc. - but it wouldn't be as clean as "native" support (there is always a chance of overlooking something). As to "Why?" - well, I simply don't want to (over)load a single backend with callbacks from everywhere... Especially when I have an SMP machine - it won't be used efficiently with a single backend. Any comments? Regards, /Al
Well, for call_outs it can be done:
  add_constant("call_out", my_backend->call_out);
  add_constant("_do_call_outs", my_backend->_do_call_outs);
  add_constant("find_call_out", my_backend->find_call_out);
  add_constant("remove_call_out", my_backend->remove_call_out);
  add_constant("call_out_info", my_backend->call_out_info);
(It will only affect code compiled after you run these lines of course.)
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2004-01-23 05:54: Subject: Default backend and thread backends?
Hi,
Is there any way to _change_ default backend? Something like Pike.DefaultBackend->replace(MyBackend) - so all subsequent calls to call_out() and all file descriptors allocated after this will use new backend?
And more... It would be nice to have dedicated backends in some threads, so (say) all calls to call_out() and fd's allocation will use thread-specific backend (if it is set - and default otherwise). Something like Thread.Thread()->set_backend()...
All this functionality can be simulated(?) - like intercepting Stdio.File|Stdio.FILE object creation (to use Backend()->add_file()), call_out()s etc., but it wouldn't be as clean as "native" support (there is always chance to overlook something).
As to "Why?" - well, I simply don't want to (over)load single backend with callbacks from everywhere... Especially when I've SMP machine - it won't be used efficiently with single backend.
Any comments?
Regards, /Al
/ Brevbäraren
An SMP machine won't be used efficiently anyway because of the interpreter lock.
To me it seems a bit odd to have an implicit mapping between threads and backends; I'd probably use explicit backend objects or maybe round robin if I'm worried about having too many fd's in the same backend(*). In any case, it's simple to make your own dispatcher. Something like this:
  class ThreadToBackendDispatcher
  {
    mapping(Thread.Thread:Pike.Backend) map = ([]);

    Stdio.File make_stdio_file_obj()
    {
      Stdio.File f = Stdio.File();
      f->set_backend (map[this_thread()]);
      return f;
    }

    void call_out (function f, int delay, mixed... args)
    {
      map[this_thread()]->call_out (f, delay, @args);
    }

    // Etc...
  }
*) Grubba has lately been working a bit on using /dev/poll or similar (/dev/epoll on Linux). Where that method works, many fd's in the same backend shouldn't be a factor and so there's no reason at all to split the fd's into several backends. Right now it should work with 7.5 on Solaris if I'm not mistaken. I don't know how far he has gotten with the Linux support.
/ Martin Stjernholm, Roxen IS
Previous text:
2004-01-23 05:54: Subject: Default backend and thread backends?
Hi,
Is there any way to _change_ default backend? Something like Pike.DefaultBackend->replace(MyBackend) - so all subsequent calls to call_out() and all file descriptors allocated after this will use new backend?
And more... It would be nice to have dedicated backends in some threads, so (say) all calls to call_out() and fd's allocation will use thread-specific backend (if it is set - and default otherwise). Something like Thread.Thread()->set_backend()...
All this functionality can be simulated(?) - like intercepting Stdio.File|Stdio.FILE object creation (to use Backend()->add_file()), call_out()s etc., but it wouldn't be as clean as "native" support (there is always chance to overlook something).
As to "Why?" - well, I simply don't want to (over)load single backend with callbacks from everywhere... Especially when I've SMP machine - it won't be used efficiently with single backend.
Any comments?
Regards, /Al
/ Brevbäraren
On Sun, Feb 01, 2004 at 12:05:01AM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
An SMP machine won't be used efficiently anyway because of the interpreter lock.
Does it mean that multithreading in Pike is not effective and should be avoided if possible?
To me it seems a bit odd to have an implicit mapping between threads and backends;
What I want is to make sure that nothing in the default backend will block application execution, especially call_outs. Sure, I can do everything explicitly, but I've no control over external modules, which might be added later.
Also, I want my application to be scalable in a way that I can just instantiate a copy of some object (which runs the entire app) in a separate thread, and it won't interfere with anything else (I mean, no lockouts etc.) running in other threads (use of SMP would be ideal, but you say that it won't be used efficiently)...
Regards, /Al
An SMP machine won't be used efficiently anyway because of the interpreter lock.
Does it mean that multithreading in Pike is not effective and should be avoided if possible?
No. It means that Pike code will only be executed in one thread at a time, never on two or more processors simultaneously.
If you have heavy operations outside the Pike code (Image stuff, for instance), it will use the second processor, though.
/ Mirar
Previous text:
2004-02-01 08:13: Subject: Re: Default backend and thread backends?
On Sun, Feb 01, 2004 at 12:05:01AM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
An SMP machine won't be used efficiently anyway because of the interpreter lock.
Does it mean that multithreading in Pike is not effective and should be avoided if possible?
To me it seems a bit odd to have an implicit mapping between threads and backends;
What I want is to make sure that nothing in the default backend will block application execution, especially call_outs. Sure, I can do everything explicitly, but I've no control over external modules, which might be added later.
Also, I want my application to be scalable in a way that I can just instantiate a copy of some object (which runs the entire app) in a separate thread, and it won't interfere with anything else (I mean, no lockouts etc.) running in other threads (use of SMP would be ideal, but you say that it won't be used efficiently)...
Regards, /Al
/ Brevbäraren
On Sun, Feb 01, 2004 at 08:20:01AM +0100, Mirar @ Pike developers forum wrote:
If you have heavy operations outside the Pike code (Image stuff, for instance), it will use the second processor, though.
So, basically, everything that is external (C modules) is not affected?
Well, then another question - is it planned to make Pike really multithreaded, so Pike code will be executed simultaneously?
Regards, /Al
So, basically, everything that is external (C modules) is not affected?
Yes. While Pike is executing Pike code, it locks the Big Main Pike Interpreter Lock. This is process-global. When it stops doing that the lock is released, and heavy C functions do release it. (Not all, since it takes some computrons to release and lock.)
Well, then another question - is it planned to make Pike really multithreaded, so Pike code will be executed simultaneously?
I don't know. I don't think anyone has really felt the need for that.
/ Mirar
Previous text:
2004-02-01 09:06: Subject: Re: Default backend and thread backends?
On Sun, Feb 01, 2004 at 08:20:01AM +0100, Mirar @ Pike developers forum wrote:
If you have heavy operations outside the Pike code (Image stuff, for instance), it will use the second processor, though.
So, basically, everything that is external (C modules) is not affected?
Well, then another question - is it planned to make Pike really multithreaded, so Pike code will be executed simultaneously?
Regards, /Al
/ Brevbäraren
Well, the need has been there for a long time (i.e. since we started using multiprocessor machines). Since my knowledge of the internals is close to zero I can't predict the steps needed to achieve it, though. Per-thread string pools? Explicit disabling of thread switching in Pike code that relies on the interpreter lock? Incremental gc?
/ Jonas Walldén
Previous text:
2004-02-01 09:13: Subject: Re: Default backend and thread backends?
So, basically, everything that is external (C modules) is not affected?
Yes. While Pike is executing Pike code, it locks the Big Main Pike Interpreter Lock. This is process-global. When it stops doing that the lock is released, and heavy C functions do release it. (Not all, since it takes some computrons to release and lock.)
Well, then another question - is it planned to make Pike really multithreaded, so Pike code will be executed simultaneously?
I don't know. I don't think anyone has really felt the need for that.
/ Mirar
A multiprocessor machine doesn't necessarily create the need to make one process use all the processors, I'd say. Merely opens the possibility...
Object-global locks instead of process-global interpreter locks was one idea, I think.
/ Mirar
Previous text:
2004-02-01 10:37: Subject: Re: Default backend and thread backends?
Well, the need has been there for a long time (i.e. since we started using multiprocessor machines). Since my knowledge of the internals is close to zero I can't predict the steps needed to achieve it, though. Per-thread string pools? Explicit disabling of thread switching in Pike code that relies on the interpreter lock? Incremental gc?
/ Jonas Walldén
Well, then another question - is it planned to make Pike really multithreaded, so Pike code will be executed simultaneously?
I don't know. I don't think anyone has really felt the need for that.
I've tried doing that. But it turns out that Pike would become less than half as fast in the process, so the effort was pretty much self-defeating.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2004-02-01 09:13: Subject: Re: Default backend and thread backends?
So, basically, everything that is external (C modules) is not affected?
Yes. While Pike is executing Pike code, it locks the Big Main Pike Interpreter Lock. This is process-global. When it stops doing that the lock is released, and heavy C functions do release it. (Not all, since it takes some computrons to release and lock.)
Well, then another question - is it planned to make Pike really multithreaded, so Pike code will be executed simultaneously?
I don't know. I don't think anyone has really felt the need for that.
/ Mirar
Might make sense on a 4+ CPU machine?
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2004-02-01 11:21: Subject: Re: Default backend and thread backends?
Well, then another question - is it planned to make Pike really multithreaded, so Pike code will be executed simultaneously?
I don't know. I don't think anyone has really felt the need for that.
I've tried doing that. But it turns out that Pike would become less than half as fast in the process, so the effort was pretty much self-defeating.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Yeah, but those machines are uncommon enough that the amount of work required doesn't make much sense. Also, even if you *do* have a 4-CPU machine, you would also need a Pike program which is using 4+ threads and spending a lot of time in the interpreter...
I firmly believe that it makes much more sense to make Pike "pure" (i.e. no global variables) and then instantiate more than one interpreter in the same process, each with its own interpreter lock and thread(s).
The only question is whether a "pure" Pike is worth the effort, as most modern operating systems can create a new process fast enough that running a separate process for each interpreter is just as easy.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2004-02-01 12:10: Subject: Re: Default backend and thread backends?
Might make sense on a 4+ CPU machine?
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
I firmly believe that it makes much more sense to make Pike "pure" (i.e. no global variables) and then instantiate more than one interpreter in the same process, each with its own interpreter lock and thread(s).
If one does that, one could create some Pike abstraction for passing data between the interpreters, which could potentially be more efficient than ordinary interprocess communication.
A "pure" pike is also good for anyone who wants to embed pike into other programs.
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-01 20:56: Subject: Re: Default backend and thread backends?
Yeah, but those machines are uncommon enough that the amount of work required doesn't make much sense. Also, even if you *do* have a 4-cpu machine, you would also need a pike program which is using 4+ threads and spending a lot of time in the interpreter...
I firmly believe that it makes much more sense to make Pike "pure" (i.e. no global variables) and then instantiate more than one interpreter in the same process, each with its own interpreter lock and thread(s).
The only question is whether a "pure" Pike is worth the effort, as most modern operating systems can create a new process fast enough that running a separate process for each interpreter is just as easy.
/ Fredrik (Naranek) Hubinette (Real Build Master)
I firmly believe that it makes much more sense to make Pike "pure" (i.e. no global variables) and then instantiate more than one interpreter in the same process, each with its own interpreter lock and thread(s).
If one does that, one could create some Pike abstraction for passing data between the interpreters, which could potentially be more efficient than ordinary interprocess communication.
Indeed, that would be very nice. In fact, the abstraction layer should probably be created first, as it would also be very very useful for passing data between Pike processes. An optimized version could then be created later for in-process communication.
A "pure" pike is also good for anyone who wants to embed pike into other programs.
Absolutely, it would make mod_pike really really easy...
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2004-02-01 21:08: Subject: Re: Default backend and thread backends?
I firmly believe that it makes much more sense to make Pike "pure" (i.e. no global variables) and then instantiate more than one interpreter in the same process, each with its own interpreter lock and thread(s).
If one does that, one could create some Pike abstraction for passing data between the interpreters, which could potentially be more efficient than ordinary interprocess communication.
A "pure" pike is also good for anyone who wants to embed pike into other programs.
/ Niels Möller (vässar rödpennan)
On Sun, Feb 01, 2004 at 11:25:01AM +0100, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
I've tried doing that. But it turns out that Pike would become less than half as fast in the process, so the effort was pretty much self-defeating.
Why not just create a separate interpreter object (state) for every thread, and leave thread management (mutexes, IPC, etc.) to the app?
Another alternative would be object locks, function locks, etc. (an attribute like "locked"?).
If I have an SMP machine (and this is becoming common nowadays, at least for servers), it makes sense.
But I guess it would require a very extensive rewrite of the interpreter, or?
Regards, /Al
You can get very good load spread over multiple CPUs by running one pike process for every CPU you want to utilize. It's actually one of the better ways of getting pike to put as much CPU to use as possible.
/ Johan Sundström (Achtung Liebe!)
Previous text:
2004-02-01 13:12: Subject: Re: Default backend and thread backends?
On Sun, Feb 01, 2004 at 11:25:01AM +0100, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
I've tried doing that. But it turns out that Pike would become less than half as fast in the process, so the effort was pretty much self-defeating.
Why not just create separate interpreter object (state) for every thread, and leave thread management (mutexes, IPCs, etc) to an app?
Another alternative would be object locks, function locks, etc (an attribute like "locked"?).
If I have an SMP machine (and this is becoming common nowadays, at least for servers), it makes sense.
But I guess it would require very extensive rewriting of interpreter, or?
Regards, /Al
/ Brevbäraren
On Sun, Feb 01, 2004 at 04:25:01PM +0100, Johan Sundström (Achtung Liebe!) @ Pike (-) developers forum wrote:
You can get very good load spread over multiple CPUs by running one pike process for every CPU you want to utilize.
Then I have to deal with IPC, which is not as fast as direct access to the data, even if protected by mutexes.
But I seldom use even mutexes, since almost every object (or other piece of data) is passed to processing threads for exclusive use.
There are a lot of data exchanges in my app, and some of them are really big, even huge; that's one of the reasons why I want multi-threading instead of separate processes.
My tests show that IPC, whatever form it takes, except for shared memory, is quite slow, but I can't map Pike objects and data structures into shared memory, unfortunately.
Sure, I can just use more powerful hardware for this specific task, but that would be only a workaround, not a solution :)
Regards, /Al
I've tried doing that. But it turns out that Pike would become less than half as fast in the process, so the effort was pretty much self-defeating.
Why not just create separate interpreter object (state) for every thread, and leave thread management (mutexes, IPCs, etc) to an app?
That is an option, but you would have all the same issues when sharing data as you would when running each interpreter in a separate process.
Another alternative would be object locks, function locks, etc (an attribute like "locked"?).
I don't think you understand the issue. Pike has a multitude of internal data structures which have to be protected. This means that hundreds or thousands of locks have to be created and locked while Pike is executing.
If I've SMP machine (and this becomes common nowadays, at least for servers), it makes sense.
But I guess it would require very extensive rewriting of interpreter, or?
Yes. (Btw, python threads have exactly the same limitations...)
/Hubbe
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2004-02-01 13:12: Subject: Re: Default backend and thread backends?
On Sun, Feb 01, 2004 at 11:25:01AM +0100, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
I've tried doing that. But it turns out that Pike would become less than half as fast in the process, so the effort was pretty much self-defeating.
Why not just create separate interpreter object (state) for every thread, and leave thread management (mutexes, IPCs, etc) to an app?
Another alternative would be object locks, function locks, etc (an attribute like "locked"?).
If I have an SMP machine (and this is becoming common nowadays, at least for servers), it makes sense.
But I guess it would require very extensive rewriting of interpreter, or?
Regards, /Al
/ Brevbäraren
On Sun, Feb 01, 2004 at 09:05:03PM +0100, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
That is an option, but you would have all the same issues when sharing data as you would when running each interpreter in a separate process.
Not really. With threads I have the same memory space; with processes there is no native support for shared memory in Pike (yet), so IPC is slow.
I don't think you understand the issue. Pike has a multitude of internal data structures which have to be protected.
Hmm... This means that I can't create two or more independent Pike interpreters in single process, even without taking into account threads? Everything is global?
Regards, /Al
On Sun, Feb 01, 2004 at 09:05:03PM +0100, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
That is an option, but you would have all the same issues when sharing data as you would when running each interpreter in a separate process.
Not really. With threads I have the same memory space; with processes there is no native support for shared memory in Pike (yet), so IPC is slow.
I don't think you understand the issue. Pike has a multitude of internal data structures which have to be protected.
Hmm... This means that I can't create two or more independent Pike interpreters in single process, even without taking into account threads? Everything is global?
Again, I don't think you understand, and I think I would need a whiteboard to explain.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2004-02-01 21:29: Subject: Re: Default backend and thread backends?
On Sun, Feb 01, 2004 at 09:05:03PM +0100, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
That is an option, but you would have all the same issues when sharing data as you would when running each interpreter in a separate process as.
Not really. With threads I've same memory space, with processes - there is no native support of shared memory in Pike (yet), so IPC is slow.
I don't think you understand the issue. Pike has a multitude of internal data structures which have to be protected.
Hmm... This means that I can't create two or more independent Pike interpreters in single process, even without taking into account threads? Everything is global?
Regards, /Al
/ Brevbäraren
But that was tested by adding locks on existing global data structures, not removing them, right? I won't believe for a second that it's a theoretical limit (i.e. based on Pike language properties) which can't be overcome.
/ Jonas Walldén
Previous text:
2004-02-01 11:21: Subject: Re: Default backend and thread backends?
Well, then another question - is it planned to make Pike really multithreaded, so Pike code will be executed simultaneously?
I don't know. I don't think anyone has really felt the need for that.
I've tried doing that. But it turns out that Pike would become less than half as fast in the process, so the effort was pretty much self-defeating.
/ Fredrik (Naranek) Hubinette (Real Build Master)
It is a limit for the language as it is now.
We have to add explicit locking in the language, or automatic locking in the interpreter.
The latter is what has been tried.
/ Per Hedbor ()
Previous text:
2004-02-01 14:47: Subject: Re: Default backend and thread backends?
But that was tested by adding locks on existing global data structures, not removing them, right? I won't believe for a second that it's a theoretical limit (i.e. based on Pike language properties) which can't be overcome.
/ Jonas Walldén
If you're referring to the interpreter lock which protects against thread switching under particular conditions, I don't know if that's considered part of the language specs (are the conditions documented at all?). Still, a quick grep in the Roxen source code found fewer than three dozen commented occurrences in ~250 KLOC, so it's hopefully not too difficult to mark those parts with #pragma directives or something similar. A nice start would be to devise a method today so that code written from now on is safe.
Anyway, I'm just hoping that multiprocessing wasn't discarded years ago never to be reconsidered again, even if it means we'll have to sacrifice some nice properties of the current implementation (like O(1) string comparisons due to a global string table etc).
/ Jonas Walldén
Previous text:
2004-02-01 15:14: Subject: Re: Default backend and thread backends?
It is a limit for the language as it is now.
We have to add explicit locking in the language, or automatic locking in the interpreter.
The latter is what has been tried.
/ Per Hedbor ()
You do not understand the issue. Just removing the interpreter lock would make Pike crash almost immediately when you start a second thread. You need to replace the interpreter lock with more fine-grained locks that lock only the data that each thread is currently working on. Those extra lock/unlock operations are what slow down Pike.
The only other option is to use some sort of data isolation, so that a thread doesn't have to lock its data, because no other thread can access it.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2004-02-01 15:48: Subject: Re: Default backend and thread backends?
If you're referring to the interpreter lock which protects against thread switching under particular conditions I don't know if that's considered part of the language specs (are the conditions documented at all?). Still, a quick grep in the Roxen source code found less than three dozen commented occurrences out of ~250KLOC so it's hopefully not too difficult to mark those parts with #pragma directives or something similar. A nice start would be to devise a method today so that code written from now on is safe.
Anyway, I'm just hoping that multiprocessing wasn't discarded years ago never to be reconsidered again, even if it means we'll have to sacrifice some nice properties of the current implementation (like O(1) string comparisons due to a global string table etc).
/ Jonas Walldén
Yes, it's the second option I've been talking about. Maybe it's an all or nothing proposition which would require a total rewrite (which won't happen), but if there is a way of getting there step by step it would be great.
/ Jonas Walldén
Previous text:
2004-02-01 21:05: Subject: Re: Default backend and thread backends?
You do not understand the issue. Just removing the interpreter lock would make Pike crash almost immediately when you start a second thread. You need to replace the interpreter lock with more fine-grained locks that lock only the data that each thread is currently working on. Those extra lock/unlock operations are what slow down Pike.
The only other option is to use some sort of data isolation, so that a thread doesn't have to lock its data, because no other thread can access it.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Que?
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2004-02-01 14:47: Subject: Re: Default backend and thread backends?
But that was tested by adding locks on existing global data structures, not removing them, right? I won't believe for a second that it's a theoretical limit (i.e. based on Pike language properties) which can't be overcome.
/ Jonas Walldén
Sorry, maybe not totally clear. What I meant was that you added locks to existing global data instead of redesigning things to avoid global storage to begin with (which is what you call the second option in 11295426).
/ Jonas Walldén
Previous text:
2004-02-01 21:01: Subject: Re: Default backend and thread backends?
Que?
/ Fredrik (Naranek) Hubinette (Real Build Master)
The problem with "the second option" is that it represents a radically different threading model than what Pike is currently using. That means that normal threaded programs using thread_create() would still not be multi-cpu enabled.
Essentially, we would have to design a new API for creating 'threads' and inter-thread communication. If we write this API right, it could be used to communicate between:
 o threads running in the same interpreter
 o threads running in separate interpreters
 o threads running in separate processes
 o threads running on separate machines
Personally, I don't need the headache of designing this API :)
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2004-02-01 21:47: Subject: Re: Default backend and thread backends?
Sorry, maybe not totally clear. What I meant was that you added locks to existing global data instead of redesigning things to avoid global storage to begin with (which is what you call the second option in 11295426).
/ Jonas Walldén
As for threads in the same process, it should in principle be possible to do it with hardly any visible changes in Pike:

1. Let each memory object have a flag to indicate whether it's thread local or global.

2. Thread local things are linked together in thread local link lists.

3. Global objects are protected by a read/write lock.

4. Whenever a thread has to change a global thing it acquires the write lock, which would essentially be like the current interpreter lock.

5. When a thread has the write lock it holds it for some time, just like the interpreter lock, so that global changes can be done without lots of locking.

6. When a reference is added from a global to a thread local thing, the latter becomes global. This is transitive - at this point it'd be necessary to follow all references in the thread local thing and mark all that as global too.

7. In some cases it'd be necessary to convert global things to thread local (e.g. in a thread queue implementation that's used to dispatch work from a global queue to handler threads). That can be done implicitly if there's only one ref to the thing, but one might want an explicit function to do it. The problem with such a function would be that it has to go through all global things to ensure that there's no global ref anywhere.
The main issues with this are three afaics:
 o It's internally quite a big change since there will be many more linked lists, and the operation required in item 6 makes it necessary to fix almost every place add_ref() is used.
 o The current string implementation makes it impossible to handle thread local strings, and requiring the write lock whenever a new string is created would probably defeat a lot of the parallelism. Thus it's necessary to change the string implementation to fall back to strncmp when a global and a local string is compared. I guess that this was what Jonas meant with the problematic O(1) property in strings.
 o Adding and subtracting refs to global things must be done without taking the write lock. On most architectures it ought to be possible to use atomic increment and decrement operations for that.

None of these issues is impossible to overcome, but it would be quite a big change, and it would be an incompatible API change for C modules.
At least if I were to make a new language I'd definitely implement a scheme like the above, since it automatically makes the threads as separate as possible, and I think the transitions of data between the global and local spaces would be fairly few in a reasonably well designed application. It'd be nice to have some debug tools, e.g. being able to declare an object as thread local so that an error is thrown if it becomes referenced from the global data set.
/ Martin Stjernholm, Roxen IS
Previous text:
2004-02-02 06:35: Subject: Re: Default backend and thread backends?
The problem with "the second option" is that it represents a radically different threading model than what Pike is currently using. That means that normal threaded programs using thread_create() would still not be multi-cpu enabled.
Essentially, we would have to design a new API for creating 'threads' and inter-thread communication. If we write this API right, it could be used to communicate between:
 o threads running in the same interpreter
 o threads running in separate interpreters
 o threads running in separate processes
 o threads running on separate machines
Personally, I don't need the headache of designing this API :)
/ Fredrik (Naranek) Hubinette (Real Build Master)
- Let each memory object have a flag to indicate whether it's thread local or global.
- When a reference is added from a global to thread local thing, the latter becomes global. This is transitive - at this point it'd be necessary to follow all references in the thread local thing and mark all that as global too.
Hmm. This is tricky. At first I thought the above was open to the following race, relating to the use of an object X which is initially local to thread A:
| Thread A              Thread B
|
| Check global/local
| flag of object X.
|
|                       Make X global.
|
|                       Get all locks needed
|                       for using X.
|
| Use object X.         Use object X.
|
V time
But we can save the day by the quite natural requirement that it's only thread A that can turn its local objects into global objects.
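In C-ish sketch form (all names made up, not real Pike structures), the owner-only promotion with the transitive walk could be:

```c
#include <stddef.h>

/* Illustrative sketch of the rule above: only the owning thread may
 * turn its local things global, and it does so transitively before
 * publishing any reference to them. */
enum ownership { THREAD_LOCAL, GLOBAL };

struct node {
    enum ownership owner;
    struct node *refs[4];   /* references this thing holds */
    size_t nrefs;
};

/* Called by the owning thread only, with the global write lock held,
 * just before a global thing is made to point at 'n'. */
static void make_global(struct node *n)
{
    if (n->owner == GLOBAL)
        return;             /* already published; also stops cycles */
    n->owner = GLOBAL;
    for (size_t i = 0; i < n->nrefs; i++)
        make_global(n->refs[i]);   /* the transitive marking */
}
```

The early return on an already-global node is what keeps the walk finite even with cyclic references.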
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-02 21:18: Subject: Re: Default backend and thread backends?
As for threads in the same process, it should in principle be possible to do it with hardly any visible changes in Pike:
- Let each memory object have a flag to indicate whether it's thread local or global.
- Thread local things are linked together in thread local link lists.
- Global objects are protected by a read/write lock.
- Whenever a thread has to change a global thing it acquires the write lock which would essentially be like the current interpreter lock.
- When a thread has the write lock it holds it for some time, just like the interpreter lock, so that global changes can be done without lots of locking.
- When a reference is added from a global to thread local thing, the latter becomes global. This is transitive - at this point it'd be necessary to follow all references in the thread local thing and mark all that as global too.
- In some cases it'd be necessary to convert global things to thread local (e.g. in a thread queue implementation that's used to dispatch work from a global queue to handler threads). That can be done implicitly if there's only one ref to the thing, but one might want an explicit function to do it. The problem with such a function would be that it has to go through all global things to ensure that there's no global ref anywhere.
The main issues with this are three afaics:
o It's internally quite a big change since there will be many more linked lists, and the operation required in item 6 makes it necessary to fix almost every place add_ref() is used.
o The current string implementation makes it impossible to handle thread local strings, and requiring the write lock whenever a new string is created would probably defeat a lot of the parallelism. Thus it's necessary to change the thread implementation to fall back to strncmp when a global and a local string is compared. I guess that this was what Jonas meant with the problematic O(1) property in strings.
o Adding and subtracting refs to global things must be done without taking the write lock. On most architectures it ought to be possible to use atomic increment and decrement operations for that.
None of these is impossible to overcome, but it would be quite a big change, and it would be an incompatible API change for C modules.
At least if I were to make a new language I'd definitely implement a scheme like the above, since it automatically makes the threads as separate as possible, and I think the transitions of data between the global and local spaces would be fairly few in a reasonably well designed application. It'd be nice to have some debug tools, e.g. to be able to declare an object as thread local, which causes an error to be thrown if it becomes referenced from the global data set.
/ Martin Stjernholm, Roxen IS
It's even safer than that since B has to hold the global write lock while it does that operation, so A doesn't do anything during that time (since it certainly wouldn't release its read lock in the time gap you show).
/ Martin Stjernholm, Roxen IS
Previous text:
2004-02-02 21:34: Subject: Re: Default backend and thread backends?
- Let each memory object have a flag to indicate whether it's thread local or global.
- When a reference is added from a global to thread local thing, the latter becomes global. This is transitive - at this point it'd be necessary to follow all references in the thread local thing and mark all that as global too.
Hmm. This is tricky. At first I thought the above was open to the following race, relating to the use of an object X which is initially local to thread A:
| Thread A              Thread B
|
| Check global/local
| flag of object X.
|
|                       Make X global.
|
|                       Get all locks needed
|                       for using X.
|
| Use object X.         Use object X.
|
V time
But we can save the day by the quite natural requirement that it's only thread A that can turn its local objects into global objects.
/ Niels Möller (vässar rödpennan)
My understanding was that as long as one accesses local objects only, one shouldn't need to hold *any* locks. But perhaps that's too good to be possible.
The problem is following pointers to global objects. When we read a pointer from a local object, we can first examine the local/global flag of the object it points to. If it's local, it's our own object and it can't change under our feet, so we can just go on. If it's global, we need to get the global lock, and then we can access the rest of the object.
The race we must consider is the local/global flag changing at the same time. It can't change from local to global, since in that case it's our object, so only we could change the flag, and we're not doing it.
For changes from global to local, I think we're safe if we have the only reference to the object. I.e. the procedure could be: Get the global lock. Check that there is exactly one reference. Set the flag to local.
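A toy version of that procedure in C, with a small spinlock standing in for the global read/write lock (all names invented, purely a sketch):

```c
#include <stdatomic.h>

/* A trivial spinlock stands in for the global lock discussed above. */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;
static void lock_global(void)   { while (atomic_flag_test_and_set(&global_lock)) ; }
static void unlock_global(void) { atomic_flag_clear(&global_lock); }

struct thing {
    int is_global;
    int refs;
    int payload;
};

static int read_payload(struct thing *t)
{
    if (!t->is_global)
        return t->payload;       /* our own object: no locking at all */
    lock_global();               /* global: must hold the global lock */
    int v = t->payload;
    unlock_global();
    return v;
}

/* "Get the global lock.  Check that there is exactly one reference.
 * Set the flag to local." */
static int try_make_local(struct thing *t)
{
    lock_global();
    int ok = (t->refs == 1);
    if (ok)
        t->is_global = 0;
    unlock_global();
    return ok;
}
```

The point of the fast path is that the flag check itself needs no atomicity, for the reasons given above: a local flag can only be changed by us.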
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-02 22:25: Subject: Re: Default backend and thread backends?
It's even safer than that since B has to hold the global write lock while it does that operation, so A doesn't do anything during that time (since it certainly wouldn't release its read lock in the time gap you show).
/ Martin Stjernholm, Roxen IS
It's a good plan, and I think it has a good chance of working. I would combine it with an addition to gc() that restores the thread-local flag on things which are no longer global, automatically.
However, I'm still not entirely sure that this will be fast enough to be worth it. There will still be a lot of locking operations, and there will be a *lot* of places where the local/global flag would have to be checked.
In fact, I'm not entirely sure that it's much faster than having one mutex per object, because locking an unlocked mutex takes very little time. (Similar to checking a flag.)
/ Fredrik (Naranek) Hubinette (Real Build Master)
It may be more important to have thread-local pools for memory allocation. Well, it's hard to guess without some measurements which data and resources the threads are actively competing for.
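E.g. per-thread free lists, roughly like this in C11 (a hypothetical sketch, not Pike's actual allocator):

```c
#include <stdlib.h>

/* Each thread keeps its own free list and only falls back to
 * malloc() (and whatever locking that implies) when the list is
 * empty.  Frees go back to the freeing thread's own list, so the
 * common alloc/free cycle touches no shared state at all. */
struct chunk { struct chunk *next; char payload[56]; };

static _Thread_local struct chunk *free_list;  /* one list per thread */

static void *pool_alloc(void)
{
    struct chunk *c = free_list;
    if (c) {                 /* fast path: no locking at all */
        free_list = c->next;
        return c;
    }
    return malloc(sizeof(struct chunk));
}

static void pool_free(void *p)
{
    struct chunk *c = p;     /* give the chunk back to our own list */
    c->next = free_list;
    free_list = c;
}
```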
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-03 07:22: Subject: Re: Default backend and thread backends?
It's a good plan, and I think it has a good chance of working. I would combine it with an addition to gc() that restores the thread-local flag on things which are no longer global, automatically.
However, I'm still not entirely sure that this will be fast enough to be worth it. There will still be a lot of locking operations, and there will be a *lot* of places where the local/global flag would have to be checked.
In fact, I'm not entirely sure that it's much faster than having one mutex per object, because locking an unlocked mutex takes very little time. (Similar to checking a flag.)
/ Fredrik (Naranek) Hubinette (Real Build Master)
On Tue, Feb 03, 2004 at 07:25:04AM +0100, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
However, I'm still not entirely sure that this will be fast enough to be worth it. There will still be a lot of locking operations,
Most apps are single-threaded, so mutex ops will be no-ops effectively, but for those apps where multi-threading is important, the gain could outweigh the loss, I guess.
Really, mutex op is quite fast (I guess max 3 instructions on most CPUs), so - is it really _so_ bad?
Regards, /Al
Really, mutex op is quite fast (I guess max 3 instructions on most CPUs), so - is it really _so_ bad?
Is that true for multi-cpu machines also?
A pthreads mutex operation does not only affect the mutex structure itself, but also issue any needed instruction to get the caches of the involved cpu:s into a consistent state. How much work that is is pretty architecture dependent.
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-03 11:12: Subject: Re: Default backend and thread backends?
On Tue, Feb 03, 2004 at 07:25:04AM +0100, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
However, I'm still not entirely sure that this will be fast enough to be worth it. There will still be a lot of locking operations,
Most apps are single-threaded, so mutex ops will be no-ops effectively, but for those apps where multi-threading is important, the gain could outweigh the loss, I guess.
Really, mutex op is quite fast (I guess max 3 instructions on most CPUs), so - is it really _so_ bad?
Regards, /Al
/ Brevbäraren
On Tue, Feb 03, 2004 at 11:25:03AM +0100, Niels Möller (vässar rödpennan) @ Pike (-) developers forum wrote:
Really, mutex op is quite fast (I guess max 3 instructions on most CPUs), so - is it really _so_ bad?
Is that true for multi-cpu machines also?
At least on Intel platform - yes, just adds one more instruction (lock).
I guess on every platform where SMP is supported it should be fast enough, otherwise it would make SMP pretty useless.
A pthreads mutex operation does not only affect the mutex structure itself, but also issue any needed instruction to get the caches of the
This is left to the CPU. At least looking into the code doesn't reveal anything but two-three instructions (including similar ops in kernel code).
Regards, /Al
However, the lock instruction can take forever to execute (the actual delay depends on the system architecture)
So it's not really simply one more instruction.
/ Per Hedbor ()
Previous text:
2004-02-03 11:42: Subject: Re: Default backend and thread backends?
On Tue, Feb 03, 2004 at 11:25:03AM +0100, Niels Möller (vässar rödpennan) @ Pike (-) developers forum wrote:
Really, mutex op is quite fast (I guess max 3 instructions on most CPUs), so - is it really _so_ bad?
Is that true for multi-cpu machines also?
At least on Intel platform - yes, just adds one more instruction (lock).
I guess on every platform where SMP is supported it should be fast enough, otherwise it would make SMP pretty useless.
A pthreads mutex operation does not only affect the mutex structure itself, but also issue any needed instruction to get the caches of the
This is left to the CPU. At least looking into the code doesn't reveal anything but two-three instructions (including similar ops in kernel code).
Regards, /Al
/ Brevbäraren
On Tue, Feb 03, 2004 at 12:05:02PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
However, the lock instruction can take forever to execute (the actual delay depends on the system architecture)
No, it won't. From the manual (Intel):
"Causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal insures that the processor has exclusive use of any shared memory while the signal is asserted."
That's all. Implementation on other architectures is similar, AFAIK.
Regards, /Al
How did that excerpt promise anything about low execution times?
/ Johan Sundström (Achtung Liebe!)
Previous text:
2004-02-03 12:23: Subject: Re: Default backend and thread backends?
On Tue, Feb 03, 2004 at 12:05:02PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
However, the lock instruction can take forever to execute (the actual delay depends on the system architecture)
No, it won't. From the manual (Intel):
"Causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal insures that the processor has exclusive use of any shared memory while the signal is asserted."
That's all. Implementation on other architectures is similar, AFAIK.
Regards, /Al
/ Brevbäraren
But of course. Consider, however, an eight-CPU system, where the other four CPU:s also have LOCK# high. You then have to wait for at least 7 memory latency delays, which is almost forever with modern CPU:s.
Also, the memory access that is accompanied by LOCK tends to involve physical RAM, not cache (since reading the data from the processor local cache would defeat the purpose), which on a P4 takes _at least_ 200 cycles.
Now, consider having a LOCK for each reference count change in pike (this is basically what hubbe implemented). You get severe slowdowns.
/ Per Hedbor ()
Eh, 'other seven CPU:s'.
/ Per Hedbor ()
Previous text:
2004-02-03 12:33: Subject: Re: Default backend and thread backends?
But of course. Consider, however, an eight-CPU system, where the other four CPU:s also have LOCK# high. You then have to wait for at least 7 memory latency delays, which is almost forever with modern CPU:s.
Also, the memory access that is accompanied by LOCK tends to involve physical RAM, not cache (since reading the data from the processor local cache would defeat the purpose), which on a P4 takes _at least_ 200 cycles.
Now, consider having a LOCK for each reference count change in pike (this is basically what hubbe implemented). You get severe slowdowns.
/ Per Hedbor ()
On Tue, Feb 03, 2004 at 12:35:04PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
Now, consider having a LOCK for each reference count change in pike (this is basically what hubbe implemented). You get severe slowdowns.
Well... Then it makes no sense, of course.
And I'm starting to think that languages like Pike (i.e. any where reference counting is used so heavily) are not well suited for multithreading...
OTOH, if the interpreter state is thread-local, and only a few objects are used for IPC/ITC, it still makes sense, or?
Personally, I would prefer to have complete control over object locking, instead of delegating this task to a "too smart" interpreter engine, which tries to guess what I need/want and to save me from my own mistakes :)
Regards, /Al
Personally, I would prefer to have complete control over object locking, instead of delegating this task to "too smart" interpreter
Then you really need another language than pike. Such as C.
Java (at least when using Sun's JRE) does not count; it cannot use more than one CPU in the practical tests I found using Google. I guess it does automatic locking when allocating or deallocating objects (which happens all the time (r)) and thus does not really scale.
Most threaded C-applications also tend not to scale to multiple CPU:s, actually.
/ Per Hedbor ()
Previous text:
2004-02-03 12:52: Subject: Re: Default backend and thread backends?
On Tue, Feb 03, 2004 at 12:35:04PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
Now, consider having a LOCK for each reference count change in pike (this is basically what hubbe implemented). You get severe slowdowns.
Well... Then it makes no sense, of course.
And I'm starting to think that languages like Pike (i.e. any where reference counting is used so heavily) are not well suited for multithreading...
OTOH, if the interpreter state is thread-local, and only a few objects are used for IPC/ITC, it still makes sense, or?
Personally, I would prefer to have complete control over object locking, instead of delegating this task to a "too smart" interpreter engine, which tries to guess what I need/want and to save me from my own mistakes :)
Regards, /Al
/ Brevbäraren
However, Java does seem to scale linearly if you do purely CPU-bound things (such as for( int i = 0; i < 2^31; i++ );)
/ Per Hedbor ()
Previous text:
2004-02-03 13:38: Subject: Re: Default backend and thread backends?
Personally, I would prefer to have complete control over object locking, instead of delegating this task to "too smart" interpreter
Then you really need another language than pike. Such as C.
Java (at least when using Sun's JRE) does not count; it cannot use more than one CPU in the practical tests I found using Google. I guess it does automatic locking when allocating or deallocating objects (which happens all the time (r)) and thus does not really scale.
Most threaded C-applications also tend not to scale to multiple CPU:s, actually.
/ Per Hedbor ()
And I'm starting to think that languages like Pike (i.e. any where reference counting is used so heavily) are not well suited for multithreading...
In general, efficient garbage collection on heavily threaded systems is difficult (although not impossible). The reference counts can be viewed as an implementation detail of the garbage collector (although they also have more visible side effects, such as deterministic destruction of objects that go out of scope).
/ Niels Möller (vässar rödpennan)
It is so bad, since it would be done for _each and every_ access to data.
Thus basically doubling the time.
/ Per Hedbor ()
On Tue, Feb 03, 2004 at 11:25:03AM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
It is so bad, since it would be done for _each and every_ access to data.
Why? Unless I know that some object (or another piece of data) may be accessed from another thread?
After all, languages like Java/C# work somehow (and not so bad), so it is difficult to understand why it is so bad in case of Pike...
Well, I am not talking about the _current_ implementation, but in general: if it is possible in other interpreted/byte-code languages, I see no reason why it shouldn't be possible in Pike...
Regards, /Al
It would be possible if we change the language to require explicit locking, and skip things like reference counting.
/ Per Hedbor ()
Previous text:
2004-02-03 11:39: Subject: Re: Default backend and thread backends?
On Tue, Feb 03, 2004 at 11:25:03AM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
It is so bad, since it would be done for _each and every_ access to data.
Why? Unless I know that some object (or another piece of data) may be accessed from another thread?
After all, languages like Java/C# work somehow (and not so bad), so it is difficult to understand why it is so bad in case of Pike...
Well, I am not talking about the _current_ implementation, but in general: if it is possible in other interpreted/byte-code languages, I see no reason why it shouldn't be possible in Pike...
Regards, /Al
/ Brevbäraren
On Tue, Feb 03, 2004 at 07:25:04AM +0100, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
However, I'm still not entirely sure that this will be fast enough to be worth it. There will still be a lot of locking operations,
Most apps are single-threaded, so mutex ops will be no-ops effectively, but for those apps where multi-threading is important, the gain could outweigh the loss, I guess.
Really, mutex op is quite fast (I guess max 3 instructions on most CPUs), so - is it really _so_ bad?
I think so.
Like I said before, I was experimenting with this a while back, and I added mutexes to a bunch of places in pike which had to be locked, and I used atomic operations to protect all add_ref() and free* operations.
By the time I was less than half-way done, Pike was already 30% slower, and that was while running a single-threaded application, so it didn't actually have to wait for those mutexes, it just had to lock them. That's when I started realizing that Pike was going to become pretty slow when all my work was done, so I gave up and started thinking about alternative ways.
3 instructions might not sound very bad, but you have to consider that on a dual-cpu system, those instructions have to go out on the cpu bus and do some cache-coherency stuff to make sure that the other cpu isn't trying to do the same thing at the same time.
You also need to realize that pike executes tens of millions of add_ref/free_* and other such operations per second, all of which would need to be locked.
However, there is hope. I'm not suggesting that this is impossible, I'm just saying that I have tried and failed, and hopefully something can be learned from that experience.
/ Fredrik (Naranek) Hubinette (Real Build Master)
There will still be a lot of locking operations,
The intention is that the locking strategy would keep that on the same level as the current interpreter lock. Have I missed something?
and there will be a *lot* of places where the local/global flag would have to be checked.
Everywhere where a memory object is changed or freed, but not where it only is read. Trivial per-object locking would have to lock for both operations, which I think is a big difference. It's hard to estimate the read-to-write ratio (not counting the stack, which is thread local and therefore unlocked in any case), but I'd chance it's around 10:1 at least.
In fact, I'm not entirely sure that it's much faster than having one mutex per object, because locking an unlocked mutex takes very little time. (Similar to checking a flag.)
I don't know what you mean with "similar", but it should be at least four times more; it's a read-and-write operation and it's necessary to do an unlock operation afterwards. Furthermore the lock is atomic meaning cache synching etc which ought to be a fair bit more expensive (but only on SMP systems) - the flag check need not be atomic.
The necessary atomic refcounting on global data is a similar expense, though. Hmm, maybe one could relax the refcount garbing on global data and let it become garbage instead. Assuming that most short lived objects are thread local it might not give that much more garbage after all. It's an interesting thought - the refcounting is also fairly expensive in itself. Pity there's no way of testing it without actually implementing it.
A problem with that is also that it'd be a semantic change since global stuff would stay around longer. There's much pike code that relies on timely refcount garbing. :\
/ Martin Stjernholm, Roxen IS
The necessary atomic refcounting on global data is a similar expense, though. Hmm, maybe one could relax the refcount garbing on global data and let it become garbage instead.
I think there are known gc algorithms that use an essentially one-bit reference count. Newly created objects have the bit clear (similar to the global/local flag set to local). Whenever a reference "leaks" (a concept that has to be defined more precisely), the bit is set. When an object goes out of scope, check the bit and deallocate it immediately if it is clear, and leave it to the gc if the bit is set.
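Roughly like this in C (invented names, just to illustrate the one-bit idea):

```c
#include <stdlib.h>

/* Objects start with the "leaked" bit clear; storing a reference
 * anywhere that may outlive the creating scope sets it.  On scope
 * exit the creator frees the object itself only if the bit is
 * still clear; otherwise the gc has to pick it up later. */
struct obj {
    int leaked;
    int queued_for_gc;   /* stands in for handing it to the gc */
};

static void store_ref_somewhere(struct obj *o)
{
    o->leaked = 1;       /* the reference "leaks" out of the scope */
}

static void scope_exit(struct obj *o)
{
    if (!o->leaked)
        free(o);         /* immediate, deterministic deallocation */
    else
        o->queued_for_gc = 1;
}
```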
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-04 23:14: Subject: Re: Default backend and thread backends?
There will still be a lot of locking operations,
The intention is that the locking strategy would keep that on the same level as the current interpreter lock. Have I missed something?
and there will be a *lot* of places where the local/global flag would have to be checked.
Everywhere where a memory object is changed or freed, but not where it only is read. Trivial per-object locking would have to lock for both operations, which I think is a big difference. It's hard to estimate the read-to-write ratio (not counting the stack, which is thread local and therefore unlocked in any case), but I'd chance it's around 10:1 at least.
In fact, I'm not entirely sure that it's much faster than having one mutex per object, because locking an unlocked mutex takes very little time. (Similar to checking a flag.)
I don't know what you mean with "similar", but it should be at least four times more; it's a read-and-write operation and it's necessary to do an unlock operation afterwards. Furthermore the lock is atomic meaning cache synching etc which ought to be a fair bit more expensive (but only on SMP systems) - the flag check need not be atomic.
The necessary atomic refcounting on global data is a similar expense, though. Hmm, maybe one could relax the refcount garbing on global data and let it become garbage instead. Assuming that most short lived objects are thread local it might not give that much more garbage after all. It's an interesting thought - the refcounting is also fairly expensive in itself. Pity there's no way of testing it without actually implementing it.
A problem with that is also that it'd be a semantic change since global stuff would stay around longer. There's much pike code that relies on timely refcount garbing. :\
/ Martin Stjernholm, Roxen IS
I take it that a "leak" is a ref from somewhere else than the stack.
It could be made transitive too - a ref from another object without that bit set isn't a leak either. But otoh you'd have to set the bit recursively in that case.
This can only be applied to thread local data, but in that case I think normal refcounting still is fine: It preserves the semantics and it's not necessary to do it atomically.
/ Martin Stjernholm, Roxen IS
Previous text:
2004-02-05 12:39: Subject: Re: Default backend and thread backends?
The necessary atomic refcounting on global data is a similar expense, though. Hmm, maybe one could relax the refcount garbing on global data and let it become garbage instead.
I think there are known gc algorithms that use an essentially one-bit reference count. Newly created objects have the bit clear (similar to the global/local flag set to local). Whenever a reference "leaks" (a concept that has to be defined more precisely), the bit is set. When an object goes out of scope, check the bit and deallocate it immediately if it is clear, and leave it to the gc if the bit is set.
/ Niels Möller (vässar rödpennan)
I take it that a "leak" is a ref from somewhere else than the stack.
That might be a reasonable definition. It's fairly important that if one passes the local object to a function, that should count as a "leak" only if the called function (or some function it calls in turn) stores a reference somewhere where it might live longer than the function call.
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-05 15:01: Subject: Re: Default backend and thread backends?
I take it that a "leak" is a ref from somewhere else than the stack.
It could be made transitive too - a ref from another object without that bit set isn't a leak either. But otoh you'd have to set the bit recursively in that case.
This can only be applied to thread local data, but in that case I think normal refcounting still is fine: It preserves the semantics and it's not necessary to do it atomically.
/ Martin Stjernholm, Roxen IS
A problem is how you know when the object goes out of scope. When an inner function returns, it shouldn't free the object even if the bit is clear.
You could avoid freeing objects that are in the function arguments, but as soon as they (or anything inside them) get referenced from other local variables, you don't know anymore and have to set the bit to be on the safe side.
/ Martin Stjernholm, Roxen IS
Previous text:
2004-02-05 15:10: Subject: Re: Default backend and thread backends?
I take it that a "leak" is a ref from somewhere else than the stack.
That might be a reasonable definition. It's fairly important that if one passes the local object to a function, that should count as a "leak" only if the called function (or some function it calls in turn) stores a reference somewhere where it might live longer than the function call.
/ Niels Möller (vässar rödpennan)
My mental image was that only the function that created the object should free it when it goes out of scope (and the compiler has to keep track of this).
It would be desirable to also be able to return such objects, and have the caller somehow take over the "creator" responsibility. (For tail calls, one would also like to pass the responsibility on to the called function.)
But this ought to be described somewhere in the gc literature.
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-05 15:20: Subject: Re: Default backend and thread backends?
A problem is how you know when the object goes out of scope. When an inner function returns, it shouldn't free the object even if the bit is clear.
You could avoid freeing objects that are in the function arguments, but as soon as they (or anything inside them) get referenced from other local variables, you don't know anymore and have to set the bit to be on the safe side.
/ Martin Stjernholm, Roxen IS
What I want is to make sure that nothing in the default backend will block application execution, especially callouts.
In that case I'd instead set up a thread farm for running things. (There are a couple of thread farms in different pike modules already. Pike ought to provide a global one that can be used by everyone.)
Also, I want my application to be scalable in such a way that I can just instantiate a copy of some object (which runs the entire app) in a separate thread, and it won't interfere with anything else (I mean, no lockouts etc)
The only safe way to do that is to run it in a separate process. The basic difference between threads and processes is precisely that the former can interfere with each other in all sorts of ways while the latter basically can't. A separate process will make use of SMP too.
/ Martin Stjernholm, Roxen IS
Previous text:
2004-02-01 08:13: Subject: Re: Default backend and thread backends?
On Sun, Feb 01, 2004 at 12:05:01AM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
An SMP machine won't be used efficiently anyway because of the interpreter lock.
Does it mean that multithreading in Pike is not effective and should be avoided if possible?
To me it seems a bit odd to have an implicit mapping between threads and backends;
What I want is to make sure that nothing in the default backend will block application execution, especially callouts. Sure, I can do everything explicitly, but I've no control over external modules, which might be added later.
Also, I want my application to be scalable in such a way that I can just instantiate a copy of some object (which runs the entire app) in a separate thread, and it won't interfere with anything else (I mean, no lockouts etc) running in other threads (use of SMP would be ideal, but you say that it won't be used efficiently)...
Regards, /Al
/ Brevbäraren
pike-devel@lists.lysator.liu.se