Hi,
Unfortunately we all still have to do without multicore or multi-CPU support in Pike. Even though I don't know how things work at the lowest levels, I am aware it has something to do with the way Pike handles its variable structures internally, and that it is definitely not a simple thing to change while maintaining good performance. My curiosity is about something different, but related: having some limited multicore support.
I vaguely remember hearing or reading something somewhere about the way Pike interacts with libmysql. My memory might be fooled by wishful thinking, so I might be completely wrong. I believe I understood that Pike is able to release some restrictions the moment it knows it is entering libmysql code. This would be beneficial for performance: the MySQL library could switch to a mode where it manages its memory more freely, or where it can run on a different core/CPU.
If that is possible, I wonder whether libmysql is the only C library doing this kind of trick, or whether other libraries (especially the CPU- and memory-intensive ones) do it too. Image libraries, zlib, any C library doing lots of 'library-local' memory operations (libxslt), or any library doing CPU-intensive work seem like really nice candidates...
Am I wrong about what I think Pike is doing? If not, is this just for libmysql, or do other modules do this too? If I am wrong, is what I was thinking technically possible, or am I just being too idealistic?
Regards,
Arjan
While Pike is doing something that might take some time, it can run other Pike threads in the foreground. This is done for a lot of operations - all of those you mention, I believe; all that make sense.
So if you are doing a lot of image operations, it will use several real threads for the operations.
In the last episode (Mar 27), Mirar @ Pike developers forum said:
While Pike is doing something that might take some time, it can run other Pike threads in the foreground. This is done for a lot of operations - all of those you mention, I believe; all that make sense.
So if you are doing a lot of image operations, it will use several real threads for the operations.
Any place in C code that you see THREADS_ALLOW / THREADS_DISALLOW pairs, that's where the code has released the main pike interpreter lock, allowing other threads to run.
Dan Nelson wrote:
In the last episode (Mar 27), Mirar @ Pike developers forum said:
So if you are doing a lot of image operations, it will use several real threads for the operations.
Any place in C code that you see THREADS_ALLOW / THREADS_DISALLOW pairs, that's where the code has released the main pike interpreter lock, allowing other threads to run.
Yes, but AFAIK those refer to Pike-threads, which are still being run on a single core.
One of the few modules that actually makes use of multiple cores is the Shuffler module in Pike. But that solves more of a latency problem than a CPU problem. The (current) best way to utilise multiple cores in combination with a database is to use stored procedures in SQL. The SQL backend will probably use one core per open SQL session.
I'm not quite sure what you mean.
Any code within a THREADS_ALLOW()/THREADS_DISALLOW() pair can be executed simultaneously in multiple threads on multiple cores. The macros only release the interpreter lock so that another thread may execute Pike code.
Thus, many modules that glue C libraries into Pike will have THREADS_ALLOW()/THREADS_DISALLOW() statements around the actual call to the library they wrap, allowing other threads to execute while the library is called.
In the cases where wrappers don't do this, it's not unlikely that the wrapped library isn't thread-safe and thus would require separate locking if the interpreter lock was released.
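To make that concrete, here is a minimal sketch of what such a glue function typically looks like (the function name and the do_heavy_work() call are hypothetical, and the usual module boilerplate - headers and function registration - is omitted; THREADS_ALLOW()/THREADS_DISALLOW() are the macros discussed above):

    /* Hypothetical glue function in a Pike C module.  The heavy work is done
     * between THREADS_ALLOW() and THREADS_DISALLOW(), i.e. without holding
     * the interpreter lock, so other Pike threads can run meanwhile. */
    static void f_heavy_wrapper(INT32 args)
    {
      int result;

      pop_n_elems(args);          /* this sketch ignores its arguments */

      THREADS_ALLOW();            /* release the interpreter lock */
      result = do_heavy_work();   /* hypothetical call into the wrapped, thread-safe library */
      THREADS_DISALLOW();         /* grab the lock back before touching Pike data again */

      push_int(result);           /* the Pike stack may only be touched with the lock held */
    }

The important restriction is that no Pike data (the stack, strings, mappings, ...) may be touched between the two macros, since other interpreter threads may be running at that point.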
THREADS_ALLOW will allow threads to run in the background, but only one thread at a time can access the Pike datatype tree, so only one interpreter thread will run at any given time.
This uses two threads, two cores if your OS runs threads on different cores:
| Pike v7.8 release 205 running Hilfe v3.5 (Incremental Pike Frontend)
| > void test() { for (;;) { Image.Image i=Image.Image(1000,1000);
|     string s=Image.JPEG.encode(i); } }
| > Thread.Thread(test);
| (1) Result: Thread.Thread(1079527760)
| > Thread.Thread(test);
| (2) Result: Thread.Thread(1080490320)
Only minimal time is spent in the Pike interpreter, and most of the time is spent "in the background", with the possibility for other Pike interpreter threads to run.
However,
| Pike v7.8 release 205 running Hilfe v3.5 (Incremental Pike Frontend)
| > mapping m=([]);
| > void test() { for (;;) m[random(100)]=random(100); }
| > Thread.Thread(test); Thread.Thread(test);
| (1) Result: Thread.Thread(1106815312)
| (2) Result: Thread.Thread(1087379792)
This will only use one thread, since it's very interpreter-intensive. It needs to hold the lock to be able to do that mapping operation, so no more than one core can be used. (And that makes the Hilfe interpreter grind to a halt. I'd consider that a bug, actually.)
(I failed to notice replies earlier due to unexpected mail filtering - oops.)
-----Original message-----
From: Mirar @ Pike developers forum
Sent: Monday, March 30, 2009 9:50 AM
To: pike-devel@lists.lysator.liu.se
Subject: Re: (semi-)multicore support?
This uses two threads, two cores if your OS runs threads on different cores:
| Pike v7.8 release 205 running Hilfe v3.5 (Incremental Pike Frontend)
| > void test() { for (;;) { Image.Image i=Image.Image(1000,1000);
|     string s=Image.JPEG.encode(i); } }
| > Thread.Thread(test);
| (1) Result: Thread.Thread(1079527760)
| > Thread.Thread(test);
| (2) Result: Thread.Thread(1080490320)
Is it limited to two OS threads (one for Pike, one for libraries, or maybe one for each library), or is it possible to have one OS thread per library call (theoretically resulting in X OS threads, which might or might not be faster here if the system has more than two cores)?
Regards,
Arjan
Any number of C library calls can run concurrently. THREADS_ALLOW is only used around code considered thread safe.
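Purely as an illustration of the alternative mentioned earlier in the thread: a library that is not thread safe could only release the interpreter lock if the wrapper added its own locking around the library call. A rough sketch of that variant (do_unsafe_work() is hypothetical, and a plain pthread mutex is used here just to keep the example self-contained):

    #include <pthread.h>

    /* Module-private lock serialising calls into a non-thread-safe library. */
    static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

    static void f_unsafe_wrapper(INT32 args)
    {
      int result;

      pop_n_elems(args);

      THREADS_ALLOW();                 /* other Pike threads may run in the interpreter ... */
      pthread_mutex_lock(&lib_lock);   /* ... but only one thread enters the library itself */
      result = do_unsafe_work();       /* hypothetical call into the non-thread-safe library */
      pthread_mutex_unlock(&lib_lock);
      THREADS_DISALLOW();              /* take the interpreter lock back */

      push_int(result);
    }

Whether that is worth doing depends on the library: if every caller ends up waiting on lib_lock anyway, holding the interpreter lock for the duration is hardly worse.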
Okay, so it appears there is not much left to gain for C libraries, because the interpreter lock is already being released where possible, allowing the library code to run on a separate core at that point. What if the 'C library' is another instance of Pike? Could it be (made) possible to have Pike release its interpreter lock if it is known that there is no inter-Pike processing there anyway?
Say there is a function implemented in Pike which could be very CPU-intensive and which only handles locally created variables. I can imagine some definition telling Pike which variables form the interface between the two instances, after which instance #1 and instance #2 do their own separate processing (and interpreter locking) without interfering with each other until there is data to return (if any).
Regards,
Arjan
Running a pike as a lib inside another could theoretically work if the embedding interface is developed more. Right now I think there would be too many symbol conflicts.
But it'd be very cumbersome to use. Since the two interpreters would be completely separate, the only way to pass data between them would be through pipes or the shared memory interface, which can only pass strings. You could just as well run several Pike processes.
The planned multi-cpu support aims a lot higher than that. You're welcome to read the preliminary spec here: http://pike-git.lysator.liu.se/gitweb.cgi?p=pikex.git;a=blob_plain;f=multi-cpu.txt;hb=mast/multi-cpu Feedback is welcome.
On Tue, Mar 31, 2009 at 02:15:04PM +0000, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
the only way to pass data between them would be through pipes or the shared memory interface, which can only pass strings. You could just as well run several Pike processes.
The Remote interface provides support for passing more elaborate data structures.
It could be interesting to have an interface that looks like threads but essentially runs multiple processes and hides the multiprocess communication from the user.
Starting a 'thread' would create a new process and run the given function, using Remote to pass data between the 'threads'.
I think the result would be much the same as Python's multiprocessing module.
greetings, martin.
The Remote interface provides support for passing more elaborate data structures.
The point here, I think, is that it does so by converting the elaborate data structure to a string, which is then passed. It's not that strings can't be used; there is just some overhead that needs to be taken into account.
Isn't any data structure in memory, in the end, a string of bytes?
So what kind of string is meant here?
greetings, martin.
To be precise, what is meant is the overhead of marshalling and unmarshalling every structure that is sent between the "threads". It's not possible to share the (likely discontinuous) byte sequence that represents a data structure in memory.
The drawback is twofold. One is the overhead of the marshalling/sending/receiving/unmarshalling itself; the other is that the same structure cannot be shared and changed destructively from several "threads". E.g. the protocol cache in Roxen couldn't be shared; it would have to be copied back and forth. To work around that you'd have to invent a protocol to let the other "thread" do your changes for you, but that's quite a different thing from changing the same data structure directly from several threads.
Ah, I see. Of course I didn't think such multiprocess support would replace threading, since the advantage of threads is being able to share data structures, but it could be interesting for situations where the threads are heavy workers that don't need to communicate much data.
A version of Roxen that spreads over all available CPUs, for example. I guess similar cases are on Arjan's mind too.
Arjan, you could just look at Pike's Remote module and take some ideas from Python's multiprocessing module to achieve the effect you are asking for.
greetings, martin.
A version of Roxen that spreads over all available CPUs, for example. I guess similar cases are on Arjan's mind too.
That is the main issue I had in mind, but Pike being multi-core would not limit its use to Roxen only. I was wondering about some possibilities for improving multi-core support, without knowing exactly how the internals currently work. What better place to start a thread than here? I was hoping for some feedback on ideas which might solve some of the issues. Having a discussion can shed light on what is currently possible, without having to go through the full effort of implementing complete multi-core support yet.
Arjan, you could just look at Pike's Remote module and take some ideas from Python's multiprocessing module to achieve the effect you are asking for.
Thanks, I will definitely take a look at it.
Greetings,
Arjan
On Wed, Apr 01, 2009 at 05:04:36PM +0200, Arjan van Staalduijnen wrote:
Arjan, you could just look at Pike's Remote module and take some ideas from Python's multiprocessing module to achieve the effect you are asking for.
Thanks, I will definitely take a look at it.
Without having looked myself, I am hoping that it should be possible to have a simple Remote server that can just accept functions or objects to run.
Then in the main application only some code would be needed to either start up such a Remote server or manage a pool of already running instances.
greetings, martin.
Running a pike as a lib inside another could theoretically work if the embedding interface is developed more.
Since the two interpreters would be completely separate, the only way to pass data between them would be through pipes or the shared memory interface, which can only pass strings.
I was basically describing it as 'library-like' because I was guessing the current libraries don't need that kind of memory interfacing - although I have no idea how C libraries running on another core can exchange data with Pike in any way other than through shared memory. I lack practical knowledge on that point.
I did not have full 'access to all variables' in mind; I was suggesting an interface which would define a limited set of variables (say, function arguments and its return value) which would be transferred (one-way) between the instances. On entry it might put a Pike thread on hold, because it depends on the output of code running on another core. It could, however, allow other Pike threads to run (as long as they don't access that 'mutexed' output variable?). Far from ideal, but possibly a step towards bringing multi-core to Pike, if one knows how to use it.
The planned multi-cpu support aims a lot higher than that. You're welcome to read the preliminary spec here:
http://pike-git.lysator.liu.se/gitweb.cgi?p=pikex.git;a=blob_plain;f=multi-cpu.txt;hb=mast/multi-cpu
Feedback is welcome.
It's a nice read, but I'm far from having read all of it yet. I hope we can get there in small steps.
AvS
Is it limited to two OS threads (one for Pike, one for libraries, or maybe one for each library), or is it possible to have one OS thread per library call (theoretically resulting in X OS threads, which might or might not be faster here if the system has more than two cores)?
To put it another way:
Pike always has one OS thread (pthread_create, _beginthread, etc.) per Pike thread; it's just that it can't run more than one of them at a time in the interpreter.
The rest will be waiting for the interpreter lock, unless they are doing something that doesn't need it, like image or file operations. In those cases they release the interpreter lock so other threads can grab it. It will look more or less like this:
real thread started:
    lock(the interpreter lock)
    forever:
        do pike calculations and variable operations

        while doing this, sometimes (function call, loop backs, etc):
            unlock(the interpreter lock)    // let other interpreter threads run
            lock(the interpreter lock)

        in some functions, like Image.JPEG.encode or file read:
            unlock(the interpreter lock)
            do the heavy stuff              // let other interpreter threads run
            lock(the interpreter lock)
So it's not limited, but every Pike thread will use up real thread resources like stack address space. (This is a problem if you plan on having an application with 1000 threads.)