I will have to tell you that I'm pleased with this. I use postgresql myself for any of my home projects, so a new and better pg driver is welcome.
----- Original Message ---- From: Stephen R. van den Berg srb@cuci.nl To: pike-devel@lists.lysator.liu.se Sent: Wednesday, July 23, 2008 12:35:12 PM Subject: pgsql performance
I've been working on a PGsql helper class to speed up the hotspots in my new pgsql PostgreSQL driver; so far it works rather well, and the performance of the new driver has leaped from 10% of the libpq module, to about 45% of the libpq module.
I won't be including the module until I have it at about 100% (or greater) of the libpq speed though.
Lance Dillon wrote:
I will have to tell you that I'm pleased with this. I use postgresql myself for any of my home projects, so a new and better pg driver is welcome.
Made a few more leaps. I'm now at 65% relative to the old postgres driver, for the small testcase I'm using. The question is of course, how far to go. I'm currently testing the queries raw, no processing afterward, just discarding the resultarrays. As soon as you try to do anything useful with them, the amount of Pike overheads from the query drops close to zero. For wide strings/binary queries, my new driver is faster already, BTW.
Any objections if I check in what I have on 7.7? It does not touch anything outside my pgsql driver (it creates a supporting cmod called PGsql).
As long as you don't break compilation and doc generation and make the appropriate CHANGES entries.
Peter Bortas @ Pike developers forum wrote:
As long as you don't break compilation and doc generation and make the appropriate CHANGES entries.
Quick question about naming the cmod. I currently ad-hoc named it PGsql. Consequently I have a PGsql/PGsql.cmod etc.
Is there any point in naming it something like _pgsql/_pgsql.cmod instead? It's a class that is not meant to be called by anyone besides my Sql/pgsql.pike driver.
The _ naming convention is usually for namespace that is false, for instance _Image_JPEG (which should never be called as Image_JPEG.something()), but is mapped in from another module (Image.JPEG.something()).
(The reason they are separated is that they might or might not be able to load. If you remove libjpeg from your system you might still want to be able to use Image.)
I'm not sure how your module is intended to use, but if it'll be linked into the module system under Sql.something, or never should be called by the user, the _ is probably a good idea as an indicator for that.
Mirar @ Pike developers forum wrote:
I'm not sure how your module is intended to use, but if it'll be linked into the module system under Sql.something, or never should be called by the user, the _ is probably a good idea as an indicator for that.
Ok, then I guessed correctly.
The _PGsql module doesn't compile on windows because of the poll stuff (and maybe other things too). Take a look in modules/files/file.c for how to do things like that in a more portable way.
You can see build results in xenofarm (burns.roxen.com and starcraft.roxen.com). In weekdays they only build during evenings and nights, but the response is better in the weekends.
It's not the time for that. Make sure it's disabled by default and enable it by default after the release.
Peter Bortas @ Pike developers forum wrote:
It's not the time for that. Make sure it's disabled by default and enable it by default after the release.
Ok, will do.
The pgsql and _PGsql helper are close to perfect now. There are no direct systemcalls anymore, I changed the xrealloc call into a plain realloc because that's what every other module seems to do (I fail to see the reason to use xalloc and realloc). The configure dependencies have been simplified. I cleaned up the stack operations and verified that there are no leaks there.
The pgsql driver is about 72% faster than the old postgres driver (but only after fixing the file_peek() function in files/file.c; that (small) patch is not committed to CVS, since it belongs to the core, and I will not commit that without consensus here), without the patch, the driver becomes about 20% *slower* than the old postgres driver. It supports SSL as well now.
P.S. What seems to puzzle me though, is that functions declared returning void in Pike, seem to require a pop_stack() after using apply() to call them.
/.../ (I fail to see the reason to use xalloc and realloc).
xalloc throws a pike error instead of returning NULL. xrealloc (in case that's what you meant) is otoh not a similar variant of realloc. It belongs to a set of malloc wrappers that looks like an old attempt to sort out the mess with different CRTs on windows. Ought to be cleaned away.
P.S. What seems to puzzle me though, is that functions declared returning void in Pike, seem to require a pop_stack() after using apply() to call them.
Yes, the various apply functions always leaves a single svalue on the stack, so you don't need to know the type of the callable.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
/.../ (I fail to see the reason to use xalloc and realloc).
xalloc throws a pike error instead of returning NULL. xrealloc (in case that's what you meant) is otoh not a similar variant of realloc. It belongs to a set of malloc wrappers that looks like an old attempt to sort out the mess with different CRTs on windows. Ought to be cleaned away.
Is there a function which throws an error like xalloc() but does a realloc() instead?
P.S. What seems to puzzle me though, is that functions declared returning void in Pike, seem to require a pop_stack() after using apply() to call them.
Yes, the various apply functions always leaves a single svalue on the stack, so you don't need to know the type of the callable.
Erm... Does this mean that the void functions themselves do or do not need to leave an svalue at the stack before returning? First I thought, they should not leave anything. Then I found some sample code in the Shuffler which explicitly does a push_int(0) at the end of a void function, so I copied that. Which is it now?
Erm... Does this mean that the void functions themselves do or do not need to leave an svalue at the stack before returning?
They don't need to.
First I thought, they should not leave anything. Then I found some sample code in the Shuffler which explicitly does a push_int(0) at the end of a void function, so I copied that. Which is it now?
Both are fine; apply will check if the function pushed anything and push a zero if it didn't.
Even more, apply will pop any extra values so there's no need to clean up the stack at all if the function doesn't return anything.
But if there is exactly one element on the stack, you need to clean it up. Otherwise it will be returned from the function.
Actually, looking at the code, it seems that it always returns the top stack element if you leave stuff on the stack. So unless you explicitly push a zero before returning, you _do_ need to clean up.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
Actually, looking at the code, it seems that it always returns the top stack element if you leave stuff on the stack. So unless you explicitly push a zero before returning, you _do_ need to clean up.
So, just so that I understand the intricacies, suppose I have a:
PIKEFUN int testme(int a, int b, int c) { starts out with a, b, and c on the stack ...processing... ...not popping the stack at all... return; }
Now, the stack will be reduced from [ a, b, c ] to just [ a ] ? I.e. it pops b and c, and then uses a as the return value?
No, it would be reduced to [ c ]. The top element is copied to the return position before anything is popped.
Well, if the function is declared to return void, then most likely the garbage return value will be popped shortly afterwards. It's only in cases like hilfe, which always print out the return value even from void functions, that you get to see the last argument or something as the return value (there are many void functions like that). I don't think it's worth the effort to push a zero for that case.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Well, if the function is declared to return void, then most likely the garbage return value will be popped shortly afterwards. It's only in cases like hilfe, which always print out the return value even from void functions, that you get to see the last argument or something as the return value (there are many void functions like that). I don't think it's worth the effort to push a zero for that case.
I thought so too, then I looked at my PIKEFUN void decodedatarow(int msglen) and already altered the code deliberately *not* to clean up the stack (it ends up with an array and two objects on the stack) and let apply() clean it up.
Then I realised that if anyone (in the future) happens to muck around with the topvalue after the void function has returned (the object), rather unforeseen bugs will crop up. So I clean up the stack anyway to go for the element of least surprise (I let apply() push the zero though, not going to bother with that).
I presume that actually pushing a zero only entails filling struct svalue with some fixed values and increasing the stackpointer? (i.e. a minor performance hit).
I'd say if someone is mucking around with your trash return value then it's that someone who's doing the wrong thing and not you. If the function is used in a void context then the compiler will make sure that the value gets popped soon enough.
But if it makes you feel better then go ahead and push a zero. Considering everything else that's a very minor performance hit.
So do you consider code such as this incorrect?
void handle(int opcode, mixed ... args) { function x = handlers[opcode]; mixed r = x(context, @args); if(stringp(r)) fd->write(r); else if(mappingp(r) && r->data) fd->write(r); }
Is it supposed to check the runtime type of x to find out if it's a void function or not?
Eventually the return value ends up somewhere where it actually has meaning, and if used there then that part is incorrect. Granted, it could be advisable to avoid returning a very large string or an object with side-effects at destruct or something like that. But I don't consider it any more correct to return a zero than some other value when there should be no return value at all; it's garbage no matter what it is.
(The core problem is that the calling convention is bogus at the C level. Every function ought to return an integer saying how many values it left on the stack, zero or more. That'd allow implementing multiple return values.)
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Eventually the return value ends up somewhere where it actually has meaning, and if used there then that part is incorrect. Granted, it could be advisable to avoid returning a very large string or an object with side-effects at destruct or something like that. But I don't consider it any more correct to return a zero than some other value when there should be no return value at all; it's garbage no matter what it is.
Technically you're right. But since the performance hit is rather small, I'd say it's bad practice to leave anything other than zero on the stack in case of a void function returning. Actually, apply could probably check that the function is a void function and force the zero on the stack.
(The core problem is that the calling convention is bogus at the C level. Every function ought to return an integer saying how many values it left on the stack, zero or more. That'd allow implementing multiple return values.)
Why not change this (for 7.9) to avoid all this sillyness and indeed support multiple return values?
Such an API change would be no small matter, exactly. But together with the multi-cpu-support (which would require new locking primitives in modules if they're to make use of it) and an all-around C-module API cleanup, I think it would fit an 8.0.
Problem is if we have the determination to make all that happen.
There is no particular need for adding "native" multiple return functions, since we already have syntax for handling multiple return with arrays. (Better typing for arrays so that the individual values can be separately typed would be nice though.) Functional programming languages usually handle this by always having exactly one return value of functions (and sometimes, in the case of pure functional languages, also exactly one parameter). If you need to return multiple values, you return one value which is a tuple. If you don't need any return value, you return a special "unit" or "bottom" value. In the case of Pike, zero usually takes the role of a "bottom" value when one in needed.
The use of arrays to implement multiple return values have several problems. You've already mentioned one. Another is that there's no good handling of variable number of arguments; it ought to work the same way as for function input. A third is that it's not very efficient (whether that matters or not depends on the circumstances of course). A fourth is that there's no difference between the return type array-of-something and multiple-return-values-of-something.
The last one could be handled by adding a tuple type, but it could also be handled by special forms for multiple return values, just as we already handle multiple arguments without a tuple type. Not that I'd mind a tuple type being added, but that'd have a much deeper impact on the type system than is necessary to solve this.
In that particular case, it seems that there's a calling convention for those handlers where the return values have special meaning. That means it's bad practice to put a void function into that "handlers" array/mapping.
In that particular case, it seems that there's a calling convention for those handlers where the return values have special meaning.
For handlers that have a return value, that value has a special meaning, yes. Handlers that don't need to use that don't need a return value.
That means it's bad practice to put a void function into that "handlers" array/mapping.
This might be a documented feature, and actually recommended practice within this environment.
Then the error is at the policy level. The return value has meaning, hence it's wrong to use void functions there. (In pure function call proxy code, like e.g. the Remote module, one has to accept that the return value is propagated even if it's garbage. Then the matter rests with the caller at the other side.)
Traditionally void has been 0 wherever it's actually used. Is that changed?
I don't think a policy is in error just because it simplifies for the end user. I have always considered void functions returning zero a part of the semantics of Pike.
May I ask whay your argument _for_ returning bogus values is?
I agree with marcus here. Void functions returning garbage are bogus. There hab better be a very good reason for it.
I don't think a policy is in error just because it simplifies for the end user.
Well, that's not all there is to it. And it's not a simplification when the user is lead to believe that (s)he has the choice to not return a value when the value does in fact have meaning. That's obfuscation.
May I ask whay your argument _for_ returning bogus values is?
You're making the assumption that a zero would not be bogus, which I don't agree with. I've already explained my argument: It's bogus whatever it is, so there's no use wasting effort making it a particular bogus value. The design flaw is deeper.
But as I said earlier, when it might have practical effect (side-effect object etc) then it could be good to clean up the return value.
You're making the assumption that a zero would not be bogus, which I don't agree with.
Zero is the value of "void" when propagated to an integer, so it isn't bogus. A particlar non-bogus value for void is needed, otherwise e.g. "void|int foo();" does not work.
I've already explained my argument: It's bogus whatever it is, so there's no use wasting effort making it a particular bogus value. The design flaw is deeper.
I've already shown that there is use "wasting" this effort, so your argument does not hold.
A particlar non-bogus value for void is needed, otherwise e.g. "void|int foo();" does not work.
That ought not work either. That's a meaningless return type since there's no out-of-band way to know if the function chose to return a value or not. If returning zero is the alternative to the other value, then write "int(0..0)|<something>", not "void|<something>". A warning for such a return type is in order. (I can confess I've written return types like that too long ago. It was a mistake.)
I don't see any point with letting void become just an alias for int(0..0). Hopefully "zero" will be introduced for that purpose.
I've already shown that there is use "wasting" this effort, so your argument does not hold.
You've only shown it if one chooses to accept your argument.
The point is that you can have a prototype "void|<something> func();", which can be implemented either by a function "void func();", or a function "<something> func();" (type restriction). So if you don't need the services provided by returning a <something>, you can write
void func() { code code }
instead of
int(0..0) func() { code code return 0; }
which is shorter and more to the point.
You've only shown it if one chooses to accept your argument.
The argument that simplifying for the end user has value, yes. If you do not agree with that argument, then I don't suppose we have anything further to discuss.
I guess so. My point is still that you're obfuscating if you're trying to do something like that, since the return value will be used by the caller regardless whether the programmer there chooses void as return type or not. The choice is simply not up the one writing the called function, but the one writing the function call (that would be different if there was a way to query the number of values the function actually returned).
In other words, since the function has to return zero to avoid some special handling of the return value, it's more clear - and hence more user friendly - to explicitly return zero and declare the function accordingly.
It's misuse to exploit void as a trick only to shorten the return zero bit slightly. It's unfortunate that the the current implementation allows it; it should have been an error at runtime too to use the return value from a void function.
In a highly dynamic environment (which is something Pike is often used to implement), it's not hard to imagine a service which is parametrized with a generic function value, and which may do something with the return value if any. I think returning bogus values from void functions is very bad practice, as long as those return values are visible from Pike code.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
The _PGsql module doesn't compile on windows because of the poll stuff (and maybe other things too). Take a look in modules/files/file.c for how to do things like that in a more portable way.
You can see build results in xenofarm (burns.roxen.com and starcraft.roxen.com). In weekdays they only build during evenings and nights, but the response is better in the weekends.
I am trying to reach either machine using a webbrowser, but that times out, is that to be expected, or am I doing something stupid?
I actually copied the portability code in full from files/file.c into the PGsql driver, so it should work just fine on Windows now. I wanted to check just once, and if it doesn't get a green light, disable the module for compilation.
I am trying to reach either machine using a webbrowser, but that times out, is that to be expected, or am I doing something stupid?
That is to be expected, _and_ you are doing something stupid. :-)
You are supposed to point your browser to http://pike.ida.liu.se/development/pikefarm/7.7.xml, and look at the lines which are labled "burns.roxen.com" and "starcraft.roxen.com". You can click on the rightmost dot to get more information about the latest builds.
pike-devel@lists.lysator.liu.se