Per Hedbor wrote:
You can fix most of the very slow functions, but getting better than about 50% performance is rather hard when you have to check the size-shift when extracting and adding chars. The cpp() function became about 40% slower after my rewrite that changed it to use PCHARP everywhere, as an example.
Ok, can't argue with that.
Also, the fact that String.Buffer can contain non-8bit characters is directly /harmful/ when doing an IOBuffer. So the extra overhead is not really there for any good reason.
Well, I wouldn't say harmful, non-obvious perhaps. The system calls ultimately do the 8-bit check. Don't get me wrong: I come from an 8-bit world, I just learned to live with the fact that in Pike you have to deal with more, usually.
Why not simply keep String.Buffer as a simple string-adding-buffer, and let the more complex stuff be done in other code, as before? IOBuffer is really only a more efficient version of ADT.struct. Why is it so dangerous to have it around? I find the additions to String.Buffer to be significantly less beautiful, if that is a way to put it. There was not really any shared funcitonality between String.Buffer and Stdio.IOBuffer except one before: Adding strings to a buffer that is not otherwise used until "done".
Well, ok, but then I suggest we dumb down String.Buffer again to what it was in 7.8, and get rid of the addat() as well. Since the addat() use case becomes rather narrow considering the existence of IOBuffer. Or am I missing something as to its usefulness in String.Buffer?
It is somewhat esoteric to use add_int(..) add_short() etc when the string contains widechars.
Correct :-). I did have a slightly tingly feeling when considering what the semantics were supposed to be in that case.
It does not really make sense (incidentally, your JSON parser assumes utf8 which is hardly true for a string, those are supposed to be unicode)
The JSON parser is only useful if unleashed on 8-bit strings, of course.
Also, how do you handle the 2G+ buffer case? I can see no code for that (yes, I have hit 2G+ buffer sizes, which is why IOBuffer is very carefully using size_t everywhere).
Indeed. Didn't want to fiddle with that until we decide if IOBuffer is dropped or not.
And as for codesize, String.Buffer, not counting any stringbuilder code, compiled using clang -Ofast on my mac, weighs in at 27893 bytes (only counting functions named _Buffer_* in builtin.o, so it does not include the helper functions).
The text segment of buffer.o is 23425. In total. Without using stringbuilder. So, no, not really.
Ok, I'm suprised, but it only shows that overhead creeps in with wide character treatment.
Also, one thing at random I noticed when reading the stralloc.c commits:
- static INLINE int min_magnitude(const p_wchar2 c)
- static INLINE unsigned min_magnitude(const unsigned c) {
- if(c<0) return 2;
- if(c<256) return 0;
- if(c<65536) return 1;
- return 2;
- return c<256 ? 0 : c<65536 ? 1 : 2;
is not really much of a speedup, if any, in fact it reorganized the instructions for me to make it less efficient if c>=0 and <= 255 when using clang (you would need to add some LIKELY to obvious locations). YMMV when the code is included in the location it's called and other compilers are used. Which is sort of the point.
I would have expected the compiler to do the unsigned comparison first and therefore help the common case. I wasn't familiar with the (UN)LIKELY macros; seems like they could help, indeed.
Also, it is generally a good idea to add what is unsigned, we do so every where else. And, pike 32bit chars are actually signed.
You mean add a comment about the forced unsigned comparison?
Another random thing: _encode should probably not empty the buffer. It's a fairly unexpected side-effect.
I believe I copied this from IOBuffer. Maybe I messed it up.
Benefits of the new String.Buffer implementation:
- Close to drop-in replacement for IOBuffer.
Yes, but slower, and why?
Well, I'd hoped that the speed difference would only be a few % for the 8-bit case. It turns out that it is not. If it would have been, it would have increased cache-locality by reducing the code footprint of Pike.
- A String.Buffer initialised from a shared string allows one to
access the shared string in parts without copying the data at all.
Of course the same is true for IOBuffer. IOBuffer does not even do any copying when you read all data and then later on add a new string. Which String.Buffer does. :)
Yes, well, that was on the todo short-list. But probably not worth fixing if we're downsizing String.Buffer again.
- No need to account for locks on subbuffers; String.Buffer resolves
those automatically and on demand.
- Explicit copying of the buffer data is not supported anymore, the
implicit copy-on-demand mechanism is more efficient.
Well, except that the explicit copy is mainly there to make it obvious when you accidentally copy. It was intentional, so this is a drawback.
In String.Buffer it would have been easy to add a performance-check-aid which throws exceptions if buffer content is being copied. I'd actually consider it better design to change IOBuffer to something similar; i.e. automatically copy, but allow an explicitly locked-down buffer which resists automatic copying by throwing exceptions. The idea being that code should "just work", and if you're concerned about performance (because you know the buffers can be big), you lock it down to prevent accidents (usually during development).
Also, am I right in the fact that the whole buffer has to be copied if a subbuffer is live while the main buffer is modified?
Correct. That would have resulted in an exception in the IOBuffer case, my code handles it automagically. I'd prefer the programmer ask for protection instead of enforcing it.
That seems to be set up for rather severe ordo-accidents that will not be detected. Common case:
you have a packet based protocol. You extract a packet as a buffer. You then do a call_out or similar to handle the packet (thread, whatever).
Then new data arrives, and is added to the buffer.
In the String.Buffer case this might copy the buffer, which is never what you want.
Quite. But, consider the alternative with IOBuffer: depending on race conditions and timing, you might not notice the problem for a long time and then it bites you throwing a "locked" exception when the programmer is not around.
Please note that it would be trivially easy to do the same in IOBuffer, just modify the code in io_ensure_unlocked to copy and then unlock, I /intentionally/ did not do so, even when it was suggested here, since it is, in my opinion, much better if you are required to explicitly uncouple the subbuffer, which adds a simple call in one place, than to accidentally copy the whole buffer, which will add extreme overhead for no good reason that will be very hard to find.
:-). I argue the exact opposite, because the tradeoff here ultimately is that a. Unexpected performance problems creep up. b. Unexpected "locked" exceptions creep up due to race conditions.
Given the choice, I prefer a over b anytime. Once I have a, I would "lock down explicitly" and see where it throws, but then it's under programmer control, as opposed to in production and the programmer is not around.