Object `+= lfuns are not called when they should have been. It appears that we don't call += lfuns directly, instead we expand a += b+c into a = a+b+c. Well, that is all fine and dandy, but, the f_add function then tries to fold this back into a += by checking if the number of refs on the object == 1, and then calling `+= instead.
This check does not give any meaningful results given normal objects and += usage: - For one, the object might have an extra instance on the stack, in which case the minimum number of references becomes 2 already. - And for another, the object might have multiple instances which all refer to the same object; which *implies* that updating one of those using += should modify all of them.
The following sample code:
class refmeharder { int val; void create(int i) { werror("Set %d\n",val = i); }; void getval(string s) { werror("%s%d\n",s,val); }; void incval() { werror("Inc %d\n",++val); }; refmeharder `+=(int i) { werror("+= called %d\n", val+=i); return this; }; refmeharder `+(int i) { werror("+ called %d\n", val+i); return refmeharder(val+i); }; };
int main(int argc, array(string) argv) { refmeharder a=refmeharder(2); refmeharder b=refmeharder(3); refmeharder c=a,d=a,e=a,f=a,g=a,h=a; a->incval(); b->incval(); c->incval(); a+=5; b+=11; c+=23; a->getval("a"); b->getval("b"); c->getval("c"); return 0; }
Together with the following debug-patch:
diff --git a/src/operators.c b/src/operators.c index 45de79f..18fbe4a 100644 --- a/src/operators.c +++ b/src/operators.c @@ -1505,6 +1505,8 @@ PMOD_EXPORT void f_add(INT32 args) /* The first argument is an object. */ o = sp[-args].u.object; p = o->prog->inherits[SUBTYPEOF(sp[-args])].prog; +if((i = FIND_LFUN(p, LFUN_ADD_EQ)) != -1) +fprintf(stderr, "Object %p type %p refs %d\n", o, p, o->refs); if(o->refs==1 && (i = FIND_LFUN(p, LFUN_ADD_EQ)) != -1) {
Results in an output of: Set 2000 Set 3000 Inc 2001 Inc 3001 Inc 2002 Object 0x200e750 type 0x2055450 refs 8 + called 2007 Set 2007 Object 0x200e7b0 type 0x2055450 refs 2 + called 3012 Set 3012 Object 0x200e750 type 0x2055450 refs 7 + called 2025 Set 2025 a2007 b3012 c2025
But should have resulted in an output of: Set 2000 Set 3000 Inc 2001 Inc 3001 Inc 2002 Object 0x200e750 type 0x2055450 refs 8 += called 2007 Object 0x200e7b0 type 0x2055450 refs 2 += called 3012 Object 0x200e750 type 0x2055450 refs 7 += called 2040 a2040 b3012 c2040
I'll be adding something to this effect to the testsuites. The immediate questions though here are: - The refs==1 check is way too limiting, the question is, is there a meaningful alternative? (Probably not). - The only way to solve this seems to be by supporting calling `+= lfuns directly. But, if we do, do we need to somehow suppress the autoconversion from a+=b+c to a=a+b+c ? Can it be suppressed for objects only? Where in the code is this conversion being done?
a += b is a shorthand for a = a + b, so the refs==1 check is the only correct condition for when to use the `+= lfun. The `+= lfun therefore is _not_ operator overloading for +=, but rather an optimization for when a has only one reference. This is compatible with how += works on other types, like strings and integers.
The case you are describing can be correctly solved by using a method; similar to ->add() in the String.Buffer. I personally never use += on String Buffer objects since - as you pointed out - the refs==1 case is hard to control.
arne
On Mon, 8 Sep 2014, Stephen R. van den Berg wrote:
Object `+= lfuns are not called when they should have been. It appears that we don't call += lfuns directly, instead we expand a += b+c into a = a+b+c. Well, that is all fine and dandy, but, the f_add function then tries to fold this back into a += by checking if the number of refs on the object == 1, and then calling `+= instead.
This check does not give any meaningful results given normal objects and += usage:
- For one, the object might have an extra instance on the stack, in which
case the minimum number of references becomes 2 already.
- And for another, the object might have multiple instances which all refer
to the same object; which *implies* that updating one of those using += should modify all of them.
The following sample code:
class refmeharder { int val; void create(int i) { werror("Set %d\n",val = i); }; void getval(string s) { werror("%s%d\n",s,val); }; void incval() { werror("Inc %d\n",++val); }; refmeharder `+=(int i) { werror("+= called %d\n", val+=i); return this; }; refmeharder `+(int i) { werror("+ called %d\n", val+i); return refmeharder(val+i); }; };
int main(int argc, array(string) argv) { refmeharder a=refmeharder(2); refmeharder b=refmeharder(3); refmeharder c=a,d=a,e=a,f=a,g=a,h=a; a->incval(); b->incval(); c->incval(); a+=5; b+=11; c+=23; a->getval("a"); b->getval("b"); c->getval("c"); return 0; }
Together with the following debug-patch:
diff --git a/src/operators.c b/src/operators.c index 45de79f..18fbe4a 100644 --- a/src/operators.c +++ b/src/operators.c @@ -1505,6 +1505,8 @@ PMOD_EXPORT void f_add(INT32 args) /* The first argument is an object. */ o = sp[-args].u.object; p = o->prog->inherits[SUBTYPEOF(sp[-args])].prog; +if((i = FIND_LFUN(p, LFUN_ADD_EQ)) != -1) +fprintf(stderr, "Object %p type %p refs %d\n", o, p, o->refs); if(o->refs==1 && (i = FIND_LFUN(p, LFUN_ADD_EQ)) != -1) {
Results in an output of: Set 2000 Set 3000 Inc 2001 Inc 3001 Inc 2002 Object 0x200e750 type 0x2055450 refs 8
- called 2007
Set 2007 Object 0x200e7b0 type 0x2055450 refs 2
- called 3012
Set 3012 Object 0x200e750 type 0x2055450 refs 7
- called 2025
Set 2025 a2007 b3012 c2025
But should have resulted in an output of: Set 2000 Set 3000 Inc 2001 Inc 3001 Inc 2002 Object 0x200e750 type 0x2055450 refs 8 += called 2007 Object 0x200e7b0 type 0x2055450 refs 2 += called 3012 Object 0x200e750 type 0x2055450 refs 7 += called 2040 a2040 b3012 c2040
I'll be adding something to this effect to the testsuites. The immediate questions though here are:
- The refs==1 check is way too limiting, the question is, is there a
meaningful alternative? (Probably not).
- The only way to solve this seems to be by supporting
calling `+= lfuns directly. But, if we do, do we need to somehow suppress the autoconversion from a+=b+c to a=a+b+c ? Can it be suppressed for objects only? Where in the code is this conversion being done? -- Stephen.
Arne Goedeke wrote:
a += b is a shorthand for a = a + b, so the refs==1 check is the only correct condition for when to use the `+= lfun. The `+= lfun therefore is _not_ operator overloading for +=, but rather an optimization for when a has only one reference.
Yes, that seems to describe what happens. That is not what someone would expect that implements an `+= lfun on an object.
The question now remains: - Do we agree this is a bug? - Or do we document that this is a feature? - Or maybe we already documented it as a feature and I just didn't see it.
On Mon, Sep 8, 2014 at 2:21 PM, Stephen R. van den Berg srb@cuci.nl wrote:
Arne Goedeke wrote:
a += b is a shorthand for a = a + b, so the refs==1 check is the only correct condition for when to use the `+= lfun. The `+= lfun therefore is _not_ operator overloading for +=, but rather an optimization for when a has only one reference.
Yes, that seems to describe what happens. That is not what someone would expect that implements an `+= lfun on an object.
Then they should have read the docs: http://pike.lysator.liu.se/generated/manual/modref/ex/lfun_3A_3A/_backtick_a...
Tobi
On Mon, Sep 8, 2014 at 10:21 PM, Stephen R. van den Berg srb@cuci.nl wrote:
The question now remains:
- Do we agree this is a bug?
- Or do we document that this is a feature?
- Or maybe we already documented it as a feature and I just didn't see it.
It's documented here:
http://pike.lysator.liu.se/generated/manual/modref/ex/lfun_3A_3A/_backtick_a...
But it's a distinct difference from, say, the Python equivalent, so it's worth making sure it's properly visible. Where would you go looking for that info?
ChrisA
Tobias S. Josefowitz wrote:
Then they should have read the docs: http://pike.lysator.liu.se/generated/manual/modref/ex/lfun_3A_3A/_backtick_a...
Ok, it was option three then, I guess. I apologise for the noise. Let the record show though that I consider it to be a rather messy way to handle this :-).
Chris Angelico wrote:
On Mon, Sep 8, 2014 at 10:21 PM, Stephen R. van den Berg srb@cuci.nl wrote:
- Or maybe we already documented it as a feature and I just didn't see it.
But it's a distinct difference from, say, the Python equivalent, so it's worth making sure it's properly visible. Where would you go looking for that info?
Well, for one, someone browsing the docs needs to be explained the difference between http://pike.lysator.liu.se/generated/manual/modref/ex/predef_3A_3A/_backtick... and http://pike.lysator.liu.se/generated/manual/modref/ex/lfun_3A_3A/_backtick_a...
Other than that, I guess some kind of overview paragraph discussing lfuns in general, showing some examples, and including a new notes/warnings regarding the pitfalls with `+= and other lfuns would be helpful.
On Mon, Sep 8, 2014 at 10:30 PM, Stephen R. van den Berg srb@cuci.nl wrote:
Let the record show though that I consider it to be a rather messy way to handle this :-).
I've worked with Python's way of doing things, and it has its own issues. This kind of thing comes up periodically on python-list:
t = ("Foo", [1,2], "Bar") t[1] += [3]
Traceback (most recent call last): File "<pyshell#2>", line 1, in <module> t[1] += [3] TypeError: 'tuple' object does not support item assignment
t
('Foo', [1, 2, 3], 'Bar')
In Python, "x += y" becomes "x = x.__iadd__(y)", which means two things:
1) The object is asked to do an in-place addition with the new contents 2) Whatever is returned is stored back where x was.
The tuple rejects the second half of that (tuples are immutable, so you can't assign to their members), but the first half has already happened, and this causes some confusion.
With Pike's semantics, "t[1] += whatever" becomes "t[1] = t[1] + whatever", which won't change t[1] at all if the assignment fails. There's a much stronger incentive to write explicit methods to do the appending (in Python's case, lists have a .extend() method that's better suited to this, but there's a temptation to use += anyway), because it's guaranteed to be more efficient.
Personally, I think use of += for anything other than pure optimization is a dangerous idea - the expected semantics aren't as obvious as you might think. It's easy with immutables like strings and integers (this is true in both Python and Pike, btw), but for mutables, it's cleaner to work with methods.
ChrisA
Ah. So that is why nobody is replying to the mails I send.
It would appear that the KOM<->email bridge is not working correctly, so the last few replies I have written have not, actually, been sent.
First, on this topic:
Object `+= lfuns are not called when they should have been.
Well. That depends on how you define "should have been.". :)
`+= has never been called except when there is only one reference to the object, which is why it can be somewhat confusing. It does allow optimizations since the value can be modified destructively, however.
And, really, X += Y only implies that the variable X will have Y added to it, it is not the same as whatever X pointing to getting Y added.
(this is much the same as if there had been a `=, assignment overloading)
Consider:
| mapping m = ([]); | object o = 1003443894984389348989434389; | multiset z = (<>); | array q = ({}); | string b = "";
In which cases should using += change not only the variable but also what it points to?
I think this would be somewhat confusing, as an example:
| object a = 18042893489438948390; | object b = 182389983498439843983498; | object c = b;
a += b;
// afterwards c == b, while I would expect (a-b == c) to be true.
As for how the += optimization is supposed to work:
- For one, the object might have an extra instance on the stack, in which case the minimum number of references becomes 2 already.
Well, this should be mostly fixed by the fact that a = a + X; should be handled differently than other additions, the lvalue (the variable being assigned to and read from) is actually cleared before the addition happens.
The code that is supposed to ensure this is the case lives in line 1337 in las.c (nice..).
The non-existence of +=/-= etc as actual opcpdes is done in treeopt.in, but note that even if those rules are removed there is actually no += opcode in the generated code, and there never has been to my knowledge, the conversion was done in docode.c previously however, which led to a lot of code duplication.
- And for another, the object might have multiple instances which all refer to the same object; which *implies* that updating one of those using += should modify all of them.
Well. Not really, in pike. Adding things to a variable only ever change the variable, not the thing it points to.
IOBuffer talk:
The actual reason I beefed up Buffer a bit lately is *because* I need to do some protocol decoding of a byte stream.
Now I see IOBuffer arrive. In order to avoid code bloat, wouldn't it be a better idea to integrate the functionality of IOBuffer into Buffer and just keep one?
Sorry about the timing, I have had IOBuffer on the way for some time (I am still wondering where to put it, however, that has, believe it or not, been a blocker for me. Perhaps Stdio.Buffer? I will create a buffered stream that reads to and writes from said object, without creating pike strings)
Unfortunately it is not possible to make String.Buffer even close to as efficient as long as it uses a stringbuilder. And not using a stringbuilder slows some things down (sprintf comes to mind) and makes others more or less impossible (wide strings) without excessive code duplication.
The whole reason for IOBuffer is that it uses pointer arithmetics on guaranteed 8bit strings to be fast at both reading from the beginning and writing to the end at the same time (I am, by the way, considering converting it to be a circular buffer to avoid the one memmove it does at times), and it is also efficient at creating sub-buffers.
The fact that it is guaranteed to only contain 8bit characters helps a lot too.
Or is the performance difference so strikingly great that this sounds like a bad idea?
As things stands now, yes.
Things like subbuffers is unfortunately actually impossible when using a stringbuilder.
I guess I might outline the plan for IOBuffer some more (I actually did this during the last pike conference, but it has been a while. :))
o Add support for reading to and writing from file objects to it. Either add support in Stdio.File (also add System.Memory + String.Buffer?) or do it the other way around (that way lies madness, however, see also: Shuffler)
The main goal here is to do one copy less and avoid the pike string creation
o Add support for System.Memory & String.Buffer to add()
o Add support for reading other things than "binary holerith" and integers + line + word + json object + encode_value?
o Add support for "throw mode".
It is rather useful to be able to change what happens when you try to read data that is not in the buffer.
| void read_callback() | { | inbuffer->add( new_data ); | // Read next packet | while( Buffer packet = inbuffer->read_hbuffer(2) ) | { | packet->set_error_mode(Buffer.THROW_ERROR); | if( mixed e = catch(handle_packet( packet ) ) ) | if( e->buffer_error ) | protocol_error(); // illegal data in packet | else | throw(e); // the other code did something bad | } | }
The handle_packet function then just assumes the packet is well formed, and reads without checking if the read succeed (since it will, or an error will be thrown).
This code snipplet also demonstrates why the subbuffers are handy, read_hbuffer does not actually copy any data, the returned buffer just keeps a reference to the data in the original buffer.
-- mail #2
I've been tracking IOBuffer extensions back to String.Buffer, I'll present what I have shortly.
I suspect (but benchmarks will have to tell) that the String.Buffer implementation is not significantly slower than the current IOBuffer one (whilst supporting the full range of character widths).
Well. You have some minor optimizations to do:
| int perf(object buffer) | { | buffer->add("11"); | for(int i=0;i<10000; i++ ) | { | int l; | if( buffer->cut ) | { | l = (buffer[0]<<8) | buffer[1]; | buffer->cut(0,2,1); | } | else | l = buffer->read_int(2); | buffer->add(random_string(l)); | return sizeof(buffer); | } | }
perf( String.Buffer() );
Result 2: 325250971 Compilation: 624ns, Execution: 144.86s
perf( Stdio.IOBuffer() );
Result 3: 328787331 Compilation: 639ns, Execution: 194.06ms
(note that the length differs due to random_string)
However, reviewing the IOBuffer interface, I wonder about the following issues:
- Isn't it prudent to drop set_error_mode() and simply implement this functionality (the throw()) using a custom range_error() override?
Well. That would work, yes, I just simply did not remove the old version of throwing errors since it would most often be used using the simply buff->set_error_mode(1) when doing sub-parsing as I showed in the documentation for set_error_mode.
The need to do rather complex sub-classing for that common usecase seemed somewhat pointless.
- Why insist on lock()ing the buffer when subbuffers are active? Couldn't the code figure out by itself when a subbuffer exists and then decide on-demand and automatically when a copy needs to be made to transparently support the desired operation?
Not really, since the subbuffer only contains a pointer directly into the memory area of the main buffer, if the main buffer changes that using realloc or malloc it would be invalid, this could of course be fixed by adding a list of subbuffers to the main buffer, but then you run into issues with refcounting and such. Since the usecase where you have a subbuffer active and want to modify the main buffer is rather uncommon I thought it was OK that you have to call trim() on the subbuffer to do that.
Why not return IOBuffers practically everywhere, and then let the caller decide when and if to cast them to a string? It gets rid of excessive method diversification due to there needing to be a string and a buffer returning one. Returning a buffer is cheap, it doesn't copy the content.
Well, there is about a factor of 3 performance difference:
string perf2(object b) { while( b->read(1) ); } string perf3(object b) { while( b->read_buffer(1) ); }
perf2(Stdio.IOBuffer(mb100));
Result 5: 0 Compilation: 664ns, Execution: 92.19ms
perf3(Stdio.IOBuffer(mb100));
Result 6: 0 Compilation: 680ns, Execution: 173.68ms
Since that does not include the cast, which should be about as fast as the first read, it becomes about 3x slower.
And most of the time you actually want the string version, not the buffer version.
-- mail #3
Erm... We are *in* a Buffer object, so by definition we have one. So returning a readonly-copy with zero-copy effort is easy. It basically delays the creation of the shared string as long as possible.
Not really, we are in /a/ buffer object, not the subsection of it that should be returned. You have to create a new one to return a subsection. read_buffer is about as fast as it gets, it does the minimal amount of work.
As long as one is doing string operations (adding/substracting/matching) Buffer objects are better. Once done with that, the final "result" can/should be a shared string.
Nothing that adds data to the buffer returns a string in the current buffer code.
The only thing that returns a string is if you call read() or read_hstring() on it.
Once done with that, the final "result" can/should be a shared string.
An additional comment: By definition you are almost never actually "done" with a IOBuffer.
They are designed to be input and output buffers for IO.
-- mail #4
An additional comment: By definition you are almost never actually "done" with a IOBuffer.
They are designed to be input and output buffers for IO.
And now the basic support is there to use them for Stdio.File objects.
Stdio.File now has a new nonblocking mode: Buffered I/O
In this mode the file object maintains two buffers, one for input and one for output.
The read callback will get the buffer as the second argument, and data that the user does not read from that buffer is kept until the next time data arrives from the file (this means you do not have to do your own buffering of input)
The output buffer is, unsurprisingly, used to output data from.
This has at least three somewhat convenient effects:
o The write callback will now receive that buffer as a second argument. You just add data to it to write it.
o Adding data to the buffer when /not/ in the write callback will still trigger sending of data if no write callback is pending.
o Your write callback will not be called until the buffer is actually empty.
An extremely small demo:
| void null() {} | | int main() | { | Stdio.IOBuffer output = Stdio.IOBuffer(); | Stdio.File fd = Stdio.File(0); | | fd->set_buffer_mode( 0, output ); | | fd->set_nonblocking( Function.uncurry(output->add), null, null ); | | return -1; | }
This will case all data received on stdio to be eched to .. stdin using buffered output.
Not the most useful application, but it does show how easy it is. Buffered mode in general is mainly useful because it removes the need for you to handle the buffers manually in your code.
Currently read() and write() are not in any way modified by having buffered output enabled, if you interact directly with the file object it will bypass the buffer. I am unsure if this is a good idea or not.
-- Per Hedbor
On Mon, Sep 8, 2014 at 10:38 PM, Stephen R. van den Berg srb@cuci.nl wrote:
Well, for one, someone browsing the docs needs to be explained the difference between http://pike.lysator.liu.se/generated/manual/modref/ex/predef_3A_3A/_backtick... and http://pike.lysator.liu.se/generated/manual/modref/ex/lfun_3A_3A/_backtick_a...
Neither of these is for the += case, though. If you click the link to `+=, you get the info Tobias and I linked to.
ChrisA
Per Hedbor wrote:
Sorry about the timing, I have had IOBuffer on the way for some time (I am still wondering where to put it, however, that has, believe it or not, been a blocker for me. Perhaps Stdio.Buffer? I will create a buffered stream that reads to and writes from said object, without creating pike strings)
Well, before giving a final resting place, consider my additions to String.Buffer, then decide which to squash and where to put it.
Unfortunately it is not possible to make String.Buffer even close to as efficient as long as it uses a stringbuilder. And not using a stringbuilder slows some things down (sprintf comes to mind) and makes others more or less impossible (wide strings) without excessive code duplication.
Well, we do the difficult today, the impossible tomorrow. And considering that I've been tracking your IOBuffer changes almost instantly locally here, I'd say I've already done the impossible a few days ago :-).
The whole reason for IOBuffer is that it uses pointer arithmetics on guaranteed 8bit strings to be fast at both reading from the beginning and writing to the end at the same time (I am, by the way, considering converting it to be a circular buffer to avoid the one memmove it does at times), and it is also efficient at creating sub-buffers.
Well, I'm not sure that a circular buffer will really help, since it introduces buffer-wrap checks. Generally speaking, buffers help enormously when reading and parsing from bytestreams into pike, but they do not help all that much when writing to bytestreams. The writev() system call handles the fragmented writes quite nicely (see the pgsql driver) and more efficient than an IOBuffer which would copy the data at least once.
Things like subbuffers is unfortunately actually impossible when using a stringbuilder.
Well, it's not, really. Unless I made some rather striking mistakes.
The main goal here is to do one copy less and avoid the pike string creation
The reading case needs that, the writing case not so much.
Well. You have some minor optimizations to do:
You're looking at v1.1 of String.Buffer, I already have v3.0.
perf( String.Buffer() );
Result 2: 325250971 Compilation: 624ns, Execution: 144.86s
Erm, how do you get this nice looking timings? Do I have to turn on a profiling flag on Hilfe somewhere?
Generally speaking, buffers help enormously when reading and parsing from bytestreams into pike, but they do not help all that much when writing to bytestreams. The writev() system call handles the fragmented writes quite nicely (see the pgsql driver) and more efficient than an IOBuffer which would copy the data at least once.
That is only true if you do not include the overhead you get from making the strings. It is also only even close to true if you can always write the whole buffer.
The very common case where you have nonblocking output means you always have to chop off data from the strings, if you have them, and keep track of the array members when you use writev.
The actual gain when I switched the communication pipe in one of subprocesses in the storage system we have at Opera to BinaryBuffer, which is basically a predecessor to IOBuffer, when communicating encode_value:ed values in arrays was about 98% speed gain in the buffer handling. It went from 80% to less than 2% of the overall CPU time.
Meanwhile the number of lines needed to do it went from 20 to 10. And that buffer did not have write_to
Basically:
Before, pseudo code of course: array outbuf; output(mixed value) { outbut += ({ sprintf("%4H", encode_value(value)) });}
write_callback() { int num = fd->write(outbuf[..100]); // if 100 is left out this is... rather slow.. for large buffers // remove entries from outbuf until sizeof(outbuf[0]) < num.. // this was of course done in one step outbuf[0] = outbuf[0][num..]; // << 99.99% of the CPU time goes here. }
After: BinaryBuffer outbuf; output: outbuf->add_hstring(encode_value(value),4);
write_callback() { string chunk = outbuf->read(4096); int num = fd->write( outbuf ); outbuf->unread( 4096-num ); }
Things like subbuffers is unfortunately actually impossible when using a stringbuilder.
Well, it's not, really. Unless I made some rather striking mistakes.
How do you add the trailing 0, and how do you avoid the fact that the string_builder_x functions can at any time reallocate the 's' member to point to another memory location, making the sub-object totally invalid, and how do you avoid the fact that the str member is actually _part of_ the struct pike-string structure, so you at least temporarily have to overwrite the data in the main buffer?
perf( String.Buffer() );
Result 2: 325250971 Compilation: 624ns, Execution: 144.86s
Erm, how do you get this nice looking timings? Do I have to turn on a profiling flag on Hilfe somewhere?
set format bench
(did you note the factor of 1k difference in speed in the test?
Per Hedbor wrote:
Generally speaking, buffers help enormously when reading and parsing from bytestreams into pike, but they do not help all that much when writing to bytestreams. The writev() system call handles the fragmented writes quite nicely (see the pgsql driver) and more efficient than an IOBuffer which would copy the data at least once.
That is only true if you do not include the overhead you get from making the strings. It is also only even close to true if you can always write the whole buffer.
The very common case where you have nonblocking output means you always have to chop off data from the strings, if you have them, and keep track of the array members when you use writev.
Ok, valid points. Both methods have their merits, I guess, depending on the data at hand.
Things like subbuffers is unfortunately actually impossible when using a stringbuilder.
Well, it's not, really. Unless I made some rather striking mistakes.
How do you add the trailing 0, and how do you avoid the fact that the string_builder_x functions can at any time reallocate the 's' member to point to another memory location, making the sub-object totally invalid, and how do you avoid the fact that the str member is actually _part of_ the struct pike-string structure, so you at least temporarily have to overwrite the data in the main buffer?
I guess answering these questions is best done by showing the code (it'll become a long story otherwise). I'll run what I have through the checks today, and check it in. It tracked your changes up till halfway last week, I think, so there still is a little to be done. Incidentally the bugs you fixed in the past few days I simply didn't copy in my implementation in the first place ;-).
perf( String.Buffer() );
Result 2: 325250971 Compilation: 624ns, Execution: 144.86s
Erm, how do you get this nice looking timings? Do I have to turn on a profiling flag on Hilfe somewhere?
set format bench
(did you note the factor of 1k difference in speed in the test?
Yes, I noticed, but that difference was already gone more than a week ago in my local version.
Benefits of the new String.Buffer implementation: - Close to drop-in replacement for IOBuffer. - Performance hovers around that of IOBuffer's. - Support all native Pike character widths. - Backward compatible with the old String.Buffer. - Creating a String.Buffer object from a shared string does *not* incur a single copy (and is thus very lightweight). - A String.Buffer initialised from a shared string allows one to access the shared string in parts without copying the data at all. - Supports System.Memory objects just like IOBuffer. - No need to account for locks on subbuffers; String.Buffer resolves those automatically and on demand. - Explicit copying of the buffer data is not supported anymore, the implicit copy-on-demand mechanism is more efficient.
Stephen R. van den Berg wrote:
Things like subbuffers is unfortunately actually impossible when using a stringbuilder.
Well, it's not, really. Unless I made some rather striking mistakes.
Behold, the source code of String.Buffer NG has been checked in.
How do you add the trailing 0, and how do you avoid the fact that the
The trailing \0 is only required for shared strings, I streamlined the string_builder code dealing with that in a separate patch.
string_builder_x functions can at any time reallocate the 's' member to point to another memory location, making the sub-object totally invalid, and how do you avoid the fact that the str member is actually _part of_ the struct pike-string structure, so you at least temporarily have to overwrite the data in the main buffer?
This is solved using refcounts and copying the stringbuffer as soon as we want to modify one that has more than one ref.
With regard to benchmarks and speed. I'd say String.Buffer now rivals and possibly sometimes exceeds the performance of IOBuffer. Case in point:
int perf(object buffer) { buffer->add("11"); for(int i=0;i<100000; i++ ) { int l; if( buffer->cut ) { l = (buffer[0]<<8) | buffer[1]; buffer->cut(0,2,1); } else l = buffer->read_int(2); buffer->add(random_string(l)); } return sizeof(buffer); }
perf( String.Buffer() );
Result 37: 3270303901 Compilation: 936ns, Execution: 5.72s
perf( Stdio.IOBuffer() );
Result 35: 3274269166 Compilation: 941ns, Execution: 5.60s
But, String.Buffer supports close to all methods of IOBuffer, so a fairer test would be:
int perf(object buffer) { buffer->add("11"); for(int i=0;i<100000; i++ ) { int l; l = buffer->read_int(2); buffer->add(random_string(l)); } return sizeof(buffer); }
perf( String.Buffer() );
Result 44: 3271319355 Compilation: 949ns, Execution: 5.69s
perf( Stdio.IOBuffer() );
Result 41: 3278225479 Compilation: 942ns, Execution: 5.59s
Looks close enough. One should compare program-code-size too, I didn't bother to do that yet because it is part of builtin.cmod. I'd wager that my String.Buffer implementation has a smaller code footprint than the current IOBuffer.
Comments, criticism, patches are welcome, of course.
Stephen R. van den Berg wrote:
Benefits of the new String.Buffer implementation:
- Close to drop-in replacement for IOBuffer.
- Performance hovers around that of IOBuffer's.
- Support all native Pike character widths.
- Backward compatible with the old String.Buffer.
- Creating a String.Buffer object from a shared string does *not* incur
a single copy (and is thus very lightweight).
- A String.Buffer initialised from a shared string allows one to
access the shared string in parts without copying the data at all.
- Supports System.Memory objects just like IOBuffer.
- No need to account for locks on subbuffers; String.Buffer resolves
those automatically and on demand.
- Explicit copying of the buffer data is not supported anymore, the
implicit copy-on-demand mechanism is more efficient.
I would almost forget: - Supports unread(string|Buffer str) which allows one to push back strings at the front of the buffer.
Proposed change:
Change the Buffer add_hstring( string|Buffer|object str, int size_size ) method into Buffer add_hstring( int size_size, string|object|int|array(string|object|int) ... str )
First, I am fairly sure you already know my test program was only made to check the ordo, not the constant (almost all of the constant in that test is from random_string, actually).
And, yes, they are now very similar in speed if you add strings to them, which they already were (string.buffer is about 20% slower, it is very possible to optimize IOBuffer, of course, but the same can be done for Buffer. Most of the overhead lives in the function call and argument checking as is.).
String.Buffer is still about a factor of two off, more in most cases where the new code is really useful (add_int(s)/short etc, same with extracting them)
You can fix most of the very slow functions, but getting better than about 50% performance is rather hard when you have to check the size-shift when extracting and adding chars. The cpp() function became about 40% slower after my rewrite that changed it to use PCHARP everywhere, as an example.
Also, the fact that String.Buffer can contain non-8bit characters is directly /harmful/ when doing an IOBuffer. So the extra overhead is not really there for any good reason.
Why not simply keep String.Buffer as a simple string-adding-buffer, and let the more complex stuff be done in other code, as before? IOBuffer is really only a more efficient version of ADT.struct. Why is it so dangerous to have it around? I find the additions to String.Buffer to be significantly less beautiful, if that is a way to put it. There was not really any shared funcitonality between String.Buffer and Stdio.IOBuffer except one before: Adding strings to a buffer that is not otherwise used until "done".
And since IOBuffers are never "done" usually that is not really all that much of a similarity. Why try to mangle String.Buffer into a IOBuffer, the only thing they have in common is the name 'Buffer' which in the String.Buffer case should have been Builder anyway.
It is somewhat esoteric to use add_int(..) add_short() etc when the string contains widechars. It does not really make sense (incidentally, your JSON parser assumes utf8 which is hardly true for a string, those are supposed to be unicode)
Also, how do you handle the 2G+ buffer case? I can see no code for that (yes, I have hit 2G+ buffer sizes, which is why IOBuffer is very carefully using size_t everywhere).
Anyway, test program 2:
void main() { string ta = "data"*400; array(int) numbers = enumerate(6553);
Stdio.IOBuffer a = Stdio.IOBuffer(); String.Buffer b = String.Buffer();
System.Timer t = System.Timer(); for(int i=0; i<100000; i++ ) a->add(ta)->add_ints(numbers,10)->read_buffer(120000);
werror("IOBuffer: %.1f\n", t->get());
for(int i=0; i<100000; i++ ) { b->add(ta);b->add_ints(numbers,10)->read_buffer(120000); } werror("String.Buffer: %.1f\n", t->get()); }
gives you something along the lines of
IOBuffer: 5.3 String.Buffer: 39.6
And as for codesize, String.Buffer, not counting any stringbuilder code, compiled using clang -Ofast on my mac, weighs in at 27893 bytes (only counting functions named _Buffer_* in builtin.o, so it does not include the helper functions).
The text segment of buffer.o is 23425. In total. Without using stringbuilder. So, no, not really.
They are almost identical when it comes to the number of lines (again, not counting stringbuiler, pcharp functions etc):
String.Buffer is about 1870 lines long (docs included, but those are rather similar since it's mostly copied, in fact IOBuffer has five more lines of docs, even though String.Buffer have more funcitons (the comparison ones, again something which is pointless in an IOBuffer, incidentally)).
buffer.cmod is almost exactly 2000 lines long. not counting #include lines.
But String.Buffer does not have the correct code for things like errno and stdio nonblocking modes, which will add at least 20 lines of code on it's own, and also has no support for IOBuffer as input (also, IOBuffer no longer correctly supports String.Buffer due to the changes in String.Buffer, the same is of course true for all other code using String.Buffer such as the shuffler).
Also, one thing at random I noticed when reading the stralloc.c commits:
- static INLINE int min_magnitude(const p_wchar2 c) + static INLINE unsigned min_magnitude(const unsigned c) { - if(c<0) return 2; - if(c<256) return 0; - if(c<65536) return 1; - return 2; + return c<256 ? 0 : c<65536 ? 1 : 2; }
is not really much of a speedup, if any, in fact it reorganized the instructions for me to make it less efficient if c>=0 and <= 255 when using clang (you would need to add some LIKELY to obvious locations). YMMV when the code is included in the location it's called and other compilers are used. Which is sort of the point.
Also, it is generally a good idea to add what is unsigned, we do so every where else. And, pike 32bit chars are actually signed.
Another random thing: _encode should probably not empty the buffer. It's a fairly unexpected side-effect.
Benefits of the new String.Buffer implementation:
- Close to drop-in replacement for IOBuffer.
Yes, but slower, and why?
- Performance hovers around that of IOBuffer's.
Well, not really, but granted, for simple stuff it's comparable, and unless you modify the buffer when subbuffers are live the ordos should be the same (but I might modify IOBuffer to be circular in the simple case, since that is rather a lot faster unless the buffer size is extended all the time (and it is usually not in a long-lived connection, most packets/whatever tend to have similar sizes) This is surprisingly trivial since most of the things use the io_* abstraction layer)
- Support all native Pike character widths.
That is not a feature as far as IOBuffers are concerned. In fact it's rather the reverse.
- Backward compatible with the old String.Buffer.
I really hope so, since it is String.Buffer... I never said IOBuffer should replace String.Buffer.
- Creating a String.Buffer object from a shared string does *not* incur
a single copy (and is thus very lightweight).
Of course the same is true for IOBuffer.
- A String.Buffer initialised from a shared string allows one to
access the shared string in parts without copying the data at all.
Of course the same is true for IOBuffer. IOBuffer does not even do any copying when you read all data and then later on add a new string. Which String.Buffer does. :)
- No need to account for locks on subbuffers; String.Buffer resolves
those automatically and on demand.
- Explicit copying of the buffer data is not supported anymore, the
implicit copy-on-demand mechanism is more efficient.
Well, except that the explicit copy is mainly there to make it obvious when you accidentally copy. It was intentional, so this is a drawback.
Also, am I right in the fact that the whole buffer has to be copied if a subbuffer is live while the main buffer is modified?
That seems to be set up for rather severe ordo-accidents that will not be detected. Common case:
you have a packet based protocol. You extract a packet as a buffer. You then do a call_out or similar to handle the packet (thread, whatever).
Then new data arrives, and is added to the buffer.
In the String.Buffer case this might copy the buffer, which is never what you want.
Please note that it would be trivially easy to do the same in IOBuffer, just modify the code in io_ensure_unlocked to copy and then unlock, I /intentionally/ did not do so, even when it was suggested here, since it is, in my opinion, much better if you are required to explicitly uncouple the subbuffer, which adds a simple call in one place, than to accidentally copy the whole buffer, which will add extreme overhead for no good reason that will be very hard to find.
Per Hedbor wrote:
You can fix most of the very slow functions, but getting better than about 50% performance is rather hard when you have to check the size-shift when extracting and adding chars. The cpp() function became about 40% slower after my rewrite that changed it to use PCHARP everywhere, as an example.
Ok, can't argue with that.
Also, the fact that String.Buffer can contain non-8bit characters is directly /harmful/ when doing an IOBuffer. So the extra overhead is not really there for any good reason.
Well, I wouldn't say harmful, non-obvious perhaps. The system calls ultimately do the 8-bit check. Don't get me wrong: I come from an 8-bit world, I just learned to live with the fact that in Pike you have to deal with more, usually.
Why not simply keep String.Buffer as a simple string-adding-buffer, and let the more complex stuff be done in other code, as before? IOBuffer is really only a more efficient version of ADT.struct. Why is it so dangerous to have it around? I find the additions to String.Buffer to be significantly less beautiful, if that is a way to put it. There was not really any shared funcitonality between String.Buffer and Stdio.IOBuffer except one before: Adding strings to a buffer that is not otherwise used until "done".
Well, ok, but then I suggest we dumb down String.Buffer again to what it was in 7.8, and get rid of the addat() as well. Since the addat() use case becomes rather narrow considering the existence of IOBuffer. Or am I missing something as to its usefulness in String.Buffer?
It is somewhat esoteric to use add_int(..) add_short() etc when the string contains widechars.
Correct :-). I did have a slightly tingly feeling when considering what the semantics were supposed to be in that case.
It does not really make sense (incidentally, your JSON parser assumes utf8 which is hardly true for a string, those are supposed to be unicode)
The JSON parser is only useful if unleashed on 8-bit strings, of course.
Also, how do you handle the 2G+ buffer case? I can see no code for that (yes, I have hit 2G+ buffer sizes, which is why IOBuffer is very carefully using size_t everywhere).
Indeed. Didn't want to fiddle with that until we decide if IOBuffer is dropped or not.
And as for codesize, String.Buffer, not counting any stringbuilder code, compiled using clang -Ofast on my mac, weighs in at 27893 bytes (only counting functions named _Buffer_* in builtin.o, so it does not include the helper functions).
The text segment of buffer.o is 23425. In total. Without using stringbuilder. So, no, not really.
Ok, I'm suprised, but it only shows that overhead creeps in with wide character treatment.
Also, one thing at random I noticed when reading the stralloc.c commits:
- static INLINE int min_magnitude(const p_wchar2 c)
- static INLINE unsigned min_magnitude(const unsigned c) {
- if(c<0) return 2;
- if(c<256) return 0;
- if(c<65536) return 1;
- return 2;
- return c<256 ? 0 : c<65536 ? 1 : 2;
is not really much of a speedup, if any, in fact it reorganized the instructions for me to make it less efficient if c>=0 and <= 255 when using clang (you would need to add some LIKELY to obvious locations). YMMV when the code is included in the location it's called and other compilers are used. Which is sort of the point.
I would have expected the compiler to do the unsigned comparison first and therefore help the common case. I wasn't familiar with the (UN)LIKELY macros; seems like they could help, indeed.
Also, it is generally a good idea to add what is unsigned, we do so every where else. And, pike 32bit chars are actually signed.
You mean add a comment about the forced unsigned comparison?
Another random thing: _encode should probably not empty the buffer. It's a fairly unexpected side-effect.
I believe I copied this from IOBuffer. Maybe I messed it up.
Benefits of the new String.Buffer implementation:
- Close to drop-in replacement for IOBuffer.
Yes, but slower, and why?
Well, I'd hoped that the speed difference would only be a few % for the 8-bit case. It turns out that it is not. If it would have been, it would have increased cache-locality by reducing the code footprint of Pike.
- A String.Buffer initialised from a shared string allows one to
access the shared string in parts without copying the data at all.
Of course the same is true for IOBuffer. IOBuffer does not even do any copying when you read all data and then later on add a new string. Which String.Buffer does. :)
Yes, well, that was on the todo short-list. But probably not worth fixing if we're downsizing String.Buffer again.
- No need to account for locks on subbuffers; String.Buffer resolves
those automatically and on demand.
- Explicit copying of the buffer data is not supported anymore, the
implicit copy-on-demand mechanism is more efficient.
Well, except that the explicit copy is mainly there to make it obvious when you accidentally copy. It was intentional, so this is a drawback.
In String.Buffer it would have been easy to add a performance-check-aid which throws exceptions if buffer content is being copied. I'd actually consider it better design to change IOBuffer to something similar; i.e. automatically copy, but allow an explicitly locked-down buffer which resists automatic copying by throwing exceptions. The idea being that code should "just work", and if you're concerned about performance (because you know the buffers can be big), you lock it down to prevent accidents (usually during development).
Also, am I right in the fact that the whole buffer has to be copied if a subbuffer is live while the main buffer is modified?
Correct. That would have resulted in an exception in the IOBuffer case, my code handles it automagically. I'd prefer the programmer ask for protection instead of enforcing it.
That seems to be set up for rather severe ordo-accidents that will not be detected. Common case:
you have a packet based protocol. You extract a packet as a buffer. You then do a call_out or similar to handle the packet (thread, whatever).
Then new data arrives, and is added to the buffer.
In the String.Buffer case this might copy the buffer, which is never what you want.
Quite. But, consider the alternative with IOBuffer: depending on race conditions and timing, you might not notice the problem for a long time and then it bites you throwing a "locked" exception when the programmer is not around.
Please note that it would be trivially easy to do the same in IOBuffer, just modify the code in io_ensure_unlocked to copy and then unlock, I /intentionally/ did not do so, even when it was suggested here, since it is, in my opinion, much better if you are required to explicitly uncouple the subbuffer, which adds a simple call in one place, than to accidentally copy the whole buffer, which will add extreme overhead for no good reason that will be very hard to find.
:-). I argue the exact opposite, because the tradeoff here ultimately is that a. Unexpected performance problems creep up. b. Unexpected "locked" exceptions creep up due to race conditions.
Given the choice, I prefer a over b anytime. Once I have a, I would "lock down explicitly" and see where it throws, but then it's under programmer control, as opposed to in production and the programmer is not around.
Stephen R. van den Berg wrote:
Well, ok, but then I suggest we dumb down String.Buffer again to what it was in 7.8, and get rid of the addat() as well. Since the addat() use case becomes rather narrow considering the existence of IOBuffer. Or am I missing something as to its usefulness in String.Buffer?
Reviewing the functionality, I'd say to keep clear() and sprintf() to String.Buffer, but drop addat() ?
Considering it was added in 2011, I guess its actually in use. I think when removing APIs/features there need to be compelling reasons to do so. The mere fact that addat has narrow use cases (is that even true?) is imho not enough.
On 09/10/14 14:00, Stephen R. van den Berg wrote:
Stephen R. van den Berg wrote:
Well, ok, but then I suggest we dumb down String.Buffer again to what it was in 7.8, and get rid of the addat() as well. Since the addat() use case becomes rather narrow considering the existence of IOBuffer. Or am I missing something as to its usefulness in String.Buffer?
Reviewing the functionality, I'd say to keep clear() and sprintf() to String.Buffer, but drop addat() ?
10 sep 2014 kl. 14:23 skrev Arne Goedeke el@laramies.com:
Considering it was added in 2011, I guess its actually in use. I think when removing APIs/features there need to be compelling reasons to do so. The mere fact that addat has narrow use cases (is that even true?) is imho not enough.
Well, in some way that is not entirely true: We do try to remove functions that are a bit too narrow in scope.
I have never seen the use case for addat personally. And I did not include it in IOBuffer even though it would be fairly simple. Also, the function name indicates that the data should be added at a specific location, not overwriting anything.
— Per Hedbor
Arne Goedeke wrote:
Considering it was added in 2011, I guess its actually in use. I think when removing APIs/features there need to be compelling reasons to do so. The mere fact that addat has narrow use cases (is that even true?) is imho not enough.
It has never been in 7.8. I do agree that the naming is a bit ambiguous (it overwrites and extends). Grubba added it, maybe he can explain what he used/uses it for?
On 09/10/14 14:27, Per Hedbor wrote:
10 sep 2014 kl. 14:23 skrev Arne Goedeke el@laramies.com:
Considering it was added in 2011, I guess its actually in use. I think when removing APIs/features there need to be compelling reasons to do so. The mere fact that addat has narrow use cases (is that even true?) is imho not enough.
Well, in some way that is not entirely true: We do try to remove functions that are a bit too narrow in scope.
I have never seen the use case for addat personally. And I did not include it in IOBuffer even though it would be fairly simple. Also, the function name indicates that the data should be added at a specific location, not overwriting anything.
Usually its being deprecated for some time, so people have a chance to fix their code. addat() was never in a stable release, so I guess that doesn't matter..
grubba added it, so I guess roxen uses it somewhere?
Arne Goedeke wrote:
Usually its being deprecated for some time, so people have a chance to fix their code. addat() was never in a stable release, so I guess that doesn't matter..
grubba added it, so I guess roxen uses it somewhere?
A cursory check of the core Roxen 5.4 version doesn't show it anywhere. Maybe it's in the more recent version or in some parts which are not public.
Is the mail gateway still working? (Just in case I'm missing responses from Grubba). I have a revert-cleanup patch waiting here at the trigger, but I didn't want to check it in until I had an answer on this issue.
I think there is a couple of features that would be nice to have in IOBuffer. If there is no objections I would go ahead and add them.
1) support for reading/writing signed integers. Not sure about the api, maybe the current methods should could be renamed read_uint/add_uint instead? The -1 return value also would not work for the read_ methods. The add_int variants actually work fine for signed ints already, due to the cast to unsigned. sprintf()/sscanf() could be used currently but are quite slow.
2) read_utf8/add_utf8. its not hard to do, quite common and would save one string creation and copy.
3) reading/writing ieee floats. also handled by sprintf/sscanf, but on most systems would reduce to bswap+memcpy
4) little endian support? Admittedly the use cases are narrow for network protocols, but not so much for file formats.
arne
On Thu, Sep 11, 2014 at 10:58 AM, Arne Goedeke el@laramies.com wrote:
I think there is a couple of features that would be nice to have in IOBuffer. If there is no objections I would go ahead and add them.
- support for reading/writing signed integers. Not sure about the api,
maybe the current methods should could be renamed read_uint/add_uint instead? The -1 return value also would not work for the read_ methods. The add_int variants actually work fine for signed ints already, due to the cast to unsigned. sprintf()/sscanf() could be used currently but are quite slow.
The returns value is an issue, I guess using UNDEFINED makes sense.
In my "normal" usage I only use the return value for read_buffer and friends however, and then set error mode and just read.
It really simplifies not having to check after each read, I just added the -1 return value since it sort of makes sense. But UNDEFINED should work just as well, really.
And renaming the current ones to read_u* makes sense if we add non-unsigned versions, yes.
And add_int only sort of works, it will break for bignums, and also normal integers if the width > sizeof(INT_TYPE) (the code padds will null characters).
- read_utf8/add_utf8. its not hard to do, quite common and would save
one string creation and copy.
It would be nice to avoid large amounts of code duplication, however.
- reading/writing ieee floats. also handled by sprintf/sscanf, but on
most systems would reduce to bswap+memcpy
That might be useful, yes. But is it really _that_ common? I guess it makes since to be complete(ish). :)
- little endian support? Admittedly the use cases are narrow for
network protocols, but not so much for file formats.
I avoided that since my point of view when initially implementing the buffer was that it was mainly going to be used for network I/O. But it does work for normal files as well, so it might indeed be useful.
arne
On Thu, 11 Sep 2014, Per Hedbor wrote:
On Thu, Sep 11, 2014 at 10:58 AM, Arne Goedeke el@laramies.com wrote:
I think there is a couple of features that would be nice to have in IOBuffer. If there is no objections I would go ahead and add them.
- support for reading/writing signed integers. Not sure about the api,
maybe the current methods should could be renamed read_uint/add_uint instead? The -1 return value also would not work for the read_ methods. The add_int variants actually work fine for signed ints already, due to the cast to unsigned. sprintf()/sscanf() could be used currently but are quite slow.
The returns value is an issue, I guess using UNDEFINED makes sense.
In my "normal" usage I only use the return value for read_buffer and friends however, and then set error mode and just read.
It really simplifies not having to check after each read, I just added the -1 return value since it sort of makes sense. But UNDEFINED should work just as well, really.
And renaming the current ones to read_u* makes sense if we add non-unsigned versions, yes.
And add_int only sort of works, it will break for bignums, and also normal integers if the width > sizeof(INT_TYPE) (the code padds will null characters).
Yes, I noticed that digits() doesnt like negative numbers, when playing with it. Sscanf/sprintf already have code internally for handling all this stuff, maybe we can move that to a common c-api.
- read_utf8/add_utf8. its not hard to do, quite common and would save
one string creation and copy.
It would be nice to avoid large amounts of code duplication, however.
I would split the current utf8 encoder/decoder into 2 functions, one for calculating the length of the result and one doing the actual encoding/decoding. f_utf8_to_string could then call those two directly, as could IOBuffer. Sounds ok?
- reading/writing ieee floats. also handled by sprintf/sscanf, but on
most systems would reduce to bswap+memcpy
That might be useful, yes. But is it really _that_ common? I guess it makes since to be complete(ish). :)
Well, its not so very common, but it could also use the same code as sprintf/sscanf does, so it would not add too much code.
- little endian support? Admittedly the use cases are narrow for
network protocols, but not so much for file formats.
I avoided that since my point of view when initially implementing the buffer was that it was mainly going to be used for network I/O. But it does work for normal files as well, so it might indeed be useful.
It blows the number of functions up a bit, so maybe its not worth it.
arne
On Thu, Sep 11, 2014 at 3:45 PM, Arne Goedeke el@laramies.com wrote:
I would split the current utf8 encoder/decoder into 2 functions, one for calculating the length of the result and one doing the actual encoding/decoding. f_utf8_to_string could then call those two directly, as could IOBuffer. Sounds ok?
Well, yes, as long as it is not slower (and, perhaps most importantly, does not create new strings in the somewhat common case where there is no change when doing the encoding/decoding)
The mail gateway is not working, no.
By the way, I think it might still be useful to allow the addition of buffers (and shm/iobuffer?) objects to String.Buffer.
Perhaps we should make a common function that can be used to get len/char* from a given "memory chunk" object.
Per Hedbor wrote:
The mail gateway is not working, no.
Feedback loop over the Pike git checkins, great.
By the way, I think it might still be useful to allow the addition of buffers (and shm/iobuffer?) objects to String.Buffer.
Agreed.
Perhaps we should make a common function that can be used to get len/char* from a given "memory chunk" object.
I think that would be a good idea, since the N-way mesh starts to get unwieldy.
Arne Goedeke wrote:
- little endian support? Admittedly the use cases are narrow for
network protocols, but not so much for file formats.
I avoided that since my point of view when initially implementing the buffer was that it was mainly going to be used for network I/O. But it does work for normal files as well, so it might indeed be useful.
It blows the number of functions up a bit, so maybe its not worth it.
What about Per's original idea of designating little endian ints by a negative width argument? I supported some of that in the String.Buffer version I made. It allows little endian support everywhere without creating extra functions.
In trying to use IOBuffer in the pgsql driver I notice that I'm missing a read-a-\0-terminated string function from the buffer.
I could add this functionality as part of read_hstring(). The two obvious choices are: a. Perform it when called as read_hstring(0); b. Perform it when called as read_hstring(); c. None of the above.
Any preferences?
Arne Goedeke wrote:
- support for reading/writing signed integers. Not sure about the api,
maybe the current methods should could be renamed read_uint/add_uint instead? The -1 return value also would not work for the read_ methods. The add_int variants actually work fine for signed ints already, due to the cast to unsigned. sprintf()/sscanf() could be used currently but are quite slow.
Why not simply let the read_int*() functions return all signed values instead? This would: - Increase the chance of the return value fitting into the native int type. - Still allow easy reading of unsigned values (if one insists) by masking the output at the reader end by the appriopriate amount of bits (which is more efficient in Pike than creating a signed from an unsigned).
It should be a new functon. read_hstring reads a hollerith-style (or pascal style) string. That is, a length followed by that many bytes, incidentally the reverse of add_hstring.
A function that reads to the next null character is totally different. Also, read_hstring(0) intentionally returns an empty string, this because it makes most sense (read 0 bytes length (will be 0), then that many characters (0)).
So, it has to be a new function (match works, incidentally, but might be somewhat slow of course).
On Sat, Sep 13, 2014 at 12:43 PM, Stephen R. van den Berg srb@cuci.nl wrote:
Stephen R. van den Berg wrote:
a. Perform it when called as read_hstring(0);
After reviewing the options, choice a seems most friendly to me.
Stephen.
Why not simply let the read_int*() functions return all signed values instead?
Because signed values are rather rare in protocols and file formats.
This would:
- Increase the chance of the return value fitting into the native int type.'
Well, sometimes.
- Still allow easy reading of unsigned values (if one insists) by masking the output at the reader end by the appriopriate amount of bits (which is more efficient in Pike than creating a signed from an unsigned).
Not with bignums, actually. And that's the most common use with SSL, as an example. Also, it would mean that each and every time you read a byte or short or whatever you also have to have a matching &.
It would also break hstring reading rather badly, so you would still need the functions to read unsigned numbers internally.
Also, with native integers, as it turns out, it's fairly equal on x86_64 at least, where the integer operations are inlined.
Just add another function to read signed integers.
And as for byteorder, just add a switch to switch byteorder if the number of functions needs to be kept down for some reason.
Per Hedbor wrote:
It would also break hstring reading rather badly, so you would still need the functions to read unsigned numbers internally.
Also, with native integers, as it turns out, it's fairly equal on x86_64 at least, where the integer operations are inlined.
Just add another function to read signed integers.
Shall we rename the read_int* to read_uint* then?
And as for byteorder, just add a switch to switch byteorder if the number of functions needs to be kept down for some reason.
Well, since IOBuffer is for efficiency and not for beauty, I guess it does not matter a lot if there are few or a lot of functions. Whatever is most efficient counts here, I'd say.
Well, I admit that the check_avail_mul originally did a better job of figuring out if the specified sizes and numbers exceed the available ptrdiff_t type.
However, all the other changes should be correct and should not introduce buffer overruns. Can you give one counterexample? Because, if I recall correctly, in checking the code I found one error in the range checks in the existing code; which got implicitly fixed by the new code.
io_read_number no longer checks the buffer size. If use io_read_number_uc for that..
And then there is the code size: Using loops always duplicates the code inside the loop a few times, that was the major reason for the gotos for something that happens very seldom.
Second, io_avail very explicitly allowed negative numbers to be passed, removing all need to check for that in the functions calling it. It seems totally pointless to remove that support for no reason that I can see?
Third: You removed abstractions. Never do that, the intent of the io_* functions is that the implementation can be changed (as an example, always prefer io_len to io->len-io->offset).
Fourth: I got a bunch of warnings when compiling the code. That is not really a good thing, usually.
Fifth: In unread io_avail_rewind will be called with negative numbers if the function is passed negative numbers. That could cause havok. io_avail_throw does not, actually, throw in the general case. There was a reason it always was combined with something else.
Sixth: In read_hstring io_read_number can return negative numbers, especially on 32bit systems. You still has to check for that. It returns a LONGEST, which might be a 32bit int. So read_hstring( len>=sizeof(LONGEST) ) can easily break currently, causing a core dump.
There are more issues, but those are mainly related to making the code more complicated for no reason. Always trust the compiler to optimize simple expressions unless proven otherwise.
And, where, exactly, was this error in range checking?
Also, somewhat related the "speeds up short path" you just commited actually most likely does not. :)
Check the generated code, it's surprisingly similar for both before and after.
The inner loop is the same length, the only difference is the registers used, and the overall function size differs by 2 instructions, one of which is in the error handling that is rather unlikely to be triggered.
Again, the C compiler is good at obvious optimizations, so most of the time I go for the most easily readable verision of the code.
This change is otherwise fine, I guess, but the commit message is misleading.
I can really recommend gcc.godbolt.org if you want to see what compilers does with stuff, actually.
On Sun, Sep 14, 2014 at 4:13 PM, Per Hedbor per@hedbor.org wrote:
Also, somewhat related the "speeds up short path" you just commited actually most likely does not. :)
Check the generated code, it's surprisingly similar for both before and after.
The inner loop is the same length, the only difference is the registers used, and the overall function size differs by 2 instructions, one of which is in the error handling that is rather unlikely to be triggered.
Again, the C compiler is good at obvious optimizations, so most of the time I go for the most easily readable verision of the code.
This change is otherwise fine, I guess, but the commit message is misleading.
So I guess having read_signed_int(int bytes) would be enough for now. Adding variants for specific lengths probably doesnt make a huge difference, as most of the runtime is spent in mega_apply and friends, especially when consuming just a couple of bytes.
I like the proposal for using negative lengths for little endian, so I would add that to read_int(int bytes) and read_signed_int(int bytes). I think we dont need explicit fixed length versions.
I personally prefer read_uint vs read_int as it avoids some possible confusion, but having read_signed_int would make it clear, too.
I am traveling right now, so not sure when I will be pushing a branch for those things...
arne
On 09/13/14 21:45, Stephen R. van den Berg wrote:
Per Hedbor wrote:
It would also break hstring reading rather badly, so you would still need the functions to read unsigned numbers internally.
Also, with native integers, as it turns out, it's fairly equal on x86_64 at least, where the integer operations are inlined.
Just add another function to read signed integers.
Shall we rename the read_int* to read_uint* then?
And as for byteorder, just add a switch to switch byteorder if the number of functions needs to be kept down for some reason.
Well, since IOBuffer is for efficiency and not for beauty, I guess it does not matter a lot if there are few or a lot of functions. Whatever is most efficient counts here, I'd say.
Object `+= lfuns are not called when they should have been.
Well. That depends on how you define "should have been.". :)
`+= has never been called except when there is only one reference to the object, which is why it can be somewhat confusing. It does allow optimizations since the value can be modified destructively, however.
And, really, X += Y only implies that the variable X will have Y added to it, it is not the same as whatever X pointing to getting Y added.
(this is much the same as if there had been a `=, assignment overloading)
Consider:
| mapping m = ([]); | object o = 1003443894984389348989434389; | multiset z = (<>); | array q = ({}); | string b = "";
In which cases should using += change not only the variable but also what it points to?
I think this would be somewhat confusing, as an example:
| object a = 18042893489438948390; | object b = 182389983498439843983498; | object c = b;
a += b;
// afterwards c == b, while I would expect (a-b == c) to be true.
As for how the += optimization is supposed to work:
- For one, the object might have an extra instance on the stack, in which case the minimum number of references becomes 2 already.
Well, this should be mostly fixed by the fact that a = a + X; should be handled differently than other additions, the lvalue (the variable being assigned to and read from) is actually cleared before the addition happens.
The code that is supposed to ensure this is the case lives in line 1337 in las.c (nice..).
The non-existence of +=/-= etc as actual opcpdes is done in treeopt.in, but note that even if those rules are removed there is actually no += opcode in the generated code, and there never has been to my knowledge, the conversion was done in docode.c previously however, which led to a lot of code duplication.
- And for another, the object might have multiple instances which all refer to the same object; which *implies* that updating one of those using += should modify all of them.
Well. Not really, in pike. Adding things to a variable only ever change the variable, not the thing it points to.
I'd rather have more methods. If I do buf->read_something(a-b); and there is a bug in how a or b was made, I want an exception here, not a wrong and possibly non-failing value out of the read call.
I would prefer to just have read_string() or perhaps something where the delimiter is user defined. read_to('\n');
And we really need to address the mail export. I don't even know the address of the mailing list...
read_hstring() without an argument would make sense for something like a Hollerith string with a base-128-encoded length though. Don't know if that's a common enough case...
If read_hstring() reads Hollerith strings, I'd like read_cstring() for C strings.
pike-devel@lists.lysator.liu.se