Re: String.Buffer NG, compare IOBuffer

10 Sep 2014

      First, I am fairly sure you already know my test program was only made
to check the ordo, not the constant (almost all of the constant in
that test is from random_string, actually).
And, yes, they are now very similar in speed if you add strings to
them, which they already were (string.buffer is about 20% slower, it
is very possible to optimize IOBuffer, of course, but the same can be
done for Buffer. Most of the overhead lives in the function call and
argument checking as is.).
String.Buffer is still about a factor of two off, more in most cases
where the new code is really useful (add_int(s)/short etc, same with
extracting them)
You can fix most of the very slow functions, but getting better than
about 50% performance is rather hard when you have to check the
size-shift when extracting and adding chars. The cpp() function became
about 40% slower after my rewrite that changed it to use PCHARP
everywhere, as an example.
Also, the fact that String.Buffer can contain non-8bit characters is
directly /harmful/ when doing an IOBuffer. So the extra overhead is
not really there for any good reason.
Why not simply keep String.Buffer as a simple string-adding-buffer,
and let the more complex stuff be done in other code, as before?
IOBuffer is really only a more efficient version of ADT.struct. Why is
it so dangerous to have it around? I find the additions to
String.Buffer to be significantly less beautiful, if that is a way to
put it. There was not really any shared funcitonality between
String.Buffer and Stdio.IOBuffer except one before: Adding strings to
a buffer that is not otherwise used until "done".
And since IOBuffers are never "done" usually that is not really all
that much of a similarity. Why try to mangle String.Buffer into a
IOBuffer, the only thing they have in common is the name 'Buffer'
which in the String.Buffer case should have been Builder anyway.
It is  somewhat esoteric to use add_int(..) add_short() etc when the
string contains widechars. It does not really make sense
(incidentally, your JSON parser assumes utf8 which is hardly true for
a string, those are supposed to be  unicode)
Also, how do you handle the 2G+ buffer case? I can see no code for
that (yes, I have hit 2G+ buffer sizes, which is why IOBuffer is very
carefully using size_t everywhere).
Anyway, test program 2:
void main()
{
  string ta = "data"*400;
  array(int) numbers = enumerate(6553);
Stdio.IOBuffer a = Stdio.IOBuffer();
  String.Buffer b = String.Buffer();
System.Timer t = System.Timer();
  for(int i=0; i<100000; i++ )
    a->add(ta)->add_ints(numbers,10)->read_buffer(120000);
werror("IOBuffer: %.1f\n", t->get());
for(int i=0; i<100000; i++ )
  {
    b->add(ta);b->add_ints(numbers,10)->read_buffer(120000);
  }
  werror("String.Buffer: %.1f\n", t->get());
}
gives you something along the lines of
IOBuffer: 5.3
String.Buffer: 39.6
And as for codesize, String.Buffer, not counting any stringbuilder
code, compiled using clang -Ofast on my mac, weighs in at 27893 bytes
(only counting functions named _Buffer_* in builtin.o, so it does not
include the helper functions).
The text segment of buffer.o is 23425. In total. Without using
stringbuilder. So, no, not really.
They are almost identical when it comes to the number of lines (again,
not counting stringbuiler, pcharp functions etc):
String.Buffer is about 1870 lines long (docs included, but those are
rather similar since it's mostly copied, in fact IOBuffer has five
more lines of docs, even though String.Buffer have more funcitons (the
comparison ones, again something which is pointless in an IOBuffer,
incidentally)).
buffer.cmod is almost exactly 2000 lines long. not counting #include lines.
But String.Buffer does not have the correct code for things like errno
and stdio nonblocking modes, which will add at least 20 lines of code
on it's own, and also has no support for IOBuffer as input (also,
IOBuffer no longer correctly supports String.Buffer due to the changes
in String.Buffer, the same is of course true for all other code using
String.Buffer such as the shuffler).
Also, one thing at random I noticed when reading the stralloc.c commits:
- static INLINE int min_magnitude(const p_wchar2 c)
+ static INLINE unsigned min_magnitude(const unsigned c)   {
-  if(c<0) return 2;
-  if(c<256) return 0;
-  if(c<65536) return 1;
-  return 2;
+  return c<256 ? 0 : c<65536 ? 1 : 2;
 }
is not really much of a speedup, if any, in fact it reorganized the
instructions for me to make it less efficient if c>=0 and <= 255 when
using clang (you would need to add some LIKELY to obvious locations).
YMMV when the code is included in the location it's called and other
compilers are used. Which is sort of the point.
Also, it is generally a good idea to add what is unsigned, we do so
every where else. And, pike 32bit chars are actually signed.
Another random thing:  _encode should probably not empty the buffer.
It's a fairly unexpected side-effect.
...
Benefits of the new String.Buffer implementation:

Close to drop-in replacement for IOBuffer.

Yes, but slower, and why?
...

Performance hovers around that of IOBuffer's.

Well, not really, but granted, for simple stuff it's comparable, and
unless you modify the buffer when subbuffers are live the ordos should
be the same (but I might modify IOBuffer to be circular in the simple
case, since that is rather a lot faster unless the buffer size is
extended all the time (and it is usually not in a long-lived
connection, most packets/whatever tend to have similar sizes) This is
surprisingly trivial since most of the things use the io_* abstraction
layer)
...

Support all native Pike character widths.

That is not a feature as far as IOBuffers are concerned.
In fact it's rather the reverse.
...

Backward compatible with the old String.Buffer.

I really hope so, since it is String.Buffer...
I never said IOBuffer should replace String.Buffer.
...

Creating a String.Buffer object from a shared string does *not* incur

a single copy (and is thus very lightweight).
Of course the same is true for IOBuffer.
...

A String.Buffer initialised from a shared string allows one to

access the shared string in parts without copying the data at all.
Of course the same is true for IOBuffer.
IOBuffer does not even do any copying when you read all data and then
later on add  a new string. Which String.Buffer does. :)
...

No need to account for locks on subbuffers; String.Buffer resolves

those automatically and on demand.

Explicit copying of the buffer data is not supported anymore, the

implicit copy-on-demand mechanism is more efficient.
Well, except that the explicit copy is mainly there to make it obvious
when you accidentally copy. It was intentional, so this is a drawback.
Also, am I right in the fact that the whole buffer has to be copied if a
subbuffer is live while the main buffer is modified?
That seems to be set up for rather severe ordo-accidents that will
not be detected. Common case:
you have a packet based protocol. You extract a packet as a buffer.
You then do a call_out or similar to handle the packet (thread, whatever).
Then new data arrives, and is added to the buffer.
In the String.Buffer case this might copy the buffer, which is never
what you want.
Please note that it would be trivially easy to do the same in IOBuffer, just
modify the code in io_ensure_unlocked to copy and then unlock,
I /intentionally/ did not do so, even when it was suggested here, since it
is, in my opinion, much better if you are required to explicitly uncouple the
subbuffer, which adds a simple call in one place, than to accidentally copy the
whole buffer, which will add extreme overhead for no good reason that will be
very hard to find.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: String.Buffer NG, compare IOBuffer