Re: split-string branch

Chris Angelico

19 Aug 2015

12:39 a.m.

On Wed, Aug 19, 2015 at 6:13 AM, Per Hedbor () @ Pike (-) developers forum 10353@lyskom.lysator.liu.se wrote:

...

Another advantage of this method is that it is reasonably trivial to add new string types. As an example that I am seriously considering: a string that is a tail part of another string.

The tail-string would be used for this somewhat unfortunate but rather common coding pattern:

sscanf( str, "%2c%s", part, str );

and, in a similar manner:

written = file->write(str); str = str[written..];

These are both O(n^2), but if tail-of strings were added they could both be about O(n), reallocating the data only if more than 50% is wasted.
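To make the cost concrete, here is a rough sketch (not taken from any code in this thread) of the sscanf pattern next to a version that walks an offset instead. The function names are made up for illustration, and the input is assumed to be a run of big-endian 16-bit values, which is what %2c reads.

```pike
// Repeated tail slicing: every pass produces a new copy of the
// remaining string, so the total work grows as O(n^2).
int sum_pairs_slicing(string str)
{
  int total = 0;
  while (sizeof(str) >= 2) {
    int part;
    sscanf(str, "%2c%s", part, str);  // str is replaced by its own tail
    total += part;
  }
  return total;
}

// Offset walking: the string is never copied, so the work is O(n).
int sum_pairs_offset(string str)
{
  int total = 0;
  for (int pos = 0; pos + 2 <= sizeof(str); pos += 2) {
    // Same value %2c would produce: two bytes, most significant first.
    total += (str[pos] << 8) | str[pos + 1];
  }
  return total;
}
```

The second form is what application code has to do by hand today. As proposed, a tail-string type would let the first, more idiomatic form get close to the same cost without the rewrite.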

This specific improvement would be useful to this code, although I'm noticing the slowdown only when I do stupid stuff for benchmarking.

https://github.com/Rosuav/Hogan/blob/master/hogan.pike#L25

Figuring out a good chunk size in application code is hard. If I could simply code idiomatically without chunking, and let the language worry about efficiency, it'd be awesome.

ChrisA

Per Hedbor

19 Aug

3:45 a.m.

In this specific case you could use Stdio.Buffer instead.

Change writeme to be a buffer, then add to it using ->add(string)

In the socket_write function, you only need this:

if( sizeof( con->_writeme ) ) con->_writeme->output_to(con->_socket);

This should be O(1) regardless of the buffer size, and in general slightly faster than strings even for shortish buffers.
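Spelled out a little more, the suggestion might look like the sketch below. It assumes a Pike version that ships Stdio.Buffer; the class and member names (Connection, sock, writeme, queue, socket_write) are illustrative and are not the ones used in hogan.pike.

```pike
// Sketch only: a connection that keeps its pending output in a
// Stdio.Buffer instead of a plain string.
class Connection
{
  Stdio.File sock;                        // assumed to be a non-blocking socket
  Stdio.Buffer writeme = Stdio.Buffer();  // pending output

  // Queue more data; this appends to the buffer rather than
  // building a new, longer string each time.
  void queue(string(8bit) data)
  {
    writeme->add(data);
  }

  // Write callback: hand the socket whatever is pending.
  void socket_write()
  {
    if (sizeof(writeme)) writeme->output_to(sock);
  }
}
```

The only size check left is the test for an empty buffer, which avoids a pointless write call.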

Chris Angelico

4:05 a.m.

On Wed, Aug 19, 2015 at 1:45 PM, Per Hedbor per@hedbor.org wrote:

...

In this specific case you could use Stdio.Buffer instead.

Change writeme to be a buffer, then add to it using ->add(string)

In the socket_write function, you only need this:

if( sizeof( con->_writeme ) ) con->_writeme->output_to(con->_socket);

This should be O(1) regardless of the buffer size, and in general slightly faster than strings even for shortish buffers.

Presumably with the same check-and-trim behaviour, so it'd look like this:

conn->_writeme->consume(conn->_writeme->output_to(conn->_sock));

But the main problem is that, as far as I know, Stdio.Buffer is available only in the newer Pikes. I guess it'll have to be guarded with a #if constant(Stdio.Buffer), but that means maintaining another code branch. Maybe I'll drop chunked mode in favour of automatically using Stdio.Buffer if it's available, and just warn people "On Pike 7.8, avoid outputting large amounts of data as it can cause performance problems".

ChrisA

Per Hedbor

4:09 a.m.

There is no need to consume or trim the buffer; output_to does that automatically, removing the written data from the start of the buffer. You would still need to keep the relevant checks, but the current size (apart from a check for zero size) is not one of them.
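Combined with the #if constant(Stdio.Buffer) guard mentioned earlier in the thread, the two code paths might look roughly like this. The names queue and flush are hypothetical, and the fallback branch is simply the string-slicing pattern from the start of the thread, not Hogan's actual code.

```pike
#if constant(Stdio.Buffer)
// Newer Pikes: output_to() removes what it wrote from the front
// of the buffer, so no explicit consume() or slicing is needed.
Stdio.Buffer writeme = Stdio.Buffer();

void queue(string data) { writeme->add(data); }

void flush(Stdio.File sock)
{
  if (sizeof(writeme)) writeme->output_to(sock);
}
#else
// Older Pikes (e.g. 7.8): plain string, trimmed by hand after each
// write. Each trim copies the remaining tail.
string writeme = "";

void queue(string data) { writeme += data; }

void flush(Stdio.File sock)
{
  if (!sizeof(writeme)) return;
  int written = sock->write(writeme);
  if (written > 0) writeme = writeme[written..];
}
#endif
```

Callers only see queue and flush, so the version-specific part stays confined to this one block.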

Chris Angelico

4:10 a.m.

On Wed, Aug 19, 2015 at 2:09 PM, Per Hedbor per@hedbor.org wrote:

...

There is no need to consume or trim the buffer; output_to does that automatically, removing the written data from the start of the buffer. You would still need to keep the relevant checks, but the current size (apart from a check for zero size) is not one of them.

Ah cool! That would definitely be the cleanest form of the code, then. I'll add it in and give it a try.

ChrisA

pike-devel@lists.lysator.liu.se