Re: Development discussion

5 Sep 2014


I updated the string_alloc branch with support for static strings
(the string data itself is inside the data section). In that branch it is
used for program constants, functions, efun names, etc. It saves quite a
bit of heap memory. The statistics look like this:
                          field  difference      string_alloc
                    array_bytes    64 bytes      20.1 kB
               free_block_bytes     16.9 kB      800.1 kB
             malloc_block_bytes      -52 kB      2.5 MB
                   malloc_bytes    -68.8 kB      1.2 MB
                   marker_bytes    -30.0 kB      216 bytes
                     num_arrays           1      254
                     num_malloc        -119      116
         num_short_pike_strings        -815      1347
        num_static_pike_strings        1067      1067
                    num_strings           3      3137
        short_pike_string_bytes   -240.3 kB      0 bytes
                   string_bytes      6.2 kB      522.3 kB
This table shows the difference in the output of Debug.memory_usage. The
third column is the output for the string_alloc branch; the second is the
difference between Pike 8.0 and string_alloc. The difference should be
even bigger with more modules loaded, since this is the output of
pike -e 'write("%O\n", Debug.memory_usage());'
Not sure where to go with this. The benchmark results are still pretty
mixed, so this is mainly saving memory right now.
arne
On Mon, 18 Aug 2014, Arne Goedeke wrote:
...
There was another topic that was mentioned briefly. It was a proposal by
Per from one year earlier about string allocation.
Currently, strings (header + content) are allocated in one chunk. Short
strings (up to the length of the header) are allocated by the block
allocator, all others using malloc. This has performance advantages when
handling short strings. On the other hand, keeping the header (which is
modified frequently) close to the content (which is constant) can have
serious disadvantages for performance.
The proposal was to split up header and string content into separate
allocations. I had a branch lying around which does this split.  I took
the time to rebase it onto current 8.0. You can find it under 
arne/string_alloc.
The current version allocates short strings using the same block
allocator used for the headers, which usually results in the string
content being located directly after the header. All other strings have
their content allocated using malloc.
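
To make the split concrete, here is a minimal sketch in C. It is not the
code from the branch: the struct layout, the pool size, and the helper
names (block_alloc, block_free, str_new, str_free) are assumptions made
up for illustration. Only the STRING_IS_SHORT flag name and the general
scheme (header from the block allocator, short content from the same
allocator, everything else from malloc) come from the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STRING_IS_SHORT 1u    /* flag name from the text; value is assumed */

/* Hypothetical header layout, not Pike's real struct pike_string. */
struct str_header {
    unsigned refs;            /* modified frequently */
    unsigned flags;
    size_t len;
    char *data;               /* constant content, allocated separately */
};

/* One block can hold either a header or the content of a short string. */
union block {
    union block *next_free;
    struct str_header header;
    char bytes[sizeof(struct str_header)];
};

#define BLOCK_SIZE sizeof(union block)
#define POOL_BLOCKS 64        /* tiny fixed pool, enough for a demo */

static union block pool[POOL_BLOCKS];
static size_t pool_used;
static union block *free_list;

/* Stand-in block allocator: reuse freed blocks, else take the next slot. */
static void *block_alloc(void) {
    if (free_list) {
        union block *b = free_list;
        free_list = b->next_free;
        return b;
    }
    if (pool_used == POOL_BLOCKS) return NULL;
    return &pool[pool_used++];
}

static void block_free(void *p) {
    union block *b = p;
    b->next_free = free_list;
    free_list = b;
}

struct str_header *str_new(const char *s, size_t len) {
    struct str_header *h = block_alloc();
    if (!h) return NULL;
    h->refs = 1;
    h->len = len;
    if (len < BLOCK_SIZE) {   /* content plus NUL fits in one block */
        h->flags = STRING_IS_SHORT;
        h->data = block_alloc();
    } else {
        h->flags = 0;
        h->data = malloc(len + 1);
    }
    if (!h->data) { block_free(h); return NULL; }
    memcpy(h->data, s, len);
    h->data[len] = '\0';
    return h;
}

void str_free(struct str_header *h) {
    assert(h != NULL);
    if (h->flags & STRING_IS_SHORT) block_free(h->data);
    else free(h->data);
    block_free(h);
}
```

With a fresh pool, the header and the content of a short string come
from consecutive blocks, which is the "content directly after the
header" effect mentioned above. Once blocks have been freed and reused,
that adjacency is no longer guaranteed.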
Benchmarks comparing this new branch to 8.0 have mixed results, but that
is to be expected. See below for the results, positive changes correspond
to the new branch being faster.
On top of this branch it would be possible to add more flags than
STRING_IS_SHORT to allow handling more types of allocation, like mmap
and constant strings. Think Stdio.read_file without memcpy.
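
A rough sketch of what such an extra allocation kind could look like,
again in C and again with made-up names (enum str_alloc, struct pstr,
pstr_from_static, pstr_from_copy, pstr_free are all hypothetical). It
only covers the constant-string case; an mmap kind would presumably add
one more case that unmaps the region when the string is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocation kinds; the text only proposes "more flags". */
enum str_alloc { ALLOC_MALLOC, ALLOC_STATIC };

struct pstr {
    unsigned refs;
    enum str_alloc alloc;     /* decides how the content is released */
    size_t len;
    const char *data;
};

/* Wrap constant data: only the header is allocated, nothing is copied. */
struct pstr *pstr_from_static(const char *literal) {
    struct pstr *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->refs = 1;
    s->alloc = ALLOC_STATIC;
    s->len = strlen(literal);
    s->data = literal;
    return s;
}

/* Ordinary string: the content is copied into a malloc'ed buffer. */
struct pstr *pstr_from_copy(const char *src, size_t len) {
    struct pstr *s = malloc(sizeof *s);
    char *buf = malloc(len + 1);
    if (!s || !buf) { free(s); free(buf); return NULL; }
    memcpy(buf, src, len);
    buf[len] = '\0';
    s->refs = 1;
    s->alloc = ALLOC_MALLOC;
    s->len = len;
    s->data = buf;
    return s;
}

void pstr_free(struct pstr *s) {
    assert(s != NULL);
    switch (s->alloc) {
    case ALLOC_MALLOC: free((char *)s->data); break;
    case ALLOC_STATIC: break; /* content lives in the data section */
    }
    free(s);
}
```

The point of the dispatch in pstr_free is that code reading the string
never needs to know where the content came from; only allocation and
release look at the kind.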
Feedback welcome.
arne

Test                                                Result Change
Ackermann . . . . . . . . . . . . . . . . . . . . .  33M/s  -0.9%
Adding element to array (global)  . . . . . . . .  5250k/s  -0.6%
Adding element to array (local) . . . . . . . . .  5741k/s -12.2%
Adding element to array (private global) . . . . .  5201k/s  -1.3%
Append array  . . . . . . . . . . . . . . . . . . .  27M/s  -0.7%
Append mapping (+)  . . . . . . . . . . . . . . . .  49k/s  -0.1%
Append mapping (|)  . . . . . . . . . . . . . . . .  53k/s -12.8%
Append multiset . . . . . . . . . . . . . . . . . . 114k/s   5.4%
Array & String Juggling . . . . . . . . . . . . . .  80k/s   0.6%
Array Copy  . . . . . . . . . . . . . . . . . . . .  41M/s  11.0%
Array Zero  . . . . . . . . . . . . . . . . . . . . 258k/s   0.8%
Binary Trees  . . . . . . . . . . . . . . . . . . . 933k/s  -1.1%
Clone null-object . . . . . . . . . . . . . . . . .  10M/s  -1.5%
Clone object  . . . . . . . . . . . . . . . . . .  5900k/s  -0.5%
Compile . . . . . . . . . . . . . . . . . . .  97k lines/s   3.7%
Compile & Exec  . . . . . . . . . . . . . . .  95k lines/s   1.5%
Foreach (arr,global)  . . . . . . . . . . . . . . .  66M/s  -4.1%
Foreach (arr,local) . . . . . . . . . . . . . . . . 178M/s  -0.9%
Foreach (arr;local;global)  . . . . . . . . . . . .  40M/s  -2.0%
Foreach (arr;local;local) . . . . . . . . . . . . .  59M/s  -0.5%
GC  . . . . . . . . . . . . . . . . . . . . . . . . 1496/s   1.8%
Insert in array . . . . . . . . . . . . . . . . . .  51M/s   1.2%
Insert in mapping . . . . . . . . . . . . . . . .  8913k/s   0.7%
Insert in multiset  . . . . . . . . . . . . . . .  3461k/s   4.0%
Loops Nested (global) . . . . . . . . . . . . . . .  32M/s   0.3%
Loops Nested (local)  . . . . . . . . . . . . . . .  37M/s   0.6%
Loops Nested (local,var)  . . . . . . . . . . . . .  37M/s  -0.0%
Loops Recursed  . . . . . . . . . . . . . . . . . .  17M/s   1.4%
Matrix multiplication (100x100) . . . . . . . .  2.19 GF/s   0.6%
Read binary INT128  . . . . . . . . . . . . . . . . 172k/s  25.7%
Read binary INT16 . . . . . . . . . . . . . . . . .  15M/s  17.1%
Read binary INT32 . . . . . . . . . . . . . . . . .  11M/s  -1.6%
Replace (parallel)  . . . . . . . . . . . . . . . .  10k/s   0.2%
Replace (serial)  . . . . . . . . . . . . . . . . .  16k/s  -0.1%
Simple arithmentics (globals) . . . . . . . . . . .  94M/s   1.3%
Simple arithmentics (private global)  . . . . . . . 117M/s   1.2%
Simple arithmetics (locals) . . . . . . . . . . . . 147M/s   1.0%
Sort equal integers . . . . . . . . . . . . . . . .  71M/s   0.2%
Sort ordered integers . . . . . . . . . . . . . . .  87M/s   0.9%
Sort unordered integers . . . . . . . . . . . . . .  14M/s   1.7%
Sort unordered objects  . . . . . . . . . . . . . . 565k/s   8.7%
String Creation . . . . . . . . . . . . . . . . .  2585k/s   1.6%
String Creation (existing)  . . . . . . . . . . .  6327k/s   2.2%
String Creation (wide)  . . . . . . . . . . . . . . 583k/s  -2.9%
Tag removal u. Parser.HTML  . . . . . . . . . . .  4232k/s -10.0%
Tag removal u. Regexp.PCRE  . . . . . . . . . . . . 442k/s   1.7%
Tag removal u. array_sscanf . . . . . . . . . . .  6110k/s   4.3%
Tag removal u. division . . . . . . . . . . . . . . 817k/s  -2.4%
Tag removal u. search . . . . . . . . . . . . . . . 985k/s   4.0%
Tag removal using a loop  . . . . . . . . . . . . . 181k/s   2.4%
Tag removal using sscanf  . . . . . . . . . . . . . 431k/s  -1.2%
Upper/lower case shift 0  . . . . . . . . . . . . . 123M/s   0.9%
Upper/lower case shift 1  . . . . . . . . . . . . .  60M/s   0.7%
call_out handling . . . . . . . . . . . . . . . . . 185k/s   1.2%
call_out handling (with id) . . . . . . . . . . .  3324k/s  -6.7%

                                                        0.8%
