Another topic that was mentioned briefly was Per's proposal from a year ago about string allocation.
Currently, a string (header + content) is allocated in one chunk. Short strings (up to the length of the header) are allocated by the block allocator, all others using malloc. This has performance advantages when handling short strings. On the other hand, keeping the header (which is modified frequently) close to the content (which is constant) can have serious disadvantages for performance.
The proposal was to split up header and string content into separate allocations. I had a branch lying around which does this split. I took the time to rebase it onto current 8.0. You can find it under arne/string_alloc.
The current version allocates the content of short strings using the same block allocator that is used for the headers, which usually results in the string content being located directly after the header. All other strings have their content allocated using malloc.
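To make the layout concrete, here is a minimal sketch of the split, not the code from the branch. Apart from STRING_IS_SHORT, every name in it (str_header, block_alloc, make_string, the fixed slab) is hypothetical, and the "short" threshold is simply assumed to be the slot size minus one byte for a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Only STRING_IS_SHORT is named in the post; everything else is hypothetical. */
#define STRING_IS_SHORT 1u

struct str_header {
    unsigned flags;  /* how the content was allocated */
    int refs;        /* modified frequently */
    size_t len;
    char *data;      /* content lives in its own allocation */
};

/* Stand-in block allocator: fixed-size slots taken from one static slab. */
union slot {
    union slot *next_free;
    struct str_header header;
    char bytes[sizeof(struct str_header)];
};

#define NUM_SLOTS 64
static union slot slab[NUM_SLOTS];
static union slot *free_list;
static int slab_ready;

static void *block_alloc(void) {
    if (!slab_ready) {
        /* Chain all slots in address order so fresh allocations are adjacent. */
        for (size_t i = 0; i + 1 < NUM_SLOTS; i++)
            slab[i].next_free = &slab[i + 1];
        slab[NUM_SLOTS - 1].next_free = NULL;
        free_list = slab;
        slab_ready = 1;
    }
    union slot *s = free_list;
    if (s) free_list = s->next_free;
    return s;  /* NULL when the slab is exhausted */
}

static void block_free(void *p) {
    assert(p != NULL);
    union slot *s = p;
    s->next_free = free_list;
    free_list = s;
}

/* Header always comes from the block allocator; content only if it is short. */
struct str_header *make_string(const char *src, size_t len) {
    struct str_header *h = block_alloc();
    if (!h) return NULL;
    h->refs = 1;
    h->len = len;
    if (len < sizeof(union slot)) {   /* assumed threshold, leaves room for NUL */
        h->flags = STRING_IS_SHORT;
        h->data = block_alloc();
    } else {
        h->flags = 0;
        h->data = malloc(len + 1);
    }
    if (!h->data) { block_free(h); return NULL; }
    memcpy(h->data, src, len);
    h->data[len] = '\0';
    return h;
}

void free_string(struct str_header *h) {
    if (h->flags & STRING_IS_SHORT) block_free(h->data);
    else free(h->data);
    block_free(h);
}
```

With a fresh slab, make_string("hi", 2) takes two consecutive slots, so the content ends up directly after its header. Once slots have been freed and reused, that adjacency is no longer guaranteed, which matches the "usually" above.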
Benchmarks comparing this new branch to 8.0 have mixed results, but that is to be expected. See below for the results; positive changes mean the new branch is faster.
On top of this branch it would be possible to add more flags than STRING_IS_SHORT to allow handling more types of allocation, like mmap and constant strings. Think Stdio.read_file without memcpy.
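A rough sketch of what such flags could look like, assuming a POSIX system for munmap. Only STRING_IS_SHORT exists in the branch as described here; STRING_IS_MMAPPED, STRING_IS_CONSTANT and all function names are made up for illustration, and the block allocator is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>  /* munmap (POSIX) */

/* Only STRING_IS_SHORT is named in the post; the other two are hypothetical. */
enum {
    STRING_IS_SHORT    = 1 << 0,  /* content from the block allocator */
    STRING_IS_MMAPPED  = 1 << 1,  /* content is a mapped region */
    STRING_IS_CONSTANT = 1 << 2   /* content is static data, never freed */
};

struct str_header {
    unsigned flags;
    size_t len;
    const char *data;
};

static struct str_header *new_header(unsigned flags, const char *data, size_t len) {
    struct str_header *h = malloc(sizeof *h);
    if (!h) return NULL;
    h->flags = flags;
    h->len = len;
    h->data = data;
    return h;
}

/* Point the header at existing static data: no copy of the content. */
struct str_header *wrap_constant(const char *data, size_t len) {
    return new_header(STRING_IS_CONSTANT, data, len);
}

/* Point the header at a region the caller obtained from mmap: no copy either. */
struct str_header *wrap_mmapped(const char *addr, size_t len) {
    return new_header(STRING_IS_MMAPPED, addr, len);
}

/* The ordinary case: copy the content into a malloc'ed buffer. */
struct str_header *copy_string(const char *src, size_t len) {
    char *buf = malloc(len + 1);
    if (!buf) return NULL;
    memcpy(buf, src, len);
    buf[len] = '\0';
    struct str_header *h = new_header(0, buf, len);
    if (!h) free(buf);
    return h;
}

/* Release the content according to its allocation kind. Returns 0 on success. */
int release_string(struct str_header *h) {
    int rc = 0;
    /* Short content would go back to the block allocator, omitted in this sketch. */
    assert(!(h->flags & STRING_IS_SHORT));
    if (h->flags & STRING_IS_CONSTANT) {
        /* static data: nothing to release */
    } else if (h->flags & STRING_IS_MMAPPED) {
        rc = munmap((void *)h->data, h->len);
    } else {
        free((void *)h->data);
    }
    free(h);
    return rc;
}
```

The point is that only the release path needs to know where the content came from; code that reads the string sees a pointer and a length in every case.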
Feedback welcome.
arne
--------------------------------------------------------------
Test                                            Result  Change
--------------------------------------------------------------
Ackermann                                        33M/s   -0.9%
Adding element to array (global)               5250k/s   -0.6%
Adding element to array (local)                5741k/s  -12.2%
Adding element to array (private global)       5201k/s   -1.3%
Append array                                     27M/s   -0.7%
Append mapping (+)                               49k/s   -0.1%
Append mapping (|)                               53k/s  -12.8%
Append multiset                                 114k/s    5.4%
Array & String Juggling                          80k/s    0.6%
Array Copy                                       41M/s   11.0%
Array Zero                                      258k/s    0.8%
Binary Trees                                    933k/s   -1.1%
Clone null-object                                10M/s   -1.5%
Clone object                                   5900k/s   -0.5%
Compile                                    97k lines/s    3.7%
Compile & Exec                             95k lines/s    1.5%
Foreach (arr,global)                             66M/s   -4.1%
Foreach (arr,local)                             178M/s   -0.9%
Foreach (arr;local;global)                       40M/s   -2.0%
Foreach (arr;local;local)                        59M/s   -0.5%
GC                                              1496/s    1.8%
Insert in array                                  51M/s    1.2%
Insert in mapping                              8913k/s    0.7%
Insert in multiset                             3461k/s    4.0%
Loops Nested (global)                            32M/s    0.3%
Loops Nested (local)                             37M/s    0.6%
Loops Nested (local,var)                         37M/s   -0.0%
Loops Recursed                                   17M/s    1.4%
Matrix multiplication (100x100)              2.19 GF/s    0.6%
Read binary INT128                              172k/s   25.7%
Read binary INT16                                15M/s   17.1%
Read binary INT32                                11M/s   -1.6%
Replace (parallel)                               10k/s    0.2%
Replace (serial)                                 16k/s   -0.1%
Simple arithmentics (globals)                    94M/s    1.3%
Simple arithmentics (private global)            117M/s    1.2%
Simple arithmetics (locals)                     147M/s    1.0%
Sort equal integers                              71M/s    0.2%
Sort ordered integers                            87M/s    0.9%
Sort unordered integers                          14M/s    1.7%
Sort unordered objects                          565k/s    8.7%
String Creation                                2585k/s    1.6%
String Creation (existing)                     6327k/s    2.2%
String Creation (wide)                          583k/s   -2.9%
Tag removal u. Parser.HTML                     4232k/s  -10.0%
Tag removal u. Regexp.PCRE                      442k/s    1.7%
Tag removal u. array_sscanf                    6110k/s    4.3%
Tag removal u. division                         817k/s   -2.4%
Tag removal u. search                           985k/s    4.0%
Tag removal using a loop                        181k/s    2.4%
Tag removal using sscanf                        431k/s   -1.2%
Upper/lower case shift 0                        123M/s    0.9%
Upper/lower case shift 1                         60M/s    0.7%
call_out handling                               185k/s    1.2%
call_out handling (with id)                    3324k/s   -6.7%
--------------------------------------------------------------
                                                          0.8%
--------------------------------------------------------------