I have now run a somewhat more extensive test (running the benchmark for 20 hours, alternating between builds with and without the substring type).
Even with this long a run of the test suite there still seem to be a few percentage points of variance, which is impressive. The code is identical except for the change that makes the string type a bitfield, so I would have expected most tests to be at 100.0% after 20 hours of sampling.
Anyway, these are the results:
Three tests show a large difference, and one shows a smaller difference that is probably real but that I cannot explain.
1: Append array, operating at 96%. That is a fairly noticeable slowdown compared to most tests, but I cannot for the life of me see why it occurs. Bad luck with cache-line alignment of the code? Who knows.
2: Array & String Juggling. Roughly 37x faster (3669%). It would seem that this test does
  while (sscanf(s,"%d %s",x,s)==2) sum+=x,ia+=({x});
so this is indeed using the substring code.
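To illustrate why a loop like the one above benefits so much, here is a minimal C sketch of the general substring idea: a slice points into its parent's buffer instead of copying the remaining bytes. This is not Pike's actual implementation; the struct layout and the names str_new, str_slice and str_free are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type for illustration, not Pike's struct pike_string. */
struct str {
    size_t refs;         /* reference count */
    size_t len;          /* number of bytes visible through this string */
    const char *data;    /* own buffer, or a pointer into the parent's buffer */
    struct str *parent;  /* non-NULL for a substring; keeps the owner alive */
};

/* Create an ordinary string that owns a copy of the bytes: O(len). */
static struct str *str_new(const char *bytes, size_t len)
{
    struct str *s = malloc(sizeof *s);
    char *buf = malloc(len + 1);
    assert(s && buf);
    memcpy(buf, bytes, len);
    buf[len] = '\0';
    s->refs = 1;
    s->len = len;
    s->data = buf;
    s->parent = NULL;
    return s;
}

/* Create a substring in O(1): no bytes are copied, only a reference is taken. */
static struct str *str_slice(struct str *src, size_t start, size_t len)
{
    struct str *s = malloc(sizeof *s);
    struct str *owner;
    assert(s && start <= src->len && len <= src->len - start);
    /* Always reference the root owner so a slice of a slice stays one level deep. */
    owner = src->parent ? src->parent : src;
    owner->refs++;
    s->refs = 1;
    s->len = len;
    s->data = src->data + start;
    s->parent = owner;
    return s;
}

/* Drop one reference; a substring releases its owner when it dies. */
static void str_free(struct str *s)
{
    if (--s->refs)
        return;
    if (s->parent)
        str_free(s->parent);
    else
        free((void *)s->data);
    free(s);
}
```

With a layout like this, assigning the "%s" remainder back to s on every iteration costs a small allocation rather than a copy of the rest of the string.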
3: Tag removal u. array_sscanf, operating at 86% speed. This is almost certainly due to the change in sscanf (use string_slice to make strings when possible). As it turns out it is not possible in this case, but I had assumed gcc would see that (it is statically true in the code that the string argument is null).
Well.
4: Tag removal using sscanf, operating at 442% (closing in on Parser.HTML speed). This is due to the substring optimization, since it removes one tag at a time using sscanf.
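As a rough illustration of where the time goes without substrings, the C sketch below counts how many bytes a copying implementation would have to duplicate when it peels off one tag per iteration and re-creates the remainder each time. It is a simplified model, assuming tags end at '>' and that the remainder is always copied in full; the function name remainder_bytes_copied is hypothetical and nothing here is taken from the Pike sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the bytes a copying implementation would duplicate while removing
 * one '>'-terminated tag per iteration. Each iteration re-creates the
 * remainder as a new string, so the total grows quadratically with the
 * number of tags. A substring-based implementation copies none of these.
 */
static size_t remainder_bytes_copied(const char *s)
{
    size_t copied = 0;
    const char *end;

    while ((end = strchr(s, '>')) != NULL) {
        s = end + 1;           /* the remainder after this tag */
        copied += strlen(s);   /* a copy would duplicate all of it */
    }
    return copied;
}
```

For n tags of three bytes each this adds up to 3*n*(n-1)/2 copied bytes, which is the quadratic cost that slicing avoids.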
--------------------------------------------------------
  Test                                     Relative speed
--------------------------------------------------------
  Ackermann                                :  106%
  Adding element to array (global)         :  103%
  Adding element to array (local)          :  100%
  Adding element to array (private global) :  102%
1 Append array                             :   96%
  Append mapping (+)                       :  100%
  Append mapping (|)                       :  100%
  Append multiset                          :  101%
2 Array & String Juggling                  : 3669%
  Array Copy                               :  102%
  Array Zero                               :  101%
  Binary Trees                             :  100%
  Clone null-object                        :  102%
  Clone object                             :  100%
  Compile                                  :  100%
  Compile & Exec                           :  101%
  Foreach (arr,global)                     :  101%
  Foreach (arr,local)                      :   99%
  Foreach (arr;local;global)               :  100%
  Foreach (arr;local;local)                :   97%
  GC                                       :  104%
  Huffman                                  :  100%
  Huffman (binary)                         :  101%
  Insert in array                          :  100%
  Insert in mapping                        :  102%
  Insert in multiset                       :  102%
  Loops Nested (global)                    :  104%
  Loops Nested (local)                     :  100%
  Loops Nested (local,var)                 :  100%
  Loops Recursed                           :  100%
  Matrix multiplication (100x100)          :  100%
  Read binary INT128                       :   98%
  Read binary INT16                        :  104%
  Read binary INT32                        :  102%
  Replace (parallel)                       :   99%
  Replace (serial)                         :  103%
  Simple arithmentics (globals)            :  102%
  Simple arithmentics (private global)     :  100%
  Simple arithmetics (locals)              :  100%
  Sort equal integers                      :  101%
  Sort ordered integers                    :  100%
  Sort unordered integers                  :  101%
  Sort unordered objects                   :  103%
  String Creation                          :   99%
  String Creation (existing)               :   98%
  String Creation (wide)                   :  100%
  Tag removal u. Parser.HTML               :   99%
  Tag removal u. Regexp.PCRE               :   98%
3 Tag removal u. array_sscanf              :   86%
  Tag removal u. division                  :  101%
  Tag removal u. search                    :  100%
  Tag removal using a loop                 :   98%
4 Tag removal using sscanf                 :  442%
  Upper/lower case shift 0                 :   98%
  Upper/lower case shift 1                 :  101%
  call_out handling                        :   99%
  call_out handling (with id)              :   98%
So... I guess the question now is: Should I merge this as well? I will look into the array_sscanf code, but sscanf is so macrified it is sort of hard to solve it easily, I think. I guess I could pass the code to create a substring as a macro.. ;)