I would like to merge the branch arne/new_buffer. It implements and uses the new dynamic buffer implementation I talked about at the conference. The main idea behind this implementation is to help the compiler generate better code despite the aliasing rules of C. In particular, it tries to allow compilers to coalesce individual writes to the buffer into larger MOVs. This works reasonably well with gcc and clang.
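To illustrate the aliasing problem, here is a minimal sketch. The struct and function names (sketch_buffer, put_be32_naive, put_be32_local) are made up for this example and are not the byte_buffer API from the branch. It only shows the general technique of writing through a local pointer so that the compiler is free to merge adjacent byte stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buffer type for illustration only; this is not the
 * layout of byte_buffer in the branch. */
struct sketch_buffer {
    unsigned char *data;
    size_t len;
    size_t cap;
};

/* Allocate room for cap bytes. Returns 0 on success, -1 on failure. */
static int sketch_buffer_init(struct sketch_buffer *b, size_t cap)
{
    b->data = malloc(cap);
    b->len = 0;
    b->cap = cap;
    return b->data ? 0 : -1;
}

/* Naive variant: each store goes through b->data and b->len.
 * Character types may alias any object, including *b itself, so the
 * compiler has to assume that a store may have changed b->data or
 * b->len and reload them before the next store. That tends to
 * prevent merging the four byte stores into one. */
static void put_be32_naive(struct sketch_buffer *b, uint32_t v)
{
    assert(b->cap - b->len >= 4);
    b->data[b->len++] = (unsigned char)(v >> 24);
    b->data[b->len++] = (unsigned char)(v >> 16);
    b->data[b->len++] = (unsigned char)(v >> 8);
    b->data[b->len++] = (unsigned char)v;
}

/* Local-pointer variant: the destination is computed once and kept
 * in a local, and the length is updated once at the end. The four
 * stores are now visibly adjacent, so the compiler may combine them
 * into a single 32-bit store. */
static void put_be32_local(struct sketch_buffer *b, uint32_t v)
{
    unsigned char *p;

    assert(b->cap - b->len >= 4);
    p = b->data + b->len;
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    b->len += 4;
}
```

Both functions produce the same bytes; the difference only shows up in the generated code. Comparing the assembly of the two at -O2 is an easy way to check what a given compiler version actually does.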
The branch replaces all uses of dynamic_buffer by the new byte_buffer API. In my opinion, the dynamic_buffer API had slightly confusing function names, which I attempted to improve. The old API from dynamic_buffer still exists as wrappers around the new code, which means that external modules using dynamic_buffer should continue to work. One feature of the dynamic_buffer API was to use a global buffer object. I have removed all in-tree uses of that global buffer, which means that e.g. describe_svalue is now in theory reentrant. There are probably some parts of describe_svalue which could be further simplified due to that.
The immediate benefit is that encode_value and describe_svalue are significantly faster. I added some benchmarks to the pike-benchmark repository. On my machine the results look like this:
benchmark                                 | 8.1             | new_buffer      |
buffer/describe.pike#encode_array(int)    |  83.7 M  2.8 %  | 104.3 M  2.0 %  |
buffer/describe.pike#encode_string(16bit) |  56.7 M  1.7 %  |  65.3 M  0.1 %  |
buffer/describe.pike#encode_string(32bit) |  86.5 M  1.6 %  |  98.1 M  0.2 %  |
buffer/describe.pike#encode_string(8bit)  |  47.4 M  1.6 %  |  51.7 M  0.2 %  |
buffer/encode.pike#decode_array(float)    | 136.2 M 15.1 %  | 146.9 M  0.5 %  |
buffer/encode.pike#decode_array(int)      |  73.0 M  1.7 %  |  71.1 M  0.8 %  |
buffer/encode.pike#decode_string(16bit)   |   2.4 G  3.2 %  |   2.5 G  1.8 %  |
buffer/encode.pike#decode_string(32bit)   |   4.1 G  2.5 %  |   4.1 G  2.0 %  |
buffer/encode.pike#decode_string(8bit)    |  10.0 G  2.1 %  |   9.9 G  3.9 %  |
buffer/encode.pike#encode_array(float)    |   4.1 M  1.2 %  |   4.1 M  0.2 %  |
buffer/encode.pike#encode_array(int)      |  54.2 M  2.1 %  |  58.8 M  0.6 %  |
buffer/encode.pike#encode_string(16bit)   | 288.4 M  9.7 %  |   1.8 G  1.9 %  |
buffer/encode.pike#encode_string(32bit)   | 581.6 M  1.6 %  |   3.4 G  1.3 %  |
buffer/encode.pike#encode_string(8bit)    |   6.2 G  4.2 %  |   8.9 G  4.1 %  |
The first column shows the results for current 8.1, the second the results for the new_buffer branch. Feel free to run those benchmarks on your hardware.
One change which I am unsure about is the simplification of do_read() in Stdio.Fd. The previous version tried to optimize read buffer sizes for small reads on sockets. My gut feeling is that we should not try to decide what number of bytes to read, but instead have the caller decide. This would allow doing efficient reads from files (with large buffers) and have the callback code read smaller chunks.
Once this branch has been merged I would like to change the machine code generators to use it. However, that will require some API changes to everything in code/*. This was in fact the initial reason why I started looking into this.
Comments welcome.
Arne