On request, since there were some regressions since the last release.
Pike 8.0.388 beta/release candidate:
https://pike.lysator.liu.se/pub/pike/beta/8.0.388/Pike-v8.0.388.tar.gz
Other builds:
https://pike.lysator.liu.se/pub/pike/beta/8.0.388/Pike-v8.0.388-Darwin-15.4....
https://pike.lysator.liu.se/pub/pike/beta/8.0.388/Pike-v8.0.388-win32-oldlib...
Documented changes since Christmas beta 1:
-------------------------------------------------------------------
New features
------------
o Thread
  _sprintf() improvements: Thread.Mutex now prints the ID of the thread
  holding the lock, and thread IDs are shown as hexadecimal numbers.
I would like to merge the branch arne/new_buffer. It implements and uses the new dynamic buffer implementation I talked about at the conference. The main idea behind this implementation is to help the compiler generate better code in spite of the aliasing rules of C. In particular, it tries to allow compilers to coalesce individual writes to the buffer into larger MOVs. This works reasonably well in gcc and clang.
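To sketch the coalescing idea (a minimal example with made-up names, not the actual byte_buffer API): keeping the write cursor in a local variable means the byte stores cannot alias the buffer bookkeeping, so the compiler is free to merge them into one wider store.

  #include <stdint.h>

  /* Illustrative sketch only -- these names are not the actual byte_buffer
   * API.  Because the cursor lives in a local, the compiler does not have
   * to assume that each byte store may modify the buffer struct itself,
   * and gcc/clang can typically merge the four stores into one 32-bit MOV. */
  struct buf { unsigned char *dst; unsigned char *end; };

  static inline void put_le32(struct buf *b, uint32_t v)
  {
      unsigned char *d = b->dst;       /* load the cursor once */
      d[0] = (unsigned char)(v      );
      d[1] = (unsigned char)(v >>  8);
      d[2] = (unsigned char)(v >> 16);
      d[3] = (unsigned char)(v >> 24);
      b->dst = d + 4;                  /* store the cursor back once */
  }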
The branch replaces all uses of dynamic_buffer with the new byte_buffer API. In my opinion, the dynamic_buffer API had slightly confusing function names, which I have tried to improve. The old dynamic_buffer API still exists as wrappers around the new code, so external modules using dynamic_buffer should continue to work. One feature of the dynamic_buffer API was its use of a global buffer object. I have removed all in-tree uses of that global buffer, which means that e.g. describe_svalue is now, in theory, reentrant. There are probably parts of describe_svalue that could be further simplified because of that.
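To illustrate the reentrancy point (hypothetical code, not the actual describe_svalue): once the output buffer is passed in by the caller instead of living in a global, nested or concurrent calls can no longer interleave their output.

  #include <stdio.h>

  /* Hypothetical sketch, not Pike's actual code.  With a single global
   * buffer, a nested call appends into the same storage and the results
   * get mixed; with a caller-provided buffer there is no shared state. */
  struct sbuf { char data[256]; size_t len; };

  static void describe_int(struct sbuf *out, int x)  /* caller owns the buffer */
  {
      out->len += (size_t)snprintf(out->data + out->len,
                                   sizeof(out->data) - out->len, "%d", x);
  }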
The immediate benefits are that encode_value and describe_svalue got some significant speedup. I added some benchmarks to the pike-benchmark repository. On my machine the results look like this:
buffer/describe.pike#encode_array(int)    |  83.7 M  2.8 % | 104.3 M  2.0 % |
buffer/describe.pike#encode_string(16bit) |  56.7 M  1.7 % |  65.3 M  0.1 % |
buffer/describe.pike#encode_string(32bit) |  86.5 M  1.6 % |  98.1 M  0.2 % |
buffer/describe.pike#encode_string(8bit)  |  47.4 M  1.6 % |  51.7 M  0.2 % |
buffer/encode.pike#decode_array(float)    | 136.2 M 15.1 % | 146.9 M  0.5 % |
buffer/encode.pike#decode_array(int)      |  73.0 M  1.7 % |  71.1 M  0.8 % |
buffer/encode.pike#decode_string(16bit)   |   2.4 G  3.2 % |   2.5 G  1.8 % |
buffer/encode.pike#decode_string(32bit)   |   4.1 G  2.5 % |   4.1 G  2.0 % |
buffer/encode.pike#decode_string(8bit)    |  10.0 G  2.1 % |   9.9 G  3.9 % |
buffer/encode.pike#encode_array(float)    |   4.1 M  1.2 % |   4.1 M  0.2 % |
buffer/encode.pike#encode_array(int)      |  54.2 M  2.1 % |  58.8 M  0.6 % |
buffer/encode.pike#encode_string(16bit)   | 288.4 M  9.7 % |   1.8 G  1.9 % |
buffer/encode.pike#encode_string(32bit)   | 581.6 M  1.6 % |   3.4 G  1.3 % |
buffer/encode.pike#encode_string(8bit)    |   6.2 G  4.2 % |   8.9 G  4.1 % |
The first column shows the results for current 8.1, the second the results for the new_buffer branch. Feel free to run the benchmarks on your own hardware.
One change I am unsure about is the simplification of do_read() in Stdio.Fd. The previous version tried to optimize the read buffer size for small reads on sockets. My gut feeling is that we should not try to decide how many bytes to read, but instead let the caller decide. That would allow efficient reads from files (with large buffers) while the callback code reads smaller chunks.
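Roughly what I have in mind (illustrative names only, not the actual Stdio.Fd code): the low-level helper stops guessing and just reads as much as it is asked for, so a file loop can pass a large size and callback-driven socket code a small one.

  #include <errno.h>
  #include <unistd.h>

  /* Illustrative sketch, not the actual do_read().  The size decision is
   * moved to the caller: read at most 'want' bytes, retrying on EINTR. */
  static ssize_t read_chunk(int fd, unsigned char *dst, size_t want)
  {
      ssize_t n;
      do {
          n = read(fd, dst, want);
      } while (n < 0 && errno == EINTR);
      return n;
  }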
Once this branch has been merged I would like to change the machine code generators to use it. However, that will require some API changes to everything in code/*. This was in fact the initial reason why I started looking into this.
Comments welcome.
Arne
Hi Arne.
> I would like to merge the branch arne/new_buffer. It implements and uses the new dynamic buffer implementation I talked about at the conference. The main idea behind this implementation is to help the compiler generate better code in spite of the aliasing rules of C. In particular, it tries to allow compilers to coalesce individual writes to the buffer into larger MOVs. This works reasonably well in gcc and clang.
I've only had a cursory look at the new code, but this sounds good to me.
> The immediate benefits are that encode_value and describe_svalue got some significant speedup. I added some benchmarks to the pike-benchmark repository. On my machine the results look like this:
>
> buffer/encode.pike#decode_array(int)      |  73.0 M  1.7 % |  71.1 M  0.8 % |
> buffer/encode.pike#decode_string(8bit)    |  10.0 G  2.1 % |   9.9 G  3.9 % |
Looks great except for the above two, which I suspect fall within the measurement margin of error.
> Once this branch has been merged I would like to change the machine code generators to use it. However, that will require some API changes to everything in code/*. This was in fact the initial reason why I started looking into this.
Sounds great.
/grubba
On Fri, 30 Dec 2016, Henrik Grubbström (Lysator) @ Pike (-) developers forum wrote:
>> The immediate benefits are that encode_value and describe_svalue got some significant speedup. I added some benchmarks to the pike-benchmark repository. On my machine the results look like this:
>>
>> buffer/encode.pike#decode_array(int)      |  73.0 M  1.7 % |  71.1 M  0.8 % |
>> buffer/encode.pike#decode_string(8bit)    |  10.0 G  2.1 % |   9.9 G  3.9 % |
>
> Looks great except for the above two, which I suspect fall within the measurement margin of error.
Yes, I think those are noise. Only the encode_* tests use the new buffers; the decoding steps do not use them and should not be affected. I will run the benchmarks on a machine with more stable performance (non-mobile) and post new results.
Arne