merging arne/new_buffer

30 Dec 2016


I would like to merge the branch arne/new_buffer. It implements
and uses the new dynamic buffer implementation I talked about at
the conference. The main idea behind this implementation is to help the
compiler generate better code in spite of the aliasing rules of C. In
particular, it tries to allow compilers to coalesce individual writes to
the buffer into larger MOVs. This works reasonably well in gcc and
clang.
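
To illustrate the aliasing problem, here is a minimal sketch. The struct
layout and the function names are hypothetical and are not the actual
byte_buffer API. Stores through a character pointer may alias any object,
including the buffer's own write pointer, so the naive version forces a
reload of that pointer after every byte. Keeping the write position in a
local variable removes that dependency and gives the compiler a chance to
merge the four stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer layout, for illustration only. */
struct sketch_buffer {
    unsigned char *dst;   /* next write position */
    unsigned char *end;   /* one past the end of the allocation */
};

/* Naive version: each store goes through buf->dst. Since unsigned char
 * may alias anything, the compiler has to assume that a store could have
 * changed buf->dst itself and reload it before the next store. */
static void put_u32_naive(struct sketch_buffer *buf, uint32_t v)
{
    assert(buf->end - buf->dst >= 4);
    *buf->dst++ = (unsigned char)(v >> 24);
    *buf->dst++ = (unsigned char)(v >> 16);
    *buf->dst++ = (unsigned char)(v >> 8);
    *buf->dst++ = (unsigned char)v;
}

/* Local-pointer version: p is a local whose address is never taken, so
 * the byte stores cannot modify it. The four adjacent stores may then be
 * merged into one wider store, depending on compiler and flags. */
static void put_u32_local(struct sketch_buffer *buf, uint32_t v)
{
    assert(buf->end - buf->dst >= 4);
    unsigned char *p = buf->dst;
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    buf->dst = p + 4;     /* write the position back exactly once */
}
```

Both functions produce the same bytes; only the generated code differs.
Comparing the assembly output of the two at -O2 is the easiest way to see
whether a given compiler merges the stores.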

The branch replaces all uses of dynamic_buffer by the new
byte_buffer API. In my opinion, the dynamic_buffer API had slightly confusing
function names, which I attempted to improve.

The old API from dynamic_buffer still exists as wrappers around the new
code, which means that external modules using dynamic_buffer should
continue to work. One feature of the dynamic_buffer API was to use a
global buffer object. I have removed all in-tree uses of that global
buffer, which means that e.g. describe_svalue is now in theory
reentrant. There are probably some parts of describe_svalue which could be
simplified further as a result.
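
As a rough sketch of why dropping the global buffer matters: every name
below (sketch_buf, old_init_buf, describe_pair and so on) is made up for
illustration and does not match the real dynamic_buffer or byte_buffer
functions. The old-style calls become thin wrappers around one global
instance, while code that owns its buffer can call itself recursively
without clobbering a half-built result.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical explicit buffer object; the caller owns it. */
struct sketch_buf {
    char *data;
    size_t len, cap;
};

static void sketch_init(struct sketch_buf *b)
{
    b->data = NULL;
    b->len = b->cap = 0;
}

/* Append a C string, always keeping one spare byte for the terminator. */
static void sketch_add(struct sketch_buf *b, const char *s)
{
    size_t n = strlen(s);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < b->len + n + 1) cap *= 2;
        char *p = realloc(b->data, cap);
        assert(p != NULL);
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
}

/* Terminate the contents and hand the allocation to the caller,
 * who is responsible for calling free() on it. */
static char *sketch_finish(struct sketch_buf *b)
{
    sketch_add(b, "");            /* guarantees data != NULL */
    b->data[b->len] = '\0';
    char *out = b->data;
    sketch_init(b);
    return out;
}

/* Old-style interface kept as wrappers around one global buffer. */
static struct sketch_buf global_buf;
static void old_init_buf(void)     { sketch_init(&global_buf); }
static void old_add(const char *s) { sketch_add(&global_buf, s); }
static char *old_finish(void)      { return sketch_finish(&global_buf); }

/* With a local buffer, nested calls do not interfere with each other. */
static char *describe_int(int v)
{
    struct sketch_buf b;
    char tmp[32];
    sketch_init(&b);
    snprintf(tmp, sizeof tmp, "%d", v);
    sketch_add(&b, tmp);
    return sketch_finish(&b);
}

static char *describe_pair(int x, int y)
{
    struct sketch_buf b;
    sketch_init(&b);
    sketch_add(&b, "({ ");
    char *s = describe_int(x);    /* nested call while b is half-built */
    sketch_add(&b, s);
    free(s);
    sketch_add(&b, ", ");
    s = describe_int(y);
    sketch_add(&b, s);
    free(s);
    sketch_add(&b, " })");
    return sketch_finish(&b);
}
```

Had describe_int() used the global wrappers, the nested call would have
reset the buffer that describe_pair() was still filling.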

The immediate benefit is that encode_value and describe_svalue are
significantly faster. I added some benchmarks to the pike-benchmark
repository. On my machine the results look like this:

Benchmark                                       | 8.1                | arne/new_buffer    |
------------------------------------------------|--------------------|--------------------|
buffer/describe.pike#encode_array(int)          |   83.7  M  2.8 %   |  104.3  M  2.0 %   |
buffer/describe.pike#encode_string(16bit)       |   56.7  M  1.7 %   |   65.3  M  0.1 %   |
buffer/describe.pike#encode_string(32bit)       |   86.5  M  1.6 %   |   98.1  M  0.2 %   |
buffer/describe.pike#encode_string(8bit)        |   47.4  M  1.6 %   |   51.7  M  0.2 %   |
buffer/encode.pike#decode_array(float)          |  136.2  M  15.1 %  |  146.9  M  0.5 %   |
buffer/encode.pike#decode_array(int)            |   73.0  M  1.7 %   |   71.1  M  0.8 %   |
buffer/encode.pike#decode_string(16bit)         |    2.4  G  3.2 %   |    2.5  G  1.8 %   |
buffer/encode.pike#decode_string(32bit)         |    4.1  G  2.5 %   |    4.1  G  2.0 %   |
buffer/encode.pike#decode_string(8bit)          |   10.0  G  2.1 %   |    9.9  G  3.9 %   |
buffer/encode.pike#encode_array(float)          |    4.1  M  1.2 %   |    4.1  M  0.2 %   |
buffer/encode.pike#encode_array(int)            |   54.2  M  2.1 %   |   58.8  M  0.6 %   |
buffer/encode.pike#encode_string(16bit)         |  288.4  M  9.7 %   |    1.8  G  1.9 %   |
buffer/encode.pike#encode_string(32bit)         |  581.6  M  1.6 %   |    3.4  G  1.3 %   |
buffer/encode.pike#encode_string(8bit)          |    6.2  G  4.2 %   |    8.9  G  4.1 %   |

The first column shows the results for current 8.1, the second the results
for the new_buffer branch. Feel free to run those benchmarks on your
hardware.

One change I am unsure about is the simplification of do_read() in
Stdio.Fd. The previous version tried to optimize read buffer sizes for
small reads on sockets. My gut feeling is that we should not try to decide
what number of bytes to read, but instead have the caller decide. This would
allow doing efficient reads from files (with large buffers) and have the
callback code read smaller chunks.
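
To make the "caller decides" idea concrete, here is a small sketch in
plain C on top of POSIX read(). It is not the Stdio.Fd code, and
read_chunk is a hypothetical helper name. The function does no size
heuristics of its own: it asks for exactly as many bytes as the caller
wants and returns whatever the descriptor delivers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `want` bytes from fd into dst. The request size is entirely
 * the caller's decision. Returns the number of bytes read, 0 on end of
 * file, or -1 on error (with errno set by read()). */
static ssize_t read_chunk(int fd, void *dst, size_t want)
{
    assert(dst != NULL && want > 0);
    for (;;) {
        ssize_t n = read(fd, dst, want);
        /* Retry only when the call was interrupted by a signal. */
        if (n >= 0 || errno != EINTR)
            return n;
    }
}
```

A file reader could pass a large buffer here, while callback-driven socket
code could ask for a small chunk, without the read path guessing which of
the two it is serving.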

Once this branch has been merged I would like to change the machine code
generators to use it. However, that will require some API changes to
everything in code/*. This was in fact the initial reason why I started
looking into this.

Comments welcome.

Arne
