Arne was in Linköping yesterday, so we had an impromptu development discussion about whatever was on people's minds. Here are my notes for further discussion:
* discussion
** remove constant evaluation timelimit for recursive compilation
** timezone compilation is slow
** Calendars documentation is incomprehensible
** Release 8.0
*** Not stable right now
*** Does Bill want to be the release master?
*** Backend fails on MacOS. Misses events.
** Don't support HEARTBEAT in SSL on the server
** remove class equals (too many equals)
** lvalues are the bane of performance
** Testcases and a benchmark for SSL are needed
** No backups
** Move _stuff to Debug module?
** Disable portable bytecode storage per default
   It takes up a lot of the allocated storage at start.
** Use Unicode versions of file operations on Windows
** Writing to the registry on Windows would be nice
** Figure out what to do about char normalization on Unix
*** UTF-8 by default everywhere
** remove hardcoded char-encoding in ASN-1
** Fix sprintf table modes to... add fewer spaces?
** Prevent gcc unrolling of syscall checks
** Make pike_interpreter thread local
** Pike tracing support for line coverage
** move frame allocation into interpreter
** Put random debugging stuff into a common cmod
** Reduce the number of efuns
** Also more efuns! Things that are called from c code and are fast.
** A way to get a human readable function name everywhere would be nice
   But perhaps not possible
** Remove old compat stuff. Begin with stuff that is more than 10 years old.
** Fix computed goto? *shake* Things points to maybe on ARM
** Is arrays incorrectly addressed by pointer into moved array
** extend string buffer to be able to read from it
** Kill ADT.struct?
** callback id is 0, could it be something better?
** Crashes with AIDO and encode_value on large data sets with several threads
** Add new git committer
I’m happy to continue doing releases, assuming my performance has been acceptable and no one else is dying to take over.
I’ve noticed a (the?) problem with the backend not working properly on MacOS; it seems to be a behavior of the latest version of Darwin, and I think it may have to do with the kernel coalescing events. Similar behavior has been noted in other languages (I think I saw something related to Python). The tests that fail out of the box do pass if the test timeout is extended from 20 seconds.
Bill
On Aug 9, 2014, at 5:54 AM, Peter Bortas bortas@gmail.com wrote:
Arne was in Linköping yesterday, so we had an impromptu development discussion about whatever was on people's minds. Here are my notes for further discussion: … -- Peter Bortas
It would also be nice if the exception handling in Pike could be more like Java with try-catch-finally.
On Sun, Aug 10, 2014 at 8:39 AM, H. William Welliver III bill@welliver.org wrote:
On Sun, 10 Aug 2014, H. William Welliver III wrote:
I’m happy to continue doing releases, assuming my performance has been acceptable and no one else is dying to take over.
Just to clarify this: no one said anything about your performance; we were just unsure whether you also feel like doing releases for Pike 8. If you are happy to do it, I guess everyone else is, too.
arne
Hi Arne,
I didn’t actually mean to suggest I felt there was a problem; I just wanted to give everyone the opportunity to pipe up. I guess no one took the bait :)
Bill
On Aug 12, 2014, at 6:31 AM, Arne Goedeke el@laramies.com wrote:
Support for Pike 0.5 and 0.6 compat removed. It should now be possible to remove the string* syntax and clean up language.yacc.
Partially done now.
Still left to fix are the very complex rules (define variable, global, function, local function). The issue is that a lot of things are renumbered when the optional_stars token is removed, and since optional_stars is now so optional that it is never set, I have not bothered fixing them yet.
I have cleaned up the type rules now, however.
** remove class equals (too many equals)
This somewhat obscure point probably refers to the existence of a whole bunch of x [op]= op-codes (+=, -=, &=, %= etc).
If so I have now removed them.
This turned out to be sort of interesting, actually, since +=, -= and friends did not previously type-check the assignment, only the call to the actual operator.
That is, a += 10 would check that (a+10) did not generate an error, but not that it could be assigned to a. Especially with strict types enabled this is a rather large change.
Take, as an example, Gmp.mpz objects (and all other objects overloading operators): (object(*) + integer) type-checks to 'mixed'. The same is true for the other operators.
I "solved" this for now by adding automatic softcasts to the type to be assigned, which means that it again only checks that the operation can be done validly, not that the assignment will always work.
So, given "Gmp.mpz a":
| a += 10;
is now equivalent to
| a = [object(Gmp.mpz)](a+10);
On a somewhat related note, x[*] = x[*] op <something> was sort of broken as well: it only worked for the short-form assignments (x[*] += 10 etc).
I have partially fixed this now.
Previously it always converted the code to
| x = x[*] op <something>
This works, in so far as x gets the new value. But it gives incorrect (semantically, at least) results for this case:
| array x = ..;
| array y = x;
| x[*] += 10;
| /* now x would not be equal to y */
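The difference between the two expansions can be sketched in C as an analogy. This is not the Pike implementation; the function names are hypothetical, and plain int arrays stand in for Pike arrays. Rebinding builds a new array, so another holder of the old one keeps seeing the old values, while the in-place form updates what both holders share.

```c
#include <assert.h>
#include <stdlib.h>

/* Analogue of the old expansion "x = x[*] + k": allocate a new array
 * and let the caller rebind its variable to it. The caller owns the
 * result and must free() it. */
int *map_add_rebind(const int *x, size_t n, int k)
{
    assert(x != NULL);
    int *out = malloc(n * sizeof *out);
    if (!out)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] + k;
    return out;
}

/* Analogue of an in-place automap assignment: every holder of the
 * array observes the update. */
void map_add_in_place(int *x, size_t n, int k)
{
    assert(x != NULL);
    for (size_t i = 0; i < n; i++)
        x[i] += k;
}
```

With two pointers to the same array, the first function leaves the second pointer looking at unchanged data, which mirrors the "x would not be equal to y" case above.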
I have fixed this, but auto-map as lvalue (on the left side of assignments) still has at least two issues:
| x[*] = RHS (not an automap)
This will currently just break at run time with 'Source is not an array' (unless RHS happens to be an array with the same size as the left array, in which case you will get unexpected results). It should simply set every element of the array to RHS.
The other broken thing is multi-level automap assignments.
As an example:
| a[*][*] += 10;
This will currently work, but the expression is basically converted to
| a[*] = (a[*][*]+10)[*];
Which might be closer to what you wanted, but not quite perfect. :)
It looks like the module dumping broke. make sure complains a lot. The cause is either this change or grubba's new layer of indirection for Crypto.Sign.
There was another topic that was mentioned briefly. It was a proposal by Per from one year earlier about string allocation.
Currently, strings (headers + content) are allocated in one chunk. For short strings (up to length of the header) they are allocated by the block allocator, otherwise using malloc. This has performance advantages when handling short strings. On the other hand, keeping the header (which is modified frequently) close to the content (which is constant) can have serious disadvantages for performance.
The proposal was to split up header and string content into separate allocations. I had a branch lying around which does this split. I took the time to rebase it onto current 8.0. You can find it under arne/string_alloc.
The current version allocates short strings using the same block allocator used for the headers, which usually results in the string content being located directly after the headers. All other strings have their content allocated using malloc.
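A minimal C sketch of such a split, under loud assumptions: apart from STRING_IS_SHORT, every name here (str_header, SHORT_MAX, short_block_alloc, make_string, free_string) is hypothetical, the 32-byte threshold is made up, and malloc stands in for the block allocator. It is not the layout used in arne/string_alloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STRING_IS_SHORT 1u  /* content came from the short-string allocator */
#define SHORT_MAX 32        /* assumed size of one short-content block */

/* Hypothetical header: the frequently modified fields live here,
 * the constant content is a separate allocation. */
struct str_header {
    int refs;
    unsigned flags;   /* records how the content was allocated */
    size_t len;
    char *data;
};

/* Stand-in for a block allocator handing out fixed-size chunks. */
void *short_block_alloc(void)
{
    return malloc(SHORT_MAX);
}

struct str_header *make_string(const char *src, size_t len)
{
    assert(src != NULL);
    struct str_header *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->refs = 1;
    s->len = len;
    if (len < SHORT_MAX) {          /* content plus NUL fits in one block */
        s->flags = STRING_IS_SHORT;
        s->data = short_block_alloc();
    } else {
        s->flags = 0;
        s->data = malloc(len + 1);
    }
    if (!s->data) {
        free(s);
        return NULL;
    }
    memcpy(s->data, src, len);
    s->data[len] = '\0';
    return s;
}

void free_string(struct str_header *s)
{
    /* A real allocator would dispatch on s->flags here; in this
     * sketch both kinds of content were obtained from malloc. */
    free(s->data);
    free(s);
}
```

Because the flags field records where the content came from, further allocation kinds could be added as extra flag values without touching the header layout.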
Benchmarks comparing this new branch to 8.0 have mixed results, but that is to be expected. See below for the results; positive changes correspond to the new branch being faster.
On top of this branch it would be possible to add more flags than STRING_IS_SHORT to allow handling more types of allocation, like mmap and constant strings. Think Stdio.read_file without memcpy.
Feedback welcome.
arne
-----------------------------------------------------------------
Test Result Change
-----------------------------------------------------------------
Ackermann . . . . . . . . . . . . . . . . . . . . . 33M/s -0.9%
Adding element to array (global) . . . . . . . . 5250k/s -0.6%
Adding element to array (local) . . . . . . . . . 5741k/s -12.2%
Adding element to array (private global) . . . . . 5201k/s -1.3%
Append array . . . . . . . . . . . . . . . . . . . 27M/s -0.7%
Append mapping (+) . . . . . . . . . . . . . . . . 49k/s -0.1%
Append mapping (|) . . . . . . . . . . . . . . . . 53k/s -12.8%
Append multiset . . . . . . . . . . . . . . . . . . 114k/s 5.4%
Array & String Juggling . . . . . . . . . . . . . . 80k/s 0.6%
Array Copy . . . . . . . . . . . . . . . . . . . . 41M/s 11.0%
Array Zero . . . . . . . . . . . . . . . . . . . . 258k/s 0.8%
Binary Trees . . . . . . . . . . . . . . . . . . . 933k/s -1.1%
Clone null-object . . . . . . . . . . . . . . . . . 10M/s -1.5%
Clone object . . . . . . . . . . . . . . . . . . 5900k/s -0.5%
Compile . . . . . . . . . . . . . . . . . . . 97k lines/s 3.7%
Compile & Exec . . . . . . . . . . . . . . . 95k lines/s 1.5%
Foreach (arr,global) . . . . . . . . . . . . . . . 66M/s -4.1%
Foreach (arr,local) . . . . . . . . . . . . . . . . 178M/s -0.9%
Foreach (arr;local;global) . . . . . . . . . . . . 40M/s -2.0%
Foreach (arr;local;local) . . . . . . . . . . . . . 59M/s -0.5%
GC . . . . . . . . . . . . . . . . . . . . . . . . 1496/s 1.8%
Insert in array . . . . . . . . . . . . . . . . . . 51M/s 1.2%
Insert in mapping . . . . . . . . . . . . . . . . 8913k/s 0.7%
Insert in multiset . . . . . . . . . . . . . . . 3461k/s 4.0%
Loops Nested (global) . . . . . . . . . . . . . . . 32M/s 0.3%
Loops Nested (local) . . . . . . . . . . . . . . . 37M/s 0.6%
Loops Nested (local,var) . . . . . . . . . . . . . 37M/s -0.0%
Loops Recursed . . . . . . . . . . . . . . . . . . 17M/s 1.4%
Matrix multiplication (100x100) . . . . . . . . 2.19 GF/s 0.6%
Read binary INT128 . . . . . . . . . . . . . . . . 172k/s 25.7%
Read binary INT16 . . . . . . . . . . . . . . . . . 15M/s 17.1%
Read binary INT32 . . . . . . . . . . . . . . . . . 11M/s -1.6%
Replace (parallel) . . . . . . . . . . . . . . . . 10k/s 0.2%
Replace (serial) . . . . . . . . . . . . . . . . . 16k/s -0.1%
Simple arithmentics (globals) . . . . . . . . . . . 94M/s 1.3%
Simple arithmentics (private global) . . . . . . . 117M/s 1.2%
Simple arithmetics (locals) . . . . . . . . . . . . 147M/s 1.0%
Sort equal integers . . . . . . . . . . . . . . . . 71M/s 0.2%
Sort ordered integers . . . . . . . . . . . . . . . 87M/s 0.9%
Sort unordered integers . . . . . . . . . . . . . . 14M/s 1.7%
Sort unordered objects . . . . . . . . . . . . . . 565k/s 8.7%
String Creation . . . . . . . . . . . . . . . . . 2585k/s 1.6%
String Creation (existing) . . . . . . . . . . . 6327k/s 2.2%
String Creation (wide) . . . . . . . . . . . . . . 583k/s -2.9%
Tag removal u. Parser.HTML . . . . . . . . . . . 4232k/s -10.0%
Tag removal u. Regexp.PCRE . . . . . . . . . . . . 442k/s 1.7%
Tag removal u. array_sscanf . . . . . . . . . . . 6110k/s 4.3%
Tag removal u. division . . . . . . . . . . . . . . 817k/s -2.4%
Tag removal u. search . . . . . . . . . . . . . . . 985k/s 4.0%
Tag removal using a loop . . . . . . . . . . . . . 181k/s 2.4%
Tag removal using sscanf . . . . . . . . . . . . . 431k/s -1.2%
Upper/lower case shift 0 . . . . . . . . . . . . . 123M/s 0.9%
Upper/lower case shift 1 . . . . . . . . . . . . . 60M/s 0.7%
call_out handling . . . . . . . . . . . . . . . . . 185k/s 1.2%
call_out handling (with id) . . . . . . . . . . . 3324k/s -6.7%
-----------------------------------------------------------------
                                                           0.8%
-----------------------------------------------------------------
It might be that he is using a laptop, or just has power saving enabled. I have noticed differences on the order of 70% on my MacBook with _no_ code changes.
The adaptive frequencies are generating issues, even with the power supply connected. In general it is depressingly hard to benchmark things these days; any difference that is less than 2x faster/slower disappears in the noise.
I think the reason is that append array appends random_string(10) repeatedly, so in some way it's also measuring short string handling for a very specific length. It might be that the extra pointer deref is significant there. I ran these benchmarks several times, and at least the results above 10% were stable. I generally at least disable the gc when benchmarking; otherwise there will be random spikes, which can be quite confusing at times.
I had power saving disabled but it might still be the case that automatic throttling kicks in due to thermal events. If anyone feels like running a benchmark on their machines, that would be great ,)
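One way to judge whether a difference stands out from such noise is to repeat each measurement and compare the median against the run-to-run spread. The following C sketch shows the idea only; it is not how the Pike benchmark suite works, and the helper names (time_once, median, rel_spread) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* qsort comparator for doubles, ascending. */
int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* CPU time in seconds consumed by one call of fn, as reported by clock(). */
double time_once(void (*fn)(void))
{
    clock_t start = clock();
    fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Median of n samples. Sorts the samples in place. */
double median(double *v, size_t n)
{
    assert(v != NULL && n > 0);
    qsort(v, n, sizeof *v, cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* (max - min) / median: a crude measure of run-to-run noise.
 * Assumes a non-zero median. Sorts the samples in place. */
double rel_spread(double *v, size_t n)
{
    double m = median(v, n);
    return (v[n - 1] - v[0]) / m;
}
```

If the relative spread of repeated runs on an unchanged build is already larger than the change being measured, the result says little about the code change.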
arne
On Mon, 18 Aug 2014, Per Hedbor () @ Pike (-) developers forum wrote:
I updated the string_alloc branch with support for static strings (the string data itself is inside the data section). In that branch it's used for program constants, functions, efun names, etc. It saves quite a bit of heap memory. The statistics look like this:
array_bytes                  64 bytes     20.1 kB
free_block_bytes              16.9 kB    800.1 kB
malloc_block_bytes             -52 kB      2.5 MB
malloc_bytes                 -68.8 kB      1.2 MB
marker_bytes                 -30.0 kB   216 bytes
num_arrays                          1         254
num_malloc                       -119         116
num_short_pike_strings           -815        1347
num_static_pike_strings          1067        1067
num_strings                         3        3137
short_pike_string_bytes     -240.3 kB     0 bytes
string_bytes                   6.2 kB    522.3 kB
This table displays the difference in the output of Debug.memory_usage: the third column is the output for the string_alloc branch, and the second is the difference between Pike 8.0 and string_alloc. The difference should be even bigger with more modules loaded; this is the output of
pike -e 'write("%O\n", Debug.memory_usage());'
Not sure where to go with this. The benchmark results are still pretty mixed, so this is mainly saving memory right now.
arne
On Mon, 18 Aug 2014, Arne Goedeke wrote:
pike-devel@lists.lysator.liu.se