This Saturday we had a short (6-hour) Pike meeting in Linköping where we discussed what we would like to do with the language in the near future. Besides trading ideas and general discussion, there were basically two topics: what we need to do to get the next release out in the shortest amount of time, and what we would like to do for the release immediately after it.
The consensus was to attempt to push for an 8.0 release, given the large amount of changes since 7.0, and to make that release with the current feature set. At a high level, the task list for that is:
- Fork an 8.0 and a next branch (9.0?) soonish (within a week?)
- Fix compilation issues. This basically boils down to requiring version 4 of GCC and removing redundant configure tests that today prevent correct behavior.
- Fix the testsuite. There are not very many issues that need to be fixed, but they have been around for a while, suggesting that they may be tricky to fix, or at least require a time investment from someone knowledgeable. We have a few type system issues, a scope issue with inherit, a test case that breaks the pike stack, and a broken dumped file system.
- Valgrind currently takes issue with a few of the tests. Check if these are real and fix them.
- We have a few documentation issues reported by the autodoc extractor. These should be fixed.
- Test that exported source and exported builds compile/install/work.
- Set up Xenofarm to build Pike 8.
- Check if there are obvious performance regressions from 7.8, and address them if that appears to be easy.
- And everyone's favorite: write a changelog.
For the next version of Pike we want to be more explicit about supported operating systems: Linux, Solaris, MacOS, Windows and, if Marcus supports it, AmigaOS.
- Remove configure and transition to a profile based system where a pike script generates header files.
- Remove the compat system, as it creates a lot of maintenance overhead, ugly code and is of dubious value in practice.
- Consider removing the less used C modules to cut down on dependencies and compilation time. DVB, Mird, spider and PDF were mentioned as examples.
- Remove the Pike security system. This is not maintained and would simplify the code.
- Remove the bundle system. All the bundle dependencies are now available as packages and most are already installed by default.
- Make support for bignums mandatory, to remove all ifdef bignum code.
- Remove support for old systems (e.g. outdated code in port.c)
- Remove short svalue support.
- Remove Fd_ref from Stdio.File, currently implemented under STDIO_DIRECT_FD define.
- Implement typed constants.
- Implement "operator" as an alternative syntax for "`".
- Replace block alloc with a single (and better) implementation.
- Remove local calendar locale configuration support.
- Split mega apply into a pike and a C version to speed up calls between pike functions.
- Remove the mark stack.
- Renumber the types so that PIKE_T_INT becomes 0.
Any item that makes the release of 8.0 easier and faster should be done there rather than later.
Looking back at older notes, we decided in 2007 (Pike conference?) to rename "files" to something else to not have this common variable name as a module name. I would suggest _Stdio as a new name.
Well, we might generate the cmakefiles with a pike script to cut down on the amount of information duplication in git. :)
I pushed the first half of the block_alloc.h replacement. The new block allocator improves the worst case search time when freeing blocks. The old one kept pages in a linked list and had to search linearly to find the right page on free.
Also, unlike the old one, it does not use macros to inline the block size, which decreases code size considerably.
The old block allocator behaved very badly when freeing blocks in a somewhat random fashion. Consider this example:
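class Node { object left, right; }

// Builds a complete binary tree below node, down to depth 20.
void rec1(object node, int depth)
{
  if (depth < 20) {
    rec1(node->left = Node(), depth + 1);
    rec1(node->right = Node(), depth + 1);
  }
}

int main()
{
  object n = Node();
  rec1(n, 0);
  n = 0;  // Drops the last reference, so the whole tree is freed here.
  return 0;
}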
Bad things happen when freeing the tree. This particular example is about seven times faster using the new allocator.
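The cost described above can be sketched in C. This is only an illustration of the old scheme as described in this message (pages in a singly linked list, linear search on free); the names `struct page`, `make_pages` and `find_page_steps`, as well as the page and block sizes, are hypothetical and not taken from block_alloc.h. It says nothing about how the new allocator locates pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BLOCKS 64   /* hypothetical number of blocks per page */
#define BLOCK_SIZE  32   /* hypothetical block size in bytes */

/* One allocator page; pages are chained in a singly linked list. */
struct page {
    struct page *next;
    unsigned char data[PAGE_BLOCKS * BLOCK_SIZE];
};

/* Allocate n pages, prepending each one to the list; returns the head. */
static struct page *make_pages(size_t n)
{
    struct page *head = NULL;
    for (size_t i = 0; i < n; i++) {
        struct page *pg = malloc(sizeof *pg);
        if (!pg) abort();
        pg->next = head;
        head = pg;
    }
    return head;
}

/* Walk the list until the page containing ptr is found.
 * Returns the number of pages visited, or 0 if no page contains ptr.
 * This is the work a free() has to do before it can release a block. */
static size_t find_page_steps(const struct page *head, const void *ptr)
{
    assert(ptr != NULL);
    uintptr_t p = (uintptr_t)ptr;
    size_t steps = 0;
    for (const struct page *pg = head; pg; pg = pg->next) {
        uintptr_t start = (uintptr_t)pg->data;
        steps++;
        if (p >= start && p < start + sizeof pg->data)
            return steps;
    }
    return 0;
}
```

With N pages, freeing a block that lives in the last page of the list visits all N pages, so freeing many blocks in a scattered order costs on the order of N steps per free. That is consistent with the slowdown seen when the tree in the example is torn down.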
There are some places where the old one is still in use. Furthermore, since it clashes with cc0851de980d46e85e514657d8d9a484fe8be2eb, it's in a separate branch right now (arne/new_block_alloc). Using malloc/free as in cc0851de980d46e85e514657d8d9a484fe8be2eb is certainly much faster than the old worst case behavior. I would guess the new block allocator is faster than malloc/free, so I propose to overwrite cc0851de980d46e85e514657d8d9a484fe8be2eb with this. Any objections?
arne
I am not entirely certain it is actually faster than the new pike_frame code.
That one actually never calls free, and only calls malloc every 8Kb of pikeframes. It is almost impossible to write something that is faster; the only way is to remove the accounting (num_*++ in the allocator and the deallocator), which about doubles the speed.
There are of course tradeoffs. As mentioned, it never returns memory to the system.
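A minimal C sketch of an allocator with the properties described here: it calls malloc once per 8 KB chunk, recycles freed frames through a free list, and never calls free. It is not the actual pike_frame code; `struct frame`, `frame_alloc`, `frame_free` and the counters are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_BYTES 8192          /* malloc is called once per 8 KB of frames */

/* Hypothetical stand-in for struct pike_frame. */
struct frame {
    struct frame *next_free;      /* free-list link, only valid while unused */
    void *payload[7];
};

static struct frame *free_list;   /* frames ready for reuse */
static size_t num_chunks;         /* accounting: number of malloc calls */
static size_t num_live;           /* accounting: frames currently handed out */

static struct frame *frame_alloc(void)
{
    if (!free_list) {
        /* Free list is empty: grab one chunk and thread it onto the list. */
        size_t n = CHUNK_BYTES / sizeof(struct frame);
        struct frame *chunk = malloc(n * sizeof *chunk);
        if (!chunk) abort();
        num_chunks++;
        for (size_t i = 0; i < n; i++) {
            chunk[i].next_free = free_list;
            free_list = &chunk[i];
        }
    }
    struct frame *f = free_list;
    free_list = f->next_free;
    num_live++;
    return f;
}

/* Returns the frame to the free list; memory never goes back to the system. */
static void frame_free(struct frame *f)
{
    assert(f != NULL);
    f->next_free = free_list;
    free_list = f;
    num_live--;
}
```

Both the fast path of `frame_alloc` and all of `frame_free` are a couple of pointer moves plus the counter update, which is why dropping the accounting is about the only thing left to remove. The tradeoff is visible too: chunks are never released, so the peak number of live frames determines the memory held for the rest of the process lifetime.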
A better solution for the future would probably be to keep the frames on the stack and duplicate them when needed for the scope code (and by necessity change the calling convention for pike functions from addr=mega_apply(...);*addr() to mega_apply(), which is already how C functions and efuns work).
To clarify somewhat: No, I do not really object. It's just that I do not entirely see the need for returning the small amount of memory represented by pike-frames to the system, and allocating new ones while running. :)
Anyway, it's not entirely accidental that there is now a Binary Tree testcase in the performance testsuite, although that one also tests the fact that it is rather slow to create a clone of a class from itself.
That is:
class Tree { Tree left, right; create() { left = Tree(); right = Tree(); } }
is slower than
class Tree { Tree left, right; create() { left = this_program(); right = this_program(); } }
which is significantly slower than
class Tree { Tree left, right; }
void create_tree() { Tree x = res; x->left = Tree(); x->right = Tree(); }
Yes, I see. I will merge the new block alloc and keep the new pike_frame code. I have probably not read it properly, then. I tried to benchmark, but current 7.9 doesn't work for me right now.
arne
On Wed, 12 Jun 2013, Per Hedbor () @ Pike (-) developers forum wrote:
-- Per Hedbor
Yes, I see. I will merge the new block alloc and keep the new pike_frame code. I have probably not read it properly, then.
... current 7.9 doesn't work for me, right now.
It does not work with DEBUG enabled currently; I will commit a fix for that shortly.
If you have some other issue (say, related to files->_Stdio), do:
git clean -f   # NOTE: will delete uncommitted files
rm -fr src/modules/files
rm -fr build
(cd src; ./run_autoconfig)
make MAKE_PARALLEL='-j5'
I tried to benchmark
Just some numbers from my runs:
Before any of the recent fixes:
7.8:       Binary Trees............... 320211/s (160Mb)
7.9.5:     Binary Trees............... 409130/s (134Mb)
yesterday: Binary Trees............... 508800/s (132Mb)
The difference was much bigger when the Binary Trees test was run with shallower but more numerous trees; this time it is dominated by really_free_object and gc.
.. which brings us to ..
today: Binary Trees............... 936518/s (43Mb) (3x faster than 7.8!)
Not bad. :)
Any idea why it's only using about 30% of the RAM? :)
For reference, simply replacing the blockalloc with malloc/free (something I did to try it out yesterday) gives this:
Binary Trees............... 818800/s (122Mb)
So the new block-alloc is at least faster than (and uses significantly less memory than) malloc. :)
Very nice!
I did compile a fresh 7.9 and then copied all the bench files back to an old 7.8 and ran both to compare. Although most things were noticeably improved there are a handful of significant regressions:
7.8:
call_out handling.......... 0.097s 0.005s (25) (200123/s)
call_out handling (with id) 0.094s 0.001s (25) (693078/s)
Compile.................... 0.457s 0.365s (11) (66136 lines/s)
Compile & Exec............. 0.309s 0.216s (17) (2788662 lines/s)
Tag removal using a loop... 1.003s 0.911s (5) (307474 tags/s)
Tag removal u. Parser.HTML. 0.722s 0.629s (7) (2668838 tags/s)
Tag removal using sscanf... 0.364s 0.272s (14) (411657 tags/s)
7.9:
call_out handling.......... 0.006s 0.005s (10) (1986/s)
call_out handling (with id) 0.002s 0.001s (12) (2230/s)
Compile.................... 0.458s 0.429s (4) (56321 lines/s)
Compile & Exec............. 0.249s 0.231s (5) (2607873 lines/s)
Tag removal using a loop... 1.997s 1.924s (2) (145514 tags/s)
Tag removal u. Parser.HTML. 0.720s 0.673s (3) (2496792 tags/s)
Tag removal using sscanf... 0.394s 0.364s (4) (307338 tags/s)
Does anyone else see the same problem areas?
Nevertheless, some real-world benchmarks (using three different XSLT tests) show really impressive gains!
          7.8       7.9
        --------  --------
test1:   196 ms    163 ms
test2:   164 ms    136 ms
test3:   941 ms    776 ms
I have not investigated call_out or compile (I will tomorrow), but as for the tag removal tests:
The reason was the opcode LOCAL + LOCAL. It did not do the trick where it first frees the old value before calculating the new one, so the various string/array += optimizations were not triggered.
The tag removal tests build the string (basically the input but with all tags removed) using +=, so they are, in a way, as much a string addition test as a tag removal test.
A fix has been committed. (I also did the same fix for ++ and --, which might be considered overkill since they seldom operate on non-numbers, but...)
I also made string allocation significantly faster, which further speeds up the test.
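Why freeing the old value first matters can be sketched with a small refcounted string in C. This is an assumption-laden illustration, not Pike's interpreter code: `struct rstr`, `rstr_add`, `append_keep_old` and `append_free_first` are hypothetical names, and the sketch assumes the += optimization is an in-place append that is only legal when the left operand has exactly one reference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted string. */
struct rstr { int refs; size_t len; char *data; };

static size_t num_copies;   /* how often a full copy had to be made */

static struct rstr *rstr_new(const char *s)
{
    struct rstr *r = malloc(sizeof *r);
    if (!r) abort();
    r->refs = 1;
    r->len = strlen(s);
    r->data = malloc(r->len + 1);
    if (!r->data) abort();
    memcpy(r->data, s, r->len + 1);
    return r;
}

static void rstr_unref(struct rstr *r)
{
    if (--r->refs == 0) { free(r->data); free(r); }
}

/* Consumes one reference to a and returns a string holding a + b. */
static struct rstr *rstr_add(struct rstr *a, const char *b)
{
    assert(a->refs >= 1);
    size_t blen = strlen(b);
    if (a->refs == 1) {
        /* Sole owner: extend the existing buffer in place. */
        char *d = realloc(a->data, a->len + blen + 1);
        if (!d) abort();
        memcpy(d + a->len, b, blen + 1);
        a->data = d;
        a->len += blen;
        return a;
    }
    /* Shared: someone else still sees the old value, so copy it. */
    num_copies++;
    struct rstr *r = malloc(sizeof *r);
    if (!r) abort();
    r->refs = 1;
    r->len = a->len + blen;
    r->data = malloc(r->len + 1);
    if (!r->data) abort();
    memcpy(r->data, a->data, a->len);
    memcpy(r->data + a->len, b, blen + 1);
    rstr_unref(a);
    return r;
}

/* local += b while the local still holds the old value during the add. */
static void append_keep_old(struct rstr **local, const char *b)
{
    struct rstr *old = *local;
    old->refs++;                          /* operand reference on the stack */
    struct rstr *res = rstr_add(old, b);  /* refs == 2, so a copy is made */
    rstr_unref(*local);                   /* old value released only now */
    *local = res;
}

/* local += b where the local gives up the old value before the add. */
static void append_free_first(struct rstr **local, const char *b)
{
    struct rstr *old = *local;
    *local = NULL;                        /* the reference moves to the operand */
    *local = rstr_add(old, b);            /* refs == 1, appended in place */
}
```

In a loop that builds a long string with +=, the first variant copies the whole string on every iteration, while the second only extends the buffer. That difference would explain a slowdown of the size seen in the loop-based tag removal test.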
Cool, did indeed fix those benchmarks. On the other hand two of my three real-world XSLT tests dropped 2% (and one gained as much), but perhaps that's just differences in CPU cache use or alignment or similar.
If the tests use wide strings, it might very well be because the longest wide short string is now half as long as it used to be (counted in characters), and thus more strings are allocated using malloc/free instead of the short string allocator.
Good point. I finally had a chance to test again and it's true that wide strings were involved in one of the slower cases.
On Thu, 13 Jun 2013, Per Hedbor () @ Pike (-) developers forum wrote:
If you have some other issue (say, related to files->_Stdio), do:
git clean -f   # NOTE: will delete uncommitted files
rm -fr src/modules/files
rm -fr build
(cd src; ./run_autoconfig)
make MAKE_PARALLEL='-j5'
My problem was that some modules were not being recompiled after the type renumbering. And of course this led to all kinds of weird errors.
Any idea why it's only using about 30% of the RAM? :)
Hm, that seems like a mismeasurement. The shootout always measured RAM usage after the benchmark had been run. Depending on when memory is actually deallocated, it might show very different numbers. If I modify the BinaryTree test and read statm or use _memory_usage() during the test, there is no big difference between versions. Without having investigated it, I guess that one reason why memory can be freed much more easily is that the new block allocator doubles the page size whenever a new page is needed. It will therefore very quickly start using mmap instead of sbrk. Of course, if all blocks are freed, a mapped page can actually be reclaimed much more easily; lowering sbrk is often not possible.
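The arithmetic behind "very quickly" can be sketched in C. The first page size of 4096 bytes used in the usage note is a hypothetical value, not taken from the new allocator, and the 128 KiB threshold is the default initial value of glibc's M_MMAP_THRESHOLD; other malloc implementations differ.

```c
#include <assert.h>
#include <stddef.h>

/* Number of doublings needed before a page of at least `threshold`
 * bytes is requested, starting from `first_page` bytes. */
static unsigned doublings_until(size_t first_page, size_t threshold)
{
    assert(first_page > 0);
    unsigned n = 0;
    size_t size = first_page;
    while (size < threshold) {
        size *= 2;
        n++;
    }
    return n;
}

/* Total bytes held by the first n_pages pages when each page is
 * twice the size of the previous one. */
static size_t total_bytes(size_t first_page, unsigned n_pages)
{
    size_t total = 0;
    size_t size = first_page;
    for (unsigned i = 0; i < n_pages; i++) {
        total += size;
        size *= 2;
    }
    return total;
}
```

Under these assumptions, `doublings_until(4096, 128 * 1024)` is 5, and the five smaller pages together hold only 126976 bytes. Everything allocated after that would come from requests large enough for malloc to serve with mmap, and such mappings can be returned to the system individually once all their blocks are free.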
arne
Progress report:
The consensus was to attempt to push for an 8.0 release, given the large amount of changes since 7.0, and to make that release with the current feature set. At a high level, the task list for that is:
Fork an 8.0 and a next branch (9.0?) soonish (within a week?)
Fix compilation issues. This basically boils down to requiring version 4 of GCC and removing redundant configure tests that today prevent correct behavior.
Fix the testsuite. There are not very many issues that need to be fixed, but they have been around for a while, suggesting that they may be tricky to fix, or at least require a time investment from someone knowledgeable. We have a few type system issues, a scope issue with inherit, a test case that breaks the pike stack, and a broken dumped file system.
There are quite a few CritBit-related failures right now. Could they be related to the renumbering of PIKE_T_*?
Valgrind currently takes issue with a few of the tests. Check if these are real and fix them.
We have a few documentation issues reported by the autodoc extractor. These should be fixed.
Test that exported source and exported builds compile/install/work.
I believe that most of the decode_value-related problems have now been fixed. Remaining to do is support for encoding of programs using variants (the decoder should already be ok).
Set up Xenofarm to build Pike 8.
Check if there are obvious performance regressions from 7.8, and address them if that appears to be easy.
And everyone's favorite: write a changelog.
For the next version of Pike we want to be more explicit about supported operating systems. Linux, Solaris, MacOS, Windows and, if Marcus supports it, AmigaOS.
Remove configure and transition to a profile based system where a pike script generates header files.
Remove the compat system, as it creates a lot of maintenance overhead, ugly code and is of dubious value in practice.
I don't see much of a problem with it, albeit some of the more obscure bug-compat should probably be removed.
Consider removing the less used C modules to cut down on dependencies and compilation time. DVB, Mird, spider and PDF were mentioned as examples.
Remove the Pike security system. This is not maintained and would simplify the code.
Remove the bundle system. All the bundle dependencies are now available as packages and most are already installed by default.
Done.
Make support for bignums mandatory, to remove all ifdef bignum code.
Remove support for old systems (e.g. outdated code in port.c)
Remove short svalue support.
Remove Fd_ref from Stdio.File, currently implemented under STDIO_DIRECT_FD define.
Implement typed constants.
Implement "operator" as an alternative syntax for "`".
Replace block alloc with a single (and better) implementation.
Done.
Remove local calendar locale configuration support.
Split mega apply into a pike and a C version to speed up calls between pike functions.
Done.
Remove the mark stack.
Renumber the types so that PIKE_T_INT becomes 0.
Done.
On Sat, 15 Jun 2013, Henrik Grubbström (Lysator) @ Pike (-) developers forum wrote:
There are quite a few CritBit-related failures right now. Could they be related to the renumbering of PIKE_T_*?
I think that is related to the stack renumbering. But I did not bisect and also didn't investigate.
arne
I meant reordering... well, anyway.
On Sat, 15 Jun 2013, Arne Goedeke wrote:
On Sat, 15 Jun 2013, Henrik Grubbström (Lysator) @ Pike (-) developers forum wrote:
There are quite a few CritBit-related failures right now. Could they be related to the renumbering of PIKE_T_*?
I think that is related to the stack renumbering. But I did not bisect and also didn't investigate.
I believe that I've found the cause:
In several places in the CritBit module (e.g. the macro tree_header.H:CB_CHECK_KEY():
| #define CB_CHECK_KEY(svalue, fun, num) do {                 \
|     CB_TRANSFORM_KEY(svalue);                               \
|     if (!((svalue)->type & T_KEY))                          \
|         SIMPLE_BAD_ARG_ERROR(fun, (num), STRFY(key_ptype)); \
| } while(0)
) the type checking for svalues is broken. Note that the type check for the svalue above for some reason uses the &-operator. When T_KEY == PIKE_T_INT == 0 these tests will always break hard.
Fixing the above set of bugs reduced the number of CritBit testsuite failures to 4.
Note that the old code would happily accept floats instead of ints (not too bad), and multisets, objects, functions, programs or types (ie everything except arrays, ints or floats) instead of strings (which is worse, since it is likely to follow broken pointers).
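The failure mode can be reproduced in a few lines of C. The type numbers below are assumed values chosen for illustration (they are meant to resemble the numbering before the change, but should not be read as the authoritative PIKE_T_* constants), and the three helper functions are hypothetical.

```c
#include <assert.h>

/* Assumed type numbers, for illustration only. */
enum {
    OLD_T_ARRAY    = 0,
    OLD_T_MAPPING  = 1,
    OLD_T_MULTISET = 2,
    OLD_T_OBJECT   = 3,
    OLD_T_STRING   = 6,
    OLD_T_INT      = 8,
    OLD_T_FLOAT    = 9,
    NEW_T_INT      = 0    /* after the renumbering, int is type 0 */
};

/* The broken check: treats a type *number* as if it were a bit mask. */
static int check_with_and(unsigned type, unsigned key_type)
{
    return (type & key_type) != 0;
}

/* A plain comparison of type numbers. */
static int check_with_eq(unsigned type, unsigned key_type)
{
    return type == key_type;
}

/* A real bit-mask check: one bit per type number. */
static int check_with_bits(unsigned type, unsigned allowed_mask)
{
    assert(type < 32);    /* keep the shift within the width of unsigned */
    return ((1u << type) & allowed_mask) != 0;
}
```

With these numbers, `check_with_and(OLD_T_FLOAT, OLD_T_INT)` is true (9 & 8), so floats pass as ints, and `check_with_and(OLD_T_OBJECT, OLD_T_STRING)` is true (3 & 6), so objects pass as strings. Once the int type is 0, `check_with_and(NEW_T_INT, NEW_T_INT)` is 0 & 0 and every valid key is rejected. Either the comparison or the one-bit-per-type mask gives the intended result.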
Hm, I guess what we intended to do there is use BIT_* instead of PIKE_T_*. There is for instance
#cmod_define T_KEY (PIKE_T_FLOAT|PIKE_T_INT)
...
On Sat, 15 Jun 2013, Henrik Grubbström (Lysator) @ Pike (-) developers forum wrote:
Fixing the above set of bugs reduced the number of CritBit testsuite failures to 4.
Oops, I had missed one place. Now it looks like the testsuite passes.
Yes, I agree. I will fix it the way it was originally intended.
On Sat, 15 Jun 2013, Henrik Grubbström (Lysator) @ Pike (-) developers forum wrote:
Hm, I guess what we intended to do there is use BIT_* instead of PIKE_T_*. There is for instance
#cmod_define T_KEY (PIKE_T_FLOAT|PIKE_T_INT)
Well, that's broken for sure (both now and before).
- Test that exported source and exported builds compile/install/work.
I believe that most of the decode_value-related problems have now been fixed. Remaining to do is support for encoding of programs using variants (the decoder should already be ok).
Implemented support for dumping of variant functions. I've also added a first use of variants by using them to implement getenv() in the master.
Hm. I think I would like variant to be required on all the functions when there are multiple, and not at all when there is only one. I think it makes it clearer from a documentation point of view. Also, I don't like it when the order of definitions in the source matters.
pike-devel@lists.lysator.liu.se