On Tue, Apr 3, 2012 at 12:35 AM, Martin Stjernholm, Roxen IS @ Pike developers forum 10353@lyskom.lysator.liu.se wrote:
We pushed an experimental branch which uses a new block allocator we have been developing. /.../
I guess you already told us at the conference, but what was your motivation for creating it?
The linear lookup in the existing one...
Is std::allocator the same thing as the (g)libc malloc in your graphs?
No, std::allocator is a block allocator used by std::list, std::map and so on. Compared to glibc free/malloc, the std::allocator is quite fast.
I take it you already got it working as a replacement for the old BLOCK_ALLOC, including that quirky heap walker that the old one sports? Do you have any test where they are compared?
We used pike -x benchmark (because results there jitter a lot we averaged ~160 runs, which seems to give useful precision). I will dig up some results from that. However, -x benchmark may not at all be what should be used for meaningful comparison, but unfortunately we did not come up with anything better yet.
/.../ Currently, GJAlloc is requested as a bundle by the CritBit module, which uses its API directly.
Does it add any features?
Slight ones. For the CritBits, we need the allocator in the object storage (block allocators are used tree-locally). Other than that, GJAlloc supports ba_free_all() and ba_destroy().
All other modules use macro definitions which are compatible with the ones from block_allocator.h.
You mean the old block_alloc.h?
Yes, that we meant.
best,
tobi
Benchmarks:
http://fortranland.com/compare_7.9_block_alloc.txt
This is a diff between 177 runs of pike -x benchmark from the 7.9 branch and the same 177 runs with arne/block_alloc. Brackets give precision, +/- shows deviation within the runs.
best,
tobi
And another one with --cleanup-on-exit enabled:
http://fortranland.com/compare_7.9_block_alloc_cleanup_on_exit.txt
best,
tobi
Perhaps running the testsuite would be a good performance test?
I gave that a try. The difference is below 1%. Most tests probably use too few blocks to get into the range where the old ones behaves badly.
On Tue, 3 Apr 2012, Martin Nilsson (Opera Mini - AFK!) @ Pike (-) developers forum wrote:
Perhaps running the testsuite would be a good performance test?
"Tobias S. Josefowitz" t.josefowitz@gmail.com wrote:
I guess you already told us at the conference, but what was your motivation for creating it?
The linear lookup in the existing one...
Just to make it clear, you're talking about the linear lookup to find the right block when structs are freed?
Is std::allocator the same thing as the (g)libc malloc in your graphs?
No, std::allocator is a block allocator used by std::list, std::map and so on. Compared to glibc free/malloc, the std::allocator is quite fast.
It would be interesting to see a comparison with plain old (glibc) malloc, just to put it into perspective.
Does it add any features?
Slight ones. For the CritBits, we need the allocator in the object storage (block allocators are used tree-locally).
So the nodes for each critbit tree are allocated within its own blocks? Sifting through the code just briefly I got the impression the nodes were allocated from a single global pool, considering that CB_NODE_ALLOC() doesn't take any context. Did I miss something?
http://fortranland.com/compare_7.9_block_alloc.txt http://fortranland.com/compare_7.9_block_alloc_cleanup_on_exit.txt
Hmm, do you have any theories to explain the variations seen in those two? Otherwise it seems to me like there's still a lot of noise, and I suspect none of those tests really stress BLOCK_ALLOC in the way where your implementation would show its strength, i.e. freeing structs when there's a large number allocated.
I figured I could add a test for that, but I can't compile CritBit from clean source on the arne/block_alloc branch:
#### Making dynamic: post_modules/CritBit make[5]: *** No rule to make target `/home/mast/Pike/devel/src/post_modules/CritBit/bitvector.h', needed by `inttree.o'. Stop. make[4]: *** [all] Error 2 make[3]: *** [CritBit] Error 1
And of course, a couple of CritBit benchmarks would be welcome - they also ought to show off GJAlloc, right? ;)
Arne Goedeke el@laramies.com wrote:
Does it add any features?
Well, yes. Actually, there is one thing. Since its not all macros it would be possible to have several ones around just for one task and free everything in one go at the end. Something similar to what the gc is doing right now, although there its only one gc at a time, anyway ,)
Right.
And then, it might be even handier to have one which never frees, except right at the end when all pages are discarded.
Possibly, but the old one is quite good at that detail already, isn't it?
Btw, I pushed a few small fixes to be able to compile with GJAlloc in rtldebug mode (when you're developing C modules, I really recommend that mode, btw). I'm not that happy with the condition I put into BLOCK_ALLOC_IN_USE, though. I thought block_allocator.pages would be better, but that doesn't work with BA_USE_MEMALIGN. I hope you can come up with something better.
On Sun, 15 Apr 2012, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Btw, I pushed a few small fixes to be able to compile with GJAlloc in rtldebug mode (when you're developing C modules, I really recommend that mode, btw). I'm not that happy with the condition I put into BLOCK_ALLOC_IN_USE, though. I thought block_allocator.pages would be better, but that doesn't work with BA_USE_MEMALIGN. I hope you can come up with something better.
I think num_pages does well for this purpose. Otherwise one would have to check the several linked list. I am not sure if its worth keeping the BA_USE_MEMALIGN mode. The idea was to be able to skip the hashtable completely when allocator pages are aligned. In or benchmarks it did not really give any improvement and allocating alot of memory with huge alignment might lead to strange fragmentation issues on some systems.
pike-devel@lists.lysator.liu.se