Multi-cpu design

7 Jan 2009


      I've studied that refcount paper and some other recent gc algorithms.
http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2006/PHD/PHD-2006...
combines a number of gc algorithms into something that's a whole lot
more efficient than what pike currently uses. Some highlights:
o  The reference counts aren't updated for references on the stack.
   The stacks are scanned when the gc runs instead. This saves many
   many updates, and it also simplifies C level programming a lot.
   Only refcounts between things on the heap are counted.
o  The refcounts are only updated when the gc runs. This saves a lot
   of the remaining updates since if a pointer starts with value p_0
   and then changes to p_1, then p_2, p_3, ..., and lastly to p_n at
   the next gc, then only p_0->refs needs to be decremented and
   p_n->refs needs to be incremented - all the refcount changes for
   the other things it has pointed to cancel out.
o  The above is accomplished by thread local logging, to make the old
   p_0 value available to the gc at the next run. This means it scales
   well with many cpu's.
o  A generational gc uses refcounting only for old things in the heap.
   New things, which are typically very short-lived, aren't refcounted
   at all but instead gc'ed using a mark-and-sweep collector. This is
   shown to be more effective for short-lived data, and it handles
   cyclic structures without any extra effort.
o  By using refcounting on old data, the gc only need to give
   attention to refcounts that gets down to zero. This means the heap
   can scale to any size without affecting the gc run time, as opposed
   to using a mark-and-sweep collector on the whole heap. The gc time
   scales only with the amount of _change_ in the heap.
o  Cyclic structures in the old refcounted data that is handled
   incrementally using the fact that a cyclic structure can only occur
   when a refcounter is decremented to a value greater than zero.
   Those things can therefore be tracked and cycle checked in the
   background. The gc uses several different methods to weed out false
   alarms before doing actual cycle checks.
o  The gc runs entirely in its own thread. It only needs to stop the
   working threads for a very short time to scan stacks etc, and they
   can be stopped one at a time.
o  Since all the things that need gc attention are logged on change,
   there's no need for the double-linked lists.
All the above taken together puts this gc in a whole different league
compared to the one pike currently uses, so it'd be very nice indeed
to replace the old one, and it would simplify a lot of things in the
multi-cpu scenario. There are however some significant drawbacks:
o  Noncyclic things are no longer destructed and freed immediately. In
   particular, it means code like this no longer would work:
void foo() {
     Thread.MutexKey my_lock = my_mutex->lock();
     ... do some work ...
     // my_lock falls out of scope here when the function exits, so
     // the lock is released right away.
   }
Mutex locks like the above would need something like a synchronized
   statement to work well (the locks must be known to the compiler so
   that they can be destructed properly also when exceptions are
   thrown - using explicit catch{} and rethrow everywhere would be too
   cumbersome).
There's also code that opens files and sockets etc and expects them
   to be automatically closed again through this method. (That
   practice has been shown to be bug prone, though, so in our sources
   many of those places have been fixed over time.)
Anyway, bottom line is that it would need pike level fixing. I
   think the compiler can be changed to produce warnings for most
   occurrences.
And there might be more immediate-destruct tricks out there too..
o  The ubiquitous foo->refs would no longer be useful to anyone other
   than the gc. This means the _refs() efun becomes meaningless. Afaik
   it's used occasionally for some destruct tricks, but I think weak
   refs can be used instead in most cases.
o  _next(), _prev() and next_object() would also be defunct. It's a
   minor problem - other methods to let pike code walk the heap can be
   added.
o  The C module interface becomes more seriously incompatible.
For starters, all code that fiddle with refcounts on the stack
   become meaningless. I think most of it can be taken care of by
   changing the various macros, e.g. ref_push_string would just be an
   alias for push_string, and pop_stack would simply decrement
   Pike_sp.
More problematic is that all code that change pointers in heap
   objects would have to use new macros/functions. Above all, this
   includes all C modules that fiddles with pointers in their current
   storage (e.g. through a THIS macro or similar). Changes would be
   trivial but widespread, and I can't think of any way to keep source
   level compatibility with old modules.
So what do you think?

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Multi-cpu design