I've studied that refcount paper and some other recent gc algorithms. http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2006/PHD/PHD-2006... combines a number of gc algorithms into something that's a whole lot more efficient than what pike currently uses. Some highlights:
o The reference counts aren't updated for references on the stack. The stacks are scanned when the gc runs instead. This saves many many updates, and it also simplifies C level programming a lot. Only refcounts between things on the heap are counted.
o The refcounts are only updated when the gc runs. This saves a lot of the remaining updates since if a pointer starts with value p_0 and then changes to p_1, then p_2, p_3, ..., and lastly to p_n at the next gc, then only p_0->refs needs to be decremented and p_n->refs needs to be incremented - all the refcount changes for the other things it has pointed to cancel out.
o The above is accomplished by thread local logging, to make the old p_0 value available to the gc at the next run. This means it scales well with many cpu's.
o A generational gc uses refcounting only for old things in the heap. New things, which are typically very short-lived, aren't refcounted at all but instead gc'ed using a mark-and-sweep collector. This is shown to be more effective for short-lived data, and it handles cyclic structures without any extra effort.
o By using refcounting on old data, the gc only need to give attention to refcounts that gets down to zero. This means the heap can scale to any size without affecting the gc run time, as opposed to using a mark-and-sweep collector on the whole heap. The gc time scales only with the amount of _change_ in the heap.
o Cyclic structures in the old refcounted data that is handled incrementally using the fact that a cyclic structure can only occur when a refcounter is decremented to a value greater than zero. Those things can therefore be tracked and cycle checked in the background. The gc uses several different methods to weed out false alarms before doing actual cycle checks.
o The gc runs entirely in its own thread. It only needs to stop the working threads for a very short time to scan stacks etc, and they can be stopped one at a time.
o Since all the things that need gc attention are logged on change, there's no need for the double-linked lists.
All the above taken together puts this gc in a whole different league compared to the one pike currently uses, so it'd be very nice indeed to replace the old one, and it would simplify a lot of things in the multi-cpu scenario. There are however some significant drawbacks:
o Noncyclic things are no longer destructed and freed immediately. In particular, it means code like this no longer would work:
void foo() { Thread.MutexKey my_lock = my_mutex->lock(); ... do some work ... // my_lock falls out of scope here when the function exits, so // the lock is released right away. }
Mutex locks like the above would need something like a synchronized statement to work well (the locks must be known to the compiler so that they can be destructed properly also when exceptions are thrown - using explicit catch{} and rethrow everywhere would be too cumbersome).
There's also code that opens files and sockets etc and expects them to be automatically closed again through this method. (That practice has been shown to be bug prone, though, so in our sources many of those places have been fixed over time.)
Anyway, bottom line is that it would need pike level fixing. I think the compiler can be changed to produce warnings for most occurrences.
And there might be more immediate-destruct tricks out there too..
o The ubiquitous foo->refs would no longer be useful to anyone other than the gc. This means the _refs() efun becomes meaningless. Afaik it's used occasionally for some destruct tricks, but I think weak refs can be used instead in most cases.
o _next(), _prev() and next_object() would also be defunct. It's a minor problem - other methods to let pike code walk the heap can be added.
o The C module interface becomes more seriously incompatible.
For starters, all code that fiddle with refcounts on the stack become meaningless. I think most of it can be taken care of by changing the various macros, e.g. ref_push_string would just be an alias for push_string, and pop_stack would simply decrement Pike_sp.
More problematic is that all code that change pointers in heap objects would have to use new macros/functions. Above all, this includes all C modules that fiddles with pointers in their current storage (e.g. through a THIS macro or similar). Changes would be trivial but widespread, and I can't think of any way to keep source level compatibility with old modules.
So what do you think?