Turns out the single-ref-destructive-update optimizations are more difficult to accomplish, even with a micro-gc. The problem is that refs from the stacks aren't counted.
To explain, let's start with the case where we want to keep the optimization pike currently does:
array a = ({0}); for (int i = 1; i < 1000; i++) a += ({i});
(To make the example simpler, I start out with the array ({0}) instead of the empty array ({}), since the empty array is always shared.) Here pike can be destructive on the array in a since there are no other refs to it, and hence it can overallocate it and append new elements to the end efficiently.
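For comparison, here's a minimal sketch of what the same loop means when the optimization can't be applied (barring other optimizations, a += x on an array behaves like a = a + x, which copies all the existing elements):

  array a = ({0});
  for (int i = 1; i < 1000; i++)
      a = a + ({i});   // builds a new array of size i+1 on every iteration

That makes the loop O(n^2) overall, whereas the destructive version appends into overallocated space at the end of the existing array block and stays O(n).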
With the micro-gc, we can tell in `+= that there are no refs to the array from anywhere but the stack. Now consider this case:
array a = ({0}); array b = a; for (int i = 1; i < 1000; i++) a += ({i});
After this, one of course expects a = ({0, 1, 2, ...}) and b = ({0}). But since we don't count refs from the stack, we don't see the second ref to the array and might therefore think it's safe to be destructive; that would make b = ({0, 1, 2, ...}) too.
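To spell out the semantics that have to be preserved (same variables as above; for arrays, a += x behaves like a = a + x):

  array a = ({0});
  array b = a;      // second ref, but it only lives on the stack
  a += ({1});       // must give a a new array and leave b alone
  // correct:              a == ({0, 1}), b == ({0})
  // wrongly destructive:  a == ({0, 1}), b == ({0, 1})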
So to use the single-ref-destructive-update optimizations we have to continue to count refs from the stacks. Or at least from the pike svalue stack, to cope with the situation above, but there might be situations where the same thing occurs with refs from the C stack too.
This is a problem, because I'm convinced there is a _big_ performance boost in not counting stack refs. So I wonder, is there a way to cope?
One alternative is to extend the array type to explicitly allow adding elements to the end destructively. It could perhaps look like this:
a[sizeof (a)] = i;
I.e. an indexed assignment to one step past the end. The rationale is that []= is already destructive, and this is the same sort of assignment, even though a new slot has to be created in the process. It doesn't imply that arbitrary elements past the end or before the beginning can be created the same way.
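With that extension, the loop in the first example could be written explicitly destructively (hypothetical syntax, of course, since this is the proposed extension and not something current pike supports):

  array a = ({0});
  for (int i = 1; i < 1000; i++)
      a[sizeof (a)] = i;   // destructive append to the end, one slot at a time

Since the destructiveness is asked for explicitly, it doesn't depend on the array happening to have a single ref.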
I'd like an extension like that anyway, because destructive appends are useful regardless of the number of refs (and whether the array is thread local or not). It's also nice to be explicit about it rather than relying on an optimization.
The problem is of course that old pike code that previously was O(n) suddenly becomes O(n^2) until it's updated. It doesn't misbehave though.
Of course, there are many more single-ref-destructive-update optimizations, but this is by far the most common one for which there currently is no way to be explicitly destructive.