The real question is why this code is there in the first place. Isn't it an order of magnitude slower than doing it in Pike? I sent a replacement C function to the Caudium list (?) a few years back.
Anyway, isn't tmp's reference count going to underflow? You push it (without gaining a reference) and then it gets popped by the call to f_replace (losing one reference). Perhaps ref_push_string(tmp) would work better?
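
To spell out the refcount concern, here is a toy model in plain C. It is not the real Pike interpreter API: the toy_* names, the fixed-size stack and the struct are all made up for illustration, and I'm assuming the usual convention that a plain push hands the caller's reference to the stack while a ref-push adds a new one.

```c
#include <assert.h>

/* Toy model only: these are NOT the real Pike interpreter functions. */
struct toy_string { int refs; };

static struct toy_string *toy_stack[8];
static int toy_sp = 0;

/* Models a plain push: the stack takes over a reference the caller owns,
 * so the count is not incremented. */
static void toy_push_string(struct toy_string *s) {
    assert(toy_sp < 8);
    toy_stack[toy_sp++] = s;
}

/* Models a ref-push: the stack gets its own, newly added reference. */
static void toy_ref_push_string(struct toy_string *s) {
    s->refs++;
    toy_push_string(s);
}

/* Models a callee (such as f_replace) consuming its argument:
 * the stack slot is popped and its reference is dropped. */
static void toy_pop(void) {
    assert(toy_sp > 0);
    toy_stack[--toy_sp]->refs--;
}

/* Caller keeps using tmp after the call, but pushed it without a new ref. */
int refs_after_plain_push(void) {
    struct toy_string tmp = { 1 };  /* caller holds one reference */
    toy_push_string(&tmp);
    toy_pop();
    return tmp.refs;                /* caller's reference is gone */
}

/* Same sequence, but the stack was given its own reference. */
int refs_after_ref_push(void) {
    struct toy_string tmp = { 1 };  /* caller holds one reference */
    toy_ref_push_string(&tmp);
    toy_pop();
    return tmp.refs;                /* caller's reference survives */
}
```

In the first case the count drops to 0 while the caller still believes it owns tmp, so a later free of tmp would take it below zero. In the second case the count is back at 1 after the call, which is what the caller expects.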