It seems that the default configuration of GMP, with WANT_TMP_ALLOCA, allocates arbitrarily large temporaries on the stack. For example, mpn_tdiv_qr allocates temporary storage proportional to the size of its input.
Fredrik Hübinette (Pike author) reports that his Pike program that calculates a million decimals of Pi crashes in GMP with a default configuration of gmp-4.1.2, but works fine if GMP is reconfigured with --enable-alloca=malloc-reentrant. The problem is made worse by Linux threads (and perhaps threads on other systems as well), which limit stack space to only 4 MB. (See http://fredrik.hubbe.net/hacker/viewtopic.php?t=26.)
Perhaps it would be a good idea to define TMP_ALLOC as something like
  #define TMP_ALLOC(size) \
    (((size) < PAGESIZE*2) ? alloca(size) : heap_alloc((size), &tmp_marker))
Then for small inputs, we get only a single comparison of extra overhead, while for large inputs, the allocation overhead should be small compared to the cost of the actual computation.
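To make the idea concrete, here is a minimal sketch of such a hybrid allocator. It is not GMP's actual code: the names SKETCH_TMP_ALLOC, SKETCH_STACK_LIMIT, tmp_heap_alloc, tmp_free_all and the marker layout are all hypothetical, and the two-page threshold assumes 4096-byte pages.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical threshold: two pages, assuming 4096-byte pages. */
#define SKETCH_STACK_LIMIT ((size_t) 2 * 4096)

/* Hypothetical header placed in front of every heap temporary.
   The max_align_t member keeps the payload suitably aligned. */
union tmp_block {
    union tmp_block *next;
    max_align_t align;
};

/* Hypothetical marker: the list of heap temporaries owned by one call. */
struct tmp_marker {
    union tmp_block *head;
};

/* Allocate a large temporary on the heap and record it in the marker. */
static void *tmp_heap_alloc(size_t size, struct tmp_marker *m)
{
    union tmp_block *b;

    assert(m != NULL);
    b = malloc(sizeof *b + size);
    if (b == NULL)
        abort();
    b->next = m->head;
    m->head = b;
    return b + 1;               /* payload starts right after the header */
}

/* Release every heap temporary recorded in the marker. */
static void tmp_free_all(struct tmp_marker *m)
{
    while (m->head != NULL) {
        union tmp_block *next = m->head->next;
        free(m->head);
        m->head = next;
    }
}

/* Small requests go to the stack, large ones to the heap.
   Note that `size` is evaluated twice; pass a side-effect-free expression.
   Expects a variable named tmp_marker in the enclosing scope. */
#define SKETCH_TMP_ALLOC(size)                          \
    (((size) < SKETCH_STACK_LIMIT)                      \
         ? alloca(size)                                 \
         : tmp_heap_alloc((size), &tmp_marker))

/* Example user: fill a scratch buffer with 0..n-1 and return the sum. */
static unsigned long long sum_scratch(size_t n)
{
    struct tmp_marker tmp_marker = { NULL };
    size_t bytes = n * sizeof(unsigned long long);
    unsigned long long *buf = SKETCH_TMP_ALLOC(bytes);
    unsigned long long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        buf[i] = i;
    for (i = 0; i < n; i++)
        total += buf[i];

    tmp_free_all(&tmp_marker);  /* no-op when only alloca was used */
    return total;
}
```

With this sketch, sum_scratch(10) stays entirely on the stack, while sum_scratch(100000) needs about 800 kB and therefore goes through malloc. The stack path costs one comparison plus a trivial walk of an empty list at exit.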
Due to the way the "red zone" stack extension mechanism works, I don't think it's reliable to allocate more than a few pages of data on the stack at a time. If you allocate too many pages, you may get a segmentation violation if you access the pages in an unfortunate order.
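To illustrate what a "fortunate" access order would look like, here is a rough sketch that touches a freshly allocated stack buffer one page at a time, starting from the highest address. It assumes a downward-growing stack and 4096-byte pages; the function names are hypothetical, and this is only an illustration of the access pattern, not something GMP does.

```c
#include <alloca.h>
#include <stddef.h>

/* Touch one byte per page, from the highest address down to the lowest,
   so that on a downward-growing stack each access lands next to memory
   that has already been touched. Returns the number of bytes written. */
static size_t probe_descending(volatile char *buf, size_t size, size_t page)
{
    size_t touched = 0;
    size_t off = size;

    while (off > 0) {
        off = (off > page) ? off - page : 0;
        buf[off] = 0;
        touched++;
    }
    return touched;
}

/* Allocate `size` bytes with alloca and probe them in descending order.
   The 4096-byte page size is an assumption. */
static size_t alloca_and_probe(size_t size)
{
    volatile char *buf = alloca(size);
    return probe_descending(buf, size, 4096);
}
```

Even with such probing, a large request can still run past the end of a small thread stack, so this only addresses the access-order problem, not the size limit itself.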
Regards, /Niels
pike-devel@lists.lysator.liu.se