Running the production server in gdb is going to be tricky, so this is not something I'm likely to try first (unless you say it helps tremendously in locating the problem).
It'd only be applicable to programs allocated early, while the allocation order is still deterministic. E.g. 65540, which I was looking at, is the fourth program with a dynamic id, and that's so early that it's probably the same in all 7.8 builds. In my pike it resolves to the trampoline program.
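The determinism argument can be illustrated with a toy model. This is a hypothetical sketch, not Pike's allocator: it assumes dynamic program ids come from one global counter that is pre-incremented from a base of 65536. Those two details were chosen only so that the fourth id comes out as 65540, matching the number above.

```c
#include <assert.h>

/* Hypothetical model of dynamic program id allocation. DYNAMIC_ID_BASE and
 * the pre-increment are assumptions picked so the fourth id is 65540. */
#define DYNAMIC_ID_BASE 65536

static int next_dynamic_id = DYNAMIC_ID_BASE;

/* Hand out the next dynamic id from the single global counter. */
int alloc_dynamic_id(void)
{
  return ++next_dynamic_id;
}

/* Simulate a fresh process start. */
void reset_dynamic_ids(void)
{
  next_dynamic_id = DYNAMIC_ID_BASE;
}

/* With a fixed allocation order, the id of the Nth program is a pure
 * function of N, so an early id names the same program on every build
 * that creates the same startup programs in the same order. */
int id_of_nth_program(int n)
{
  reset_dynamic_ids();
  int id = 0;
  for (int i = 0; i < n; i++)
    id = alloc_dynamic_id();
  return id;
}
```

Later ids (like 595065) depend on everything the application has compiled by then, which is why only the early ones can be compared across servers.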
That's kinda interesting, but I see now that the program ids are wildly different (12: String.Buffer, 65540: trampoline, 65596: bignum, 66124, 66233, 66463, 595065), so the theory that it's some specific structure is busted.
I'll see what I can do. I have over 100 GB of spare room on a log partition, so if this doesn't slow down the server too much, it might be doable.
I believe GC_VERBOSE, at least, would cause enough load and I/O to slow the server down significantly, even when logging to local disk, but some custom-made, terse debug output might cut it.
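One common way to get "terse debug" without per-event I/O is an in-memory ring buffer that is only written out when the fatal error fires. The sketch below is hypothetical: `terse_log`, `terse_log_dump` and the record layout are made-up names, not existing Pike hooks, and where to call them (for example in destruct() and from the fatal handler) is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical terse event log: keep the last LOG_SIZE events in memory
 * and write nothing until something goes wrong. */
#define LOG_SIZE 4096 /* power of two, so masking wraps the index */

enum gc_event { EV_DESTRUCT = 1, EV_FREE = 2 };

struct log_entry {
  const void *obj;  /* object address */
  int32_t prog_id;  /* program id, captured up front since it is gone after destruct */
  int32_t refs;     /* refcount at the time of the event */
  uint8_t event;    /* one of enum gc_event */
  uint8_t gc_pass;  /* value of Pike_in_gc at the time (assumed to fit in a byte) */
};

static struct log_entry ring[LOG_SIZE];
static unsigned long ring_pos = 0;

/* O(1) and no I/O: overwrite the oldest slot. */
void terse_log(const void *obj, int prog_id, int refs, int event, int gc_pass)
{
  struct log_entry *e = &ring[ring_pos++ & (LOG_SIZE - 1)];
  e->obj = obj;
  e->prog_id = prog_id;
  e->refs = refs;
  e->event = (uint8_t)event;
  e->gc_pass = (uint8_t)gc_pass;
}

/* Number of events currently retained. */
unsigned long terse_log_count(void)
{
  return ring_pos < LOG_SIZE ? ring_pos : LOG_SIZE;
}

/* Most recent entry, or NULL if nothing has been logged yet. */
const struct log_entry *terse_log_last(void)
{
  return ring_pos ? &ring[(ring_pos - 1) & (LOG_SIZE - 1)] : NULL;
}

/* Call from the fatal handler: dump the retained tail, oldest first. */
void terse_log_dump(FILE *out)
{
  unsigned long n = terse_log_count();
  for (unsigned long i = ring_pos - n; i < ring_pos; i++) {
    const struct log_entry *e = &ring[i & (LOG_SIZE - 1)];
    fprintf(out, "%lu ev=%d pass=%d obj=%p prog=%d refs=%d\n",
            i, e->event, e->gc_pass, e->obj, (int)e->prog_id, (int)e->refs);
  }
}
```

The cost per event is a few stores, so it can stay enabled on a production server; the trade-off is that only the last `LOG_SIZE` events survive to the crash.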
/.../ looking at the current trace, it seems like this intel is already lost at this point?
Yes, because the objects are destructed.
Hmm, considering the debug check at svalue.c:2316, it must happen while the object is in the queue, and that should be easy to check. Here's a patch that strengthens those debug checks. Could you please try it? Be prepared to back it out quickly, though, because I'm not entirely sure it's correct (that o->next != o condition is suspicious).
Index: src/object.c
===================================================================
RCS file: /pike/data/cvsroot/Pike/7.8/src/object.c,v
retrieving revision 1.305
diff -u -r1.305 object.c
--- src/object.c	9 Feb 2010 12:30:25 -0000	1.305
+++ src/object.c	1 Mar 2010 20:51:20 -0000
@@ -807,7 +807,11 @@
   SET_ONERROR(uwp, fatal_on_error, "Shouldn't get an exception in destruct().\n");
   if(d_flag > 20)
     do_debug();
+
+  if (Pike_in_gc >= GC_PASS_PRETOUCH && Pike_in_gc < GC_PASS_FREE)
+    gc_fatal (o, 1, "Destructing objects is not allowed inside the gc.\n");
 #endif
+
 #ifdef GC_VERBOSE
   if (Pike_in_gc > GC_PASS_PREPARE) {
     fprintf(stderr, "| Destructing %p with %d refs", o, o->refs);
@@ -1020,8 +1024,8 @@
 #endif
     Pike_fatal("Object got %d references in schedule_really_free_object().\n", o->refs);
   }
-  if (Pike_in_gc > GC_PASS_PREPARE && Pike_in_gc < GC_PASS_FREE && o->next != o)
-    Pike_fatal("Freeing objects is not allowed inside the gc.\n");
+  if (Pike_in_gc > GC_PASS_PREPARE && Pike_in_gc < GC_PASS_FREE /* && o->next != o*/)
+    gc_fatal(o, 0, "Freeing objects is not allowed inside the gc.\n");
 #endif
 
   debug_malloc_touch(o);
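The two checks in the patch boil down to "is the current gc pass inside a forbidden window". The sketch below models just that logic with simplified stand-ins for the GC_PASS_* constants. The set of passes and their numeric order are assumptions for illustration, not Pike's actual values; only the ordering PREPARE < PRETOUCH < ... < FREE matters here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for Pike's GC_PASS_* constants. */
enum gc_pass {
  PASS_NONE = 0,
  PASS_PREPARE,
  PASS_PRETOUCH,
  PASS_MARK,      /* stands in for whatever passes lie in between */
  PASS_FREE
};

/* Mirrors the check added to destruct(): forbidden from the pretouch
 * pass up to, but not including, the free pass. */
int destruct_forbidden(enum gc_pass pass)
{
  return pass >= PASS_PRETOUCH && pass < PASS_FREE;
}

/* Mirrors the check in schedule_really_free_object() after the patch,
 * i.e. with the "o->next != o" exemption commented out. */
int free_forbidden(enum gc_pass pass)
{
  return pass > PASS_PREPARE && pass < PASS_FREE;
}

/* Hypothetical hook: abort loudly, as gc_fatal() would, when a forbidden
 * operation is attempted during the given pass. */
void check_gc_op(const char *op, enum gc_pass pass, int forbidden)
{
  if (forbidden) {
    fprintf(stderr, "%s is not allowed inside the gc (pass %d).\n", op, (int)pass);
    abort();
  }
}
```

In this model the two windows coincide because PRETOUCH immediately follows PREPARE; whether that is also true in the real source depends on the actual constant values.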