"Stephen R. van den Berg" srb@cuci.nl wrote:
Martin Stjernholm wrote:
Could you also in, say, this last coredump check the contents of the object struct? Something like this:
(gdb) fr 8
#8  0x081687ec in gc_mark_object_as_referenced (o=0x92985d0) at /data/src/gpike/src/object.c:2035
2035      if(gc_mark(o, T_OBJECT)) {
(gdb) p *o
$1 = {refs = 0, prog = 0x0, next = 0x9296878, prev = 0x0, program_id = 65596, storage = 0x0}
(gdb) p *o->next
$2 = {refs = 0, prog = 0x0, next = 0x0, prev = 0x0, program_id = 65596, storage = 0x0}
/.../
It's interesting that prev is so consistently NULL. I can't figure out how that would happen, but I have a few more ideas for debug info that could be helpful:
Please use a pike fresh from cvs as I've made a few more tweaks there. I don't really expect them to nail this bug, but rather to rule out some alternatives more conclusively.
Compile with the GC_MARK_DEBUG define (it's commented out in gc.h). It shouldn't affect performance much. When you get a fatal, go to the gc_mark_run_queue frame and print out b->entries[e]. The members in_type, in and place may shed some light on how this thing ended up in the queue. If you have the opportunity to debug in a live process, please try
(gdb) call describe(b->entries[e].in)
before you kill it.
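The part that also works on a core file would go roughly like this. It's only a sketch: N is a placeholder for whatever frame number gc_mark_run_queue gets in your backtrace, and I'm assuming b and e are still available in that frame (they could be optimized away in some builds).

(gdb) bt
(gdb) frame N
(gdb) p b->entries[e]
(gdb) p b->entries[e].in_type
(gdb) p b->entries[e].place

The describe() call only works with a live process since gdb has to run code in it, so on a core you'd have to settle for the prints.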
Some other things you could get from a core are backtraces for all threads and the values of first_object, got_unlinked_things, objects_to_destruct and destruct_object_evaluator_callback.
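In gdb that would be something along these lines. Again just a sketch; if some of the variables are static, gdb might not find them from the frame you happen to be in, and you may have to select a frame in the right source file first.

(gdb) thread apply all bt
(gdb) p first_object
(gdb) p got_unlinked_things
(gdb) p objects_to_destruct
(gdb) p destruct_object_evaluator_callback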
You can also compile with GC_DEBUG (now commented out in gc.c). That adds a number of consistency checks to the gc and makes it moderately slower, but it doesn't slow down the rest of pike.
And as always, if you get something then the gdb info together with the stderr output from the same process is valuable.
Thanks.