Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Looks a lot like https://bugzilla.roxen.com/bugzilla/show_bug.cgi?id=5072. Here's a bit of what I wrote in that ticket:
It sure does.
This is from a fresh 7.8? More specifically, one from cvs after Nov 28th, when I added some debug for [bug 5072]? Can you run with a version compiled using --with-rtldebug? It's not that much slower.
The traces I gave you are *with* rtldebug already (you wouldn't be seeing the problem otherwise), and yes, it's from a recent version (like the end of January 2010).
In relation to the suspicions on binary modules: the server where this happens is compiled without Java support, and it doesn't use any databases other than mysql.
Anyone (mast?) have an idea where to dig for the program_id mentioned?
The id is bigger than PROG_DYNAMIC_ID_START, so it's a program without a fixed id. It's also the fourth program that got registered by low_allocate_program, so it's probably a C program. You could set a breakpoint there and check current_program_id to find out which it is.
Running the production server in gdb is going to be tricky, so this is not something I'm likely to try first (unless you say it helps tremendously in locating the problem).
Or do I need to add more logging to the gc.c file?
What would be useful is debug when gc_mark_enqueue is run - log the pointer (data) and the number of refs (*(INT32 *) data). But it's probably not feasible if your server has a big memory footprint; not even GC_VERBOSE logs that.
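For illustration, a minimal sketch of that kind of logging. It is not taken from gc.c: the helper names format_mark_enqueue and log_mark_enqueue are hypothetical, and it only assumes what is stated above, namely that the ref count is a 32-bit integer at the very start of the thing that data points to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Pike: formats one log line for a thing
 * that is about to be enqueued for marking. Assumes the reference count is
 * a 32-bit integer stored at the start of the thing, i.e. the same value
 * as *(INT32 *) data. */
int format_mark_enqueue(char *buf, size_t len, const void *data)
{
    int32_t refs;

    assert(buf != NULL && data != NULL);
    /* memcpy instead of a pointer cast avoids alignment and aliasing issues. */
    memcpy(&refs, data, sizeof refs);
    return snprintf(buf, len, "gc_mark_enqueue: data=%p refs=%ld",
                    (void *) data, (long) refs);
}

/* Hypothetical wrapper: writes the formatted line to a log stream. */
void log_mark_enqueue(FILE *out, const void *data)
{
    char line[96];

    format_mark_enqueue(line, sizeof line, data);
    fprintf(out, "%s\n", line);
}
```

A call such as log_mark_enqueue(stderr, data) at the point where the thing is enqueued would then produce one line per enqueued thing, which is why the log volume grows with the memory footprint.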
I'll see what I can do. I have over 100GB of spare room on a logpartition, so if this doesn't slow down the server too much, it might be doable.
It would probably be interesting to find out when this object was created.
Yes, that could give a clue, but the really valuable info is what kind of things were referencing the object, e.g. some weak mapping as suspected above.
The first clue to that is finding out what this object contained (or was supposed to contain); looking at the current trace, it seems like this intel is already lost at this point?
I think it should be possible to add another field to the object which logs the time of creation. Then the backtrace from that moment would be interesting; anyone have a better idea than storing a backtrace for every object as soon as it is created, yet discarding the backtrace (to conserve space) as soon as the object gets a reference?
--with-dmalloc-c-stack-trace does that, but then you're running with dmalloc which is probably too slow. It should be possible to rip out the location tracking from dmalloc so that it only logs this. Then it could be fast enough. But afaik there's no ready-made define for that.
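As a rough sketch of the idea discussed above (remember when and where an object was created, and throw that record away once the object gets a reference): this is not Pike or dmalloc code. The struct and function names are hypothetical, and it assumes a libc that provides backtrace() and backtrace_symbols_fd() in <execinfo.h>, as glibc does.

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(); glibc extension */
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define MAX_FRAMES 16

/* Hypothetical side record; Pike's struct object has no such field. */
struct creation_info {
    time_t created;             /* time of creation */
    int depth;                  /* number of valid entries in frames */
    void *frames[MAX_FRAMES];   /* raw return addresses of the C stack */
};

/* Hypothetical stand-in for an object: ref count first, then the record. */
struct tracked_object {
    int32_t refs;
    struct creation_info *origin;   /* NULL once discarded */
};

/* Record creation time and C backtrace for a new object. */
void track_creation(struct tracked_object *o)
{
    assert(o != NULL);
    o->refs = 0;
    o->origin = malloc(sizeof *o->origin);
    if (o->origin == NULL)
        return;                 /* tracking is best effort */
    o->origin->created = time(NULL);
    o->origin->depth = backtrace(o->origin->frames, MAX_FRAMES);
}

/* Add a reference and drop the creation record to conserve space. */
void add_ref_tracked(struct tracked_object *o)
{
    assert(o != NULL);
    o->refs++;
    free(o->origin);            /* free(NULL) is a no-op */
    o->origin = NULL;
}

/* Print the stored backtrace, if it is still there, to stderr. */
void dump_origin(const struct tracked_object *o)
{
    assert(o != NULL);
    if (o->origin != NULL)
        backtrace_symbols_fd(o->origin->frames, o->origin->depth,
                             STDERR_FILENO);
}
```

Only objects that never got a reference keep their record, so dump_origin() would have something to print exactly for the suspicious ones. Whether that is cheap enough in a real build is a separate question.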
I'll look into this.