Hi,
I am getting a bit tired of trying to find out what is going wrong with --with-rtldebug, since there is no debug info available at all. Maybe someone has a clue... My build: http://pike.ida.liu.se/development/pikefarm/result.xml?id=507_103&pike=7... When I try to analyze the core with gdb:
--snip--
aldem@troll:~/src/pike/cvs/Pike-v7.7-snapshot/build/linux-2.6.8-24.3-smp-i686/post_modules/Shuffler> gdb ../../pike core
GNU gdb 6.2.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Core was generated by /home/aldem/src/pike/cvs/Pike-v7.7-snapshot/build/linux-2.6.8-24.3-smp-i686/pik'.
Program terminated with signal 11, Segmentation fault.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/librt.so.1...done.
Loaded symbols for /lib/tls/librt.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/tls/libpthread.so.0...done.
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 eval_instruction_without_debug (pc=Cannot access memory at address 0x8
) at interpreter.h:119
119 instr = pc[0];
(gdb) bt
#0 eval_instruction_without_debug (pc=Cannot access memory at address 0x8
) at interpreter.h:119
Cannot access memory at address 0x4
(gdb)
--snip--
When I run the same command from within gdb, I get exactly the same problem (no stack info). An attempt to use valgrind gives:
==24227== Invalid read of size 4
==24227== at 0x8094F03: eval_instruction_without_debug (interpreter.h:119)
==24227== Address 0x8 is not stack'd, malloc'd or (recently) free'd
...and that's all... No stack info... I am lost... :( Any clues? This is a long-standing problem and I have no idea what it might be, especially taking into account that 7.6 has no problems... Or should I simply drop rtldebug and forget about this problem? :)
Regards, /Al
See if you can get it to crash inside the gdb, and then run "call gdb_backtraces()" while you still have a process. It will print where it is in your pike program.
You need to compile with --with-valgrind to be able to use valgrind, I've heard. I think it marks the shifts between data and code better then.
On Sat, Nov 13, 2004 at 08:35:00AM +0100, Mirar @ Pike developers forum wrote:
> See if you can get it to crash inside the gdb, and then run "call gdb_backtraces()" while you still have a process. It will print where it is in your pike program.
This gives:
--snip--
Starting program: /home/aldem/src/pike/cvs/Pike-v7.7-snapshot/build/linux-2.6.8-24.3-smp-i686/pike -DNOT_INSTALLED -DPRECOMPILED_SEARCH_MORE -m/home/aldem/src/pike/cvs/Pike-v7.7-snapshot/build/linux-2.6.8-24.3-smp-i686/master.pike /home/aldem/src/pike/cvs/Pike-v7.7-snapshot/src/post_modules/Shuffler/make_sources.pike /home/aldem/src/pike/cvs/Pike-v7.7-snapshot/src/post_modules/Shuffler sources.h sources_to_compile
[Thread debugging using libthread_db enabled]
warning: Unable to set global thread event mask: generic error
[New Thread -1209796032 (LWP 24314)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1209796032 (LWP 24314)]
eval_instruction_without_debug (pc=Cannot access memory at address 0x8
) at interpreter.h:119
119 instr = pc[0];
(gdb) bt
#0 eval_instruction_without_debug (pc=Cannot access memory at address 0x8
) at interpreter.h:119
Cannot access memory at address 0x4
(gdb) call gdb_backtraces()

THREAD_ID 0xb7e3fa40 (swapped in):
/home/aldem/src/pike/cvs/Pike-v7.7-snapshot/build/linux-2.6.8-24.3-smp-i686/master.pike:2372: _main(array[8], array[57])
(gdb)
--snip--
For some mysterious reason the stack is corrupted so heavily that it cannot be used to backtrace anything... I'll try without thread support, but if the problem disappears then, well... No clue what to do next... I don't know Pike's internals well enough to guess where the problem is...
Regards, /Al
> THREAD_ID 0xb7e3fa40 (swapped in): /home/aldem/src/pike/cvs/Pike-v7.7-snapshot/build/linux-2.6.8-24.3-smp-i686/master.pike:2372: _main(array[8], array[57])
> Due to some mysterious reason stack is corrupted so heavily that it cannot be used to backtrace anything...
Well, the first thread is in master.pike line 2372, so it can't be that bad?
Note that the C stack and the Pike stack are totally different entities, so both are rarely destroyed at the same time.
You might want to compile without -fomit-frame-pointer, in case that optimization is on. It's a sure stack-eater. Also you can try with --without-machine-code to get better stacks.
On Sat, Nov 13, 2004 at 09:50:02AM +0100, Mirar @ Pike developers forum wrote:
> You might want to compile without -fomit-frame-pointer, in case that optimization is on. It's a sure stack-eater. Also you can try with --without-machine-code to get better stacks.
Well, the funny thing is that it is compiled with all debugging options enabled: with frame pointers, --with-cdebug, --with-rtldebug, --with-checker, etc... And still, gdb is not able to produce a backtrace. Finally, in single-step mode, I hit this place:
(gdb)
2213 OPCODE1(F_CALL_BUILTIN, "call builtin", I_UPDATE_ALL, {
(gdb)

Program received signal SIGSEGV, Segmentation fault.
eval_instruction_without_debug (pc=Cannot access memory at address 0x8
) at interpreter.h:119
119 instr = pc[0];
and:
(gdb) bt
#0 eval_instruction_without_debug (pc=Cannot fetch general-purpose registers for thread -1209796096: generic error
) at interpreter.h:119
Error accessing memory address 0x4: No such process.
The code wasn't (really) designed to aid debugging... A lot of macros, etc... It will take some time to find the reason...
Regards, /Al
Can you make a minimal test case? I'm sure that would help.
And/or run the code with trace(1) (or trace(999)).
On Sat, Nov 13, 2004 at 10:50:02AM +0100, Mirar @ Pike developers forum wrote:
> And/or run the code with trace(1) (or trace(999)).
With -t999:
---snip---
Inter return
- master.pike:4590:(9144): mark, call builtin 1 0
- Arg = 29
- master.pike:4590: aggregate_mapping()
- master.pike:4590: aggregate_mapping() returns: ([ ])
- master.pike:4590:(9146): push 1 2 0
- master.pike:4590:(9147): dumb return 3 0
Inter return
-- ref: inoff=0 idoff=109 flags=1
-- context: prog->id=65593 inlev=0 idlev=0 pi=-1 po=-18 so=0 name=NULL
**Block: 0x8466988 Type: program Refs: 13
**Program id: 65593, flags: f, parent id: -1
**Location: /home/aldem/src/pike/cvs/Pike-v7.7-snapshot/build/linux-2.6.8-24.3-smp-i686/master.pike:0
**Identifiers:
** const: name: bt_max_string_len value: 200
** const: name: out_of_date_warning value: 1
** var: name: want_warnings rtt: mixed off: 0
** var: name: compat_major rtt: mixed off: 16
** var: name: compat_minor rtt: mixed off: 32
** var: name: show_if_constant_errors rtt: mixed off: 48
** const: pri name: Builtin value: Segmentation fault (core dumped)
---snip---
And with -t3:
---snip---
- master.pike:2044: indices(_static_modules)
- master.pike:2044: indices() returns: ({ /* 5 elements */ "sprintf", "files", "_math", "Builtin", "_system" })
- master.pike:2044:(346): & local 15 1
- master.pike:2044:(348): push 0 17 1
- master.pike:2044:(349): branch 18 1
- master.pike:2044:(469): foreach 18 1
- master.pike:2045:(354): local[local] 18 1
- master.pike:2045:(357): assign local 19 1
- master.pike:2046:(359): ->x 19 1
- master.pike:2048:(361): branch if not zero 19 1
- master.pike:2047:(366): mark & local 18 1
- master.pike:2047:(368): call function 19 2
Segmentation fault (core dumped)
---snip---
So... What next? :)
Regards, /Al
Well, the crash is definitely in the master, or at least when the master is trying to call something else.
I would say that it's actually rtldebug that is buggy. Why, I have no clue. But it looks like something reads a null pointer. Is pc NULL perhaps? Add an fprintf(stderr,"pc=%p\n",pc) at the top of eval_instruction, maybe? (interpreter.h)
Now why that happens...
On Sat, Nov 13, 2004 at 11:35:41AM +0100, Mirar @ Pike developers forum wrote:
> clue. But it looks like something reads a null pointer. Is pc NULL perhaps? Add an fprintf(stderr,"pc=%p\n") at the top of
It gives:
pc = 0x8501baf
pc = 0x8501bb0
pc = 0x8501bb2
make[5]: *** [override] Segmentation fault (core dumped)
And the lines in master.pike where it happens:
if (!val->_module_value) val = val(); // <= Here
Some observations show that val at this point refers to the builtin sprintf module... But...
Regards, /Al
Okay, so it's not that... Can you print it just before the actual crash? See if you can trace when pc gets an illegal value.
> Program received signal SIGSEGV, Segmentation fault.
> eval_instruction_without_debug (pc=Cannot access memory at address 0x8
> ) at interpreter.h:119
> 119 instr = pc[0];
If it really is at that line...
On Sat, Nov 13, 2004 at 02:10:01PM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
> What happens if you try without machine code?
It is configured with --without-machine-code, and it works when there is no --with-rtldebug... I tried to check the changes made, but there are too many (6M diff)...
Regards, /Al
And I guess you don't use computed goto either. Since you don't get a backtrace the problem is probably that the (C) stack has been overwritten by the preceding instruction executed by eval_instruction.
I'd try to get a gdb breakpoint as shortly before it happens as possible (maybe the pike function _gdb_breakpoint can be useful) and then set a watchpoint on a suitable range of the stack.
It could also provide a clue to dump the stack raw and see if there's a telltale string or something there.
On Sat, Nov 13, 2004 at 06:50:01PM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
> I'd try to get a gdb breakpoint as shortly before it happens as
I know the place (more or less), but it will take some time, to single step and check everything... Not sure this will help, but anyway... there is some bug... I wonder if this happens on other configurations with rtldebug enabled...
Regards, /Al
If you already have a good breakpoint then it's just a matter of setting a watchpoint on the stack, e.g. on the return address. Do you really need to single step that much?
I run with rtldebug all the time. Haven't seen this.
On Sat, Nov 13, 2004 at 07:15:00PM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
> I run with rtldebug all the time. Haven't seen this.
Perhaps this is because you run it with normal ints... :)
Well, I got it... In the f_allocate() builtin there is a call to get_all_args(), but the int argument was declared as INT32 while the format is "%+". I use --with-long-long-ints (just for testing), so the value stored is wider than the variable... :)
This kind of error is extremely difficult to spot... No memory checker known to me checks accesses to local (stack) variables...
In Pike 7.6 f_allocate() doesn't use get_all_args(), that's why it is OK there, however, there could be another problem (since size is still declared as INT32).
struct array also uses INT32 for its size member, so I am not sure how to proceed. I'll just commit a fix for f_allocate(), but if one day someone decides to play around with huge values which don't fit in INT32, they will have a problem...
What I don't understand, though, is why this problem doesn't exist when I don't use --with-rtldebug...
Regards, /Al
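The size mismatch described above can be modelled safely in plain C. This is only a sketch under stated assumptions: struct frame_model, SENTINEL and store_wide_into_narrow() are hypothetical names, not Pike code, and the struct merely stands in for "a 32-bit variable plus whatever lives next to it". Which real stack slot gets hit depends on how the compiler lays out that particular build.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a piece of stack frame: a 32-bit variable
 * followed by adjacent data (here a recognisable sentinel). */
struct frame_model {
    int32_t  size;       /* the variable declared as a 32-bit int */
    uint32_t neighbour;  /* whatever happens to sit next to it */
};

#define SENTINEL 0xDEADBEEFu

/* Model of a parser that believes the destination is 64 bits wide:
 * it stores sizeof(int64_t) bytes starting at the 32-bit slot.
 * Returns the neighbour's value after the store. */
static uint32_t store_wide_into_narrow(int64_t value)
{
    struct frame_model f;

    f.size = 0;
    f.neighbour = SENTINEL;

    /* Keep the copy inside the struct so the model itself stays
     * well-defined; the real bug had no such protection. */
    assert(sizeof f == sizeof value);
    memcpy(&f, &value, sizeof value);

    return f.neighbour;
}
```

Calling store_wide_into_narrow(5) returns something other than SENTINEL: the neighbouring slot has been overwritten by half of the 64-bit value. On a little-endian machine that half is zero for small positive numbers, which would be consistent with the near-NULL addresses gdb complained about, though that link is a guess.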
Good spotting.
The correct way is of course to get an INT_TYPE and then throw an error if it doesn't fit in an INT32.
If there are more spots than this one left (I doubt it, I've run with long long ints for years??!) you might want to extend get_all_args with a 32-bit int target.
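The "fetch wide, then check the range" approach suggested above could look roughly like this. It is a minimal sketch, not the fix that was committed: narrow_to_int32() is a hypothetical helper, int64_t stands in for INT_TYPE under --with-long-long-ints, and a real builtin would raise a Pike error instead of returning 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow a 64-bit value to 32 bits only if it fits.
 * Returns 1 and stores the result in *out on success;
 * returns 0 and leaves *out untouched if the value is out of range. */
static int narrow_to_int32(int64_t wide, int32_t *out)
{
    assert(out != NULL);

    if (wide < INT32_MIN || wide > INT32_MAX)
        return 0;

    *out = (int32_t)wide;
    return 1;
}
```

The point of the design is that the argument parser always writes into a variable of the width it expects, and the narrowing happens afterwards in one visible place.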
Nice work. I've added the missing range check. When memory has grown to the sizes where arrays with more than 2 billion elements actually become feasible, we'll change the type for array.size.
It'd be really nice if it was possible to specify type checks for custom format strings, i.e. like the gcc __attribute__ ((format (printf, 1, 2))) feature but with a custom format spec instead of printf. This would also be useful for the new string_builder_sprintf and e.g. Pike_error that uses it. (There are currently many false alarms from gcc since Pike_error still is declared to take a printf style format string.)
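For reference, this is roughly how the existing gcc attribute is applied to a printf-style function. The sketch only shows the stock printf check; it does not provide the custom format spec wished for above. format_error() and CHECK_PRINTF are hypothetical names, not part of Pike.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Expands to the gcc/clang format attribute where available and to
 * nothing elsewhere. */
#if defined(__GNUC__)
#define CHECK_PRINTF(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define CHECK_PRINTF(fmt_idx, first_arg)
#endif

/* Argument 3 is the format string, the variadic arguments start at
 * position 4. With -Wformat the compiler then checks each call site. */
static int format_error(char *buf, size_t len, const char *fmt, ...)
    CHECK_PRINTF(3, 4);

/* Hypothetical error formatter: writes the message into buf and
 * returns what vsnprintf() returns. */
static int format_error(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && len > 0);

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);

    return n;
}
```

With this declaration, a call such as format_error(buf, sizeof buf, "size %d too large", "seven") draws a warning under -Wformat, because the argument does not match %d. A format string using conversions that printf does not know is exactly where the false alarms mentioned above come from.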
Is it possible to add some kind of PIKE_DEBUG-enabled code in get_all_args that detects these kinds of problems?
We've talked before about a noop preprocessor that applies additional compilation checks. Additional sprintf checks land in the same area.
> Is it possible to add some kind of PIKE_DEBUG-enabled code in get_all_args that detects these kinds of problems?
Not really, no.
> We've been talking about a noop preprocessor that applies additional compilation checks before. Additional sprintf checks lands in the same area.
Historically you would use a program like "lint" to check stuff like this, without actually compiling anything. I think making something like that using Parser.C would be the way to go.
Not that I can think of. The best course is probably to contribute a system to specify custom format strings to gcc.
pike-devel@lists.lysator.liu.se