7.8 alpha 2


Mirar ＠ Pike developers forum

14 Aug 2008

4:25 a.m.

That doesn't break hilfe either.

Only that exact order breaks:

| > string s = Stdio.read_bytes("sugar.jpg");
| > object i = Image.JPEG.decode(s);
| >> i;
| (1) Result: Image.Image( 300 x 300 /* 263.7Kb */)
| > Image.JPEG.encode(i);
| *** glibc detected *** /usr/local/bin/pike: double free or corruption (out): 0x0000000000b52860 ***
| Error while mapping shared library sections:
| ¹=ãÈ¬àx¬<5`Ó 3Ú9ÿ: No such file or directory.
| ======= Backtrace: =========
| /lib/libc.so.6[0x2b5197a4f08a]
| /lib/libc.so.6(cfree+0x8c)[0x2b5197a52c1c]
| ...

Skipping the "i;" doesn't lead to a fault. Replacing the "i;" with werror("%O\n",i); stops it from breaking as well.

This bug goes away if you look at it the wrong way. :P


Jonas Walldén ＠ Pike developers forum

14 Aug

4:30 a.m.

"i;" isn't necessary in my setup. I just added it to verify that I loaded my test data correctly.

I can't tell whether your crash is related to the problem I'm investigating. If you suspect duplicate symbols for the JPEG lib you can try to move _Image_TIFF.so away to avoid getting the second copy loaded.

Martin Stjernholm, Roxen IS ＠ Pike developers forum

4:45 a.m.

Valgrind isn't of any help?

Valgrind gives a lot of false alarms in the Image module on 64 bit architectures, though. The problem is that gcc can generate a 64 bit read when rgb_group structs are read, and if that happens near the end of a malloced block then valgrind complains about reading outside addressable memory. I've got some half-baked patches to pad the malloced blocks more when --with-valgrind is used.
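To make the false alarm concrete: rgb_group holds three one-byte colour channels, so a wider load that starts at the last pixel of a block touches bytes past the end of the allocation. Below is a rough sketch of the padding idea only, not the half-baked patches mentioned above. `VALGRIND_PAD`, `padded_size` and `alloc_pixels` are made-up names, and the struct is a stand-in for the real one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the Image module's pixel struct: three one-byte channels. */
typedef struct { unsigned char r, g, b; } rgb_group;

/* Hypothetical slack: enough that an 8-byte load starting at the last
   pixel still ends inside the malloced block. */
#define VALGRIND_PAD (8 - sizeof(rgb_group))

/* Bytes to request for n pixels, with the slack added only on demand. */
static size_t padded_size(size_t n, int with_valgrind)
{
  size_t bytes = n * sizeof(rgb_group);
  return with_valgrind ? bytes + VALGRIND_PAD : bytes;
}

/* Hypothetical allocation wrapper; the caller frees the result. */
static rgb_group *alloc_pixels(size_t n, int with_valgrind)
{
  assert(n > 0);
  return malloc(padded_size(n, with_valgrind));
}
```

The padding only silences reads that spill a few bytes over; it does nothing for writes past the end.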

Mirar ＠ Pike developers forum

4:50 a.m.

I did try valgrind before, but just compiling for valgrind removed the crash - at least I couldn't trigger it anymore.

Martin Stjernholm, Roxen IS ＠ Pike developers forum

5:55 a.m.

How tiresome. Have you tried compiling without valgrind support and running it with valgrind anyway? You'll have to fix ignores for all the false alarms then, though.

Per Hedbor () ＠ Pike (-) developers forum

8:50 a.m.

That seems like the GTK2-module leak I'm observing (_probably_ related to the list/tree widget). Running pike in valgrind or with dmalloc removes it totally...

Mirar ＠ Pike developers forum

11:40 a.m.

Yep. Typically hard to trace stuff. :p

The only suspicious output I get from valgrind seems to be this:

==29136== Invalid write of size 1
==29136==    at 0x7EF175A: (within /usr/lib/libjpeg.so.62.0.0)
==29136==    by 0x7EEEFC9: (within /usr/lib/libjpeg.so.62.0.0)
==29136==    by 0x7EEDEF5: (within /usr/lib/libjpeg.so.62.0.0)
==29136==    by 0x7EEADFE: jpeg_write_scanlines (in /usr/lib/libjpeg.so.62.0.0)
==29136==    by 0x7CE285F: image_jpeg_encode (image_jpeg.c:912)
==29136==    by 0x434907: low_mega_apply (apply_low.h:225)
==29136==    by 0x4375B3: eval_instruction (interpret_functions.h:2066)
==29136==    by 0x440CAA: catching_eval_instruction (interpret.c:2227)
==29136==    by 0x440177: eval_instruction (interpret_functions.h:1287)
==29136==    by 0x440D9F: mega_apply (interpret.c:2197)
==29136==    by 0x4DBF37: call_pike_initializers (object.c:337)
==29136==    by 0x4DEA0B: parent_clone_object (object.c:420)
==29136==  Address 0x6199cf8 is 0 bytes after a block of size 8,192 alloc'd
==29136==    at 0x4C22FAB: malloc (vg_replace_malloc.c:207)
==29136==    by 0x7CDF7AE: my_init_destination (image_jpeg.c:249)
==29136==    by 0x7EEAF8C: jpeg_start_compress (in /usr/lib/libjpeg.so.62.0.0)
==29136==    by 0x7CE25AA: image_jpeg_encode (image_jpeg.c:880)
==29136==    by 0x434907: low_mega_apply (apply_low.h:225)
==29136==    by 0x4375B3: eval_instruction (interpret_functions.h:2066)
==29136==    by 0x440CAA: catching_eval_instruction (interpret.c:2227)
==29136==    by 0x440177: eval_instruction (interpret_functions.h:1287)
==29136==    by 0x440D9F: mega_apply (interpret.c:2197)
==29136==    by 0x4DBF37: call_pike_initializers (object.c:337)
==29136==    by 0x4DEA0B: parent_clone_object (object.c:420)
==29136==    by 0x43546A: low_mega_apply (apply_low.h:238)

a few of those, then:

==29136== Invalid write of size 1
==29136==    at 0x7EF1588: (within /usr/lib/libjpeg.so.62.0.0)
==29136==    by 0x7EEEFC9: (within /usr/lib/libjpeg.so.62.0.0)
==29136==    by 0x7EEDEF5: (within /usr/lib/libjpeg.so.62.0.0)
==29136==    by 0x7EEADFE: jpeg_write_scanlines (in /usr/lib/libjpeg.so.62.0.0)
==29136==    by 0x7CE285F: image_jpeg_encode (image_jpeg.c:912)
==29136==    by 0x434907: low_mega_apply (apply_low.h:225)
==29136==    by 0x4375B3: eval_instruction (interpret_functions.h:2066)
==29136==    by 0x440CAA: catching_eval_instruction (interpret.c:2227)
==29136==    by 0x440177: eval_instruction (interpret_functions.h:1287)
==29136==    by 0x440D9F: mega_apply (interpret.c:2197)
==29136==    by 0x4DBF37: call_pike_initializers (object.c:337)
==29136==    by 0x4DEA0B: parent_clone_object (object.c:420)
==29136==  Address 0x619a30a is 674 bytes inside a block of size 12,960 free'd
==29136==    at 0x4C22B2E: free (vg_replace_malloc.c:323)
==29136==    by 0x4D41A2: really_free_mapping (mapping.c:277)
==29136==    by 0x476CE9: free_decode_data (encode.c:4795)
==29136==    by 0x482BCF: f_decode_value (encode.c:4975)
==29136==    by 0x436E6F: eval_instruction (interpret_functions.h:2301)
==29136==    by 0x440CAA: catching_eval_instruction (interpret.c:2227)
==29136==    by 0x440177: eval_instruction (interpret_functions.h:1287)
==29136==    by 0x440D9F: mega_apply (interpret.c:2197)
==29136==    by 0x4E1664: object_index_no_free (object.c:1373)
==29136==    by 0x4384F2: eval_instruction (interpret_functions.h:1803)
==29136==    by 0x440D9F: mega_apply (interpret.c:2197)
==29136==    by 0x4E1664: object_index_no_free (object.c:1373)
--29136-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--29136-- si_code=80; Faulting address: 0x0; sp: 0x403469E40

Any ideas? Is Pike messing up the alloc table? :p

Jonas Walldén ＠ Pike developers forum

noon

Hmm, the jpeg_write_scanlines() call is made inside a THREADS_ALLOW loop. Are you running many JPEG operations in parallel? Maybe the internals of that library aren't thread-safe.

Mirar ＠ Pike developers forum

12:05 p.m.

As far as I can tell from my hilfe input, I'm only running one. ;)

The internal state of the library is supplied by the programs using it, as far as I can tell. You feed it a superstructure (struct jpeg_compress_struct).

Increasing the default buffer size (it's supposed to ask for more if needed) at least stopped it from triggering the bug. :p

-#define DEFAULT_BUF_SIZE 8192
+#define DEFAULT_BUF_SIZE 81920

Probably not the correct solution though.
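For reference, the "ask for more if needed" behaviour can be sketched roughly as below. This is a standalone illustration modelled loosely on a libjpeg-style destination buffer, not the code in image_jpeg.c. The field names next_output_byte and free_in_buffer mirror the library's destination manager, while `struct out_buf` and the `out_buf_*` functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEFAULT_BUF_SIZE 8192

/* Hypothetical stand-in for a destination manager's state. */
struct out_buf {
  unsigned char *base;             /* start of the allocation */
  size_t size;                     /* total bytes allocated */
  unsigned char *next_output_byte; /* where the writer stores the next byte */
  size_t free_in_buffer;           /* bytes left before more space is needed */
};

static int out_buf_init(struct out_buf *b)
{
  b->base = malloc(DEFAULT_BUF_SIZE);
  if (!b->base) return 0;
  b->size = DEFAULT_BUF_SIZE;
  b->next_output_byte = b->base;
  b->free_in_buffer = b->size;
  return 1;
}

/* Called when the buffer is completely full: add one more chunk and
   point the writer at the start of the new space. */
static int out_buf_grow(struct out_buf *b)
{
  size_t used = b->size;           /* the whole old buffer counts as full */
  unsigned char *p = realloc(b->base, used + DEFAULT_BUF_SIZE);
  if (!p) return 0;
  b->base = p;
  b->size = used + DEFAULT_BUF_SIZE;
  b->next_output_byte = p + used;
  b->free_in_buffer = DEFAULT_BUF_SIZE;
  return 1;
}

/* Writer side: store a byte only when free_in_buffer says there is room. */
static int out_buf_put(struct out_buf *b, unsigned char c)
{
  if (b->free_in_buffer == 0 && !out_buf_grow(b)) return 0;
  assert(b->next_output_byte < b->base + b->size);
  *b->next_output_byte++ = c;
  b->free_in_buffer--;
  return 1;
}
```

The scheme only works if both sides agree on what free_in_buffer holds. If the writer sees a larger value than the allocator stored, it keeps writing past the end instead of asking for more space.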

Jonas Walldén ＠ Pike developers forum

2 p.m.

Yeah, a threading bug sounds less likely.

Anyway, the first "is 0 bytes after a block" sounds like what Mast described earlier and pretty harmless compared to the last one. Google tells me one can run gdb and valgrind together; try stopping at the last error, focus on the frame for image_jpeg_encode and see what pointers get passed to jpeg_write_scanlines. It would be interesting to see if they are reasonable blocks that can be traced to a struct *image or if the Pike internals are messed up.

If they are ok I'd suspect the library itself (especially if you say that most images can't trigger the bug), and the next step would perhaps be to compile your own with debug symbols and optimizations off.

Martin Stjernholm, Roxen IS ＠ Pike developers forum

2:15 p.m.

No, it's not the case I described. That was only reading a bit past the end. In this case it writes, and that is worse.

Jonas Walldén ＠ Pike developers forum

2:40 p.m.

I just realized that too, but for another reason. In your case it was about a 64-bit access that spilled over a logical boundary (but presumably within the rounded-up size used by malloc) but here it's a single byte being written past a power-of-2 boundary.

Another important clue is that solving the first problem also fixed the fatal crash later. Apparently those tiny writes 1 byte off are enough to corrupt the malloc structures.

Since the #define and the corresponding malloc() are in the Pike source we might be able to call malloc(DEFAULT_BUF_SIZE + 17) (and similar for realloc), but who knows if for other input libjpeg will access even greater offsets?

Jonas Walldén ＠ Pike developers forum

22 Aug

6:20 a.m.

I've now tracked down and fixed the bug. Ironically it's Mirar that caused it (granted a long time ago) by "#define unsigned int size_t" which isn't valid on a 64-bit machine. I believe it could have caused overwriting of as much memory as the resulting JPEG image occupied outside of the initial buffer size.
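One plausible way a 32-bit size_t does damage on a 64-bit machine, offered here as an assumption since the post does not spell out the mechanism: the module and the library disagree on the width of a size field in a shared struct, so a store through the narrow type leaves half of the field untouched. The sketch below simulates that with fixed-width types. `struct lib_view` and `store_narrow_read_wide` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout the library is assumed to be built with: a 64-bit size field. */
struct lib_view { uint64_t free_in_buffer; };

/* Simulate a caller that stores the size through a 32-bit "size_t":
   only four of the eight bytes are written, the others keep whatever
   was there before. Returns the value as the library would read it. */
static uint64_t store_narrow_read_wide(uint32_t value, unsigned char stale)
{
  struct lib_view v;
  assert(sizeof value < sizeof v.free_in_buffer);
  memset(&v, stale, sizeof v);                     /* stale bytes in the slot */
  memcpy(&v.free_in_buffer, &value, sizeof value); /* 4-byte store */
  return v.free_in_buffer;                         /* 8-byte load */
}
```

With stale bytes in the untouched half, a stored 8192 reads back as a much larger number, which would let the library keep writing well beyond the buffer.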

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

6:25 a.m.


[...] "#define unsigned int size_t" which isn't valid on a 64-bit machine. [...]

It's "valid" on ILP64, but not on LP64.
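To spell out the distinction: under ILP64, int, long and pointers are all 64 bits wide, so unsigned int can hold any size_t value; under LP64 int stays at 32 bits while long and pointers are 64. A small sketch of a check for this follows, with `uint_covers_size_t`, `narrow` and `survives_narrowing` as made-up helper names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* 1 if unsigned int can represent every size_t value (e.g. ILP32, ILP64),
   0 otherwise (e.g. LP64). */
static int uint_covers_size_t(void)
{
  return UINT_MAX >= SIZE_MAX;
}

/* The conversion that happens when a size is stored as unsigned int. */
static unsigned int narrow(size_t n)
{
  return (unsigned int)n;
}

/* 1 if n comes back unchanged after a round trip through unsigned int. */
static int survives_narrowing(size_t n)
{
  assert(narrow(0) == 0);
  return (size_t)narrow(n) == n;
}
```

Small sizes such as 8192 survive on every model, which may be why a define like this can go unnoticed for a long time.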

Mirar ＠ Pike developers forum

6:25 a.m.

Great. Did I do that before the 64 bit pointer era, at least? :)

Peter Bortas ＠ Pike developers forum

6:40 a.m.

Nice.

Jonas Walldén ＠ Pike developers forum

6:50 a.m.

But then I started my Roxen 5.0 to verify and get this nice present:

Post-padding overwritten for block at 0x108e8fa20 (size 801)!
**Block: 0x108e8fa20 Type: string Refs: 1
**size_shift: 0, len: 768, hash: 25c37b5a76df6cc5
** "

$$#'''((("...
Stack at allocation:
| 0 pike 0x0000000000153997 debug_malloc + 119
| 0 pike 0x0000000000153fad debug_xalloc + 29
| 0 pike 0x00000000001c8879 debug_begin_shared_string + 89
| 0 _Image_GIF.so 0x00000000067e67ba image_gif_header_block + 730
| 0 _Image_GIF.so 0x00000000067e9a12 _image_gif_encode + 2546
| 0 pike 0x000000000001d9dd low_mega_apply + 5053
| 0 pike 0x0000000000041bbc low_mega_apply + 152988
| 0 pike 0x00000000000406cc low_mega_apply + 147628
| 0 pike 0x000000000004d135 low_mega_apply + 199445
| 0 pike 0x000000000004f295 mega_apply + 501
| 0 pike 0x00000000001d9960 new_thread_func + 944
| 0 libSystem.B.dylib 0x0000000081400913 _pthread_start + 316
| 0 libSystem.B.dylib 0x00000000814007d5 thread_start + 13
Locations that handled 0x108e8fa20: (gc generation: 2/2 gc pass: 0/0)
*** /home/jonasw/pike/7.8/src/stralloc.c:628 xalloc (1 times) !*!
*** /home/jonasw/pike/7.8/src/pike_memory.c:287 malloc (1 times) !*!
*** /home/jonasw/pike/7.8/src/modules/_Image_GIF/image_gif.c:282 (1 times) !*!
*******************
: Start script terminating.
: Shutting down MySQL..
: Start script terminated.

Not the best way to end the week... :-(

Mirar ＠ Pike developers forum

7:30 a.m.

Speaking of which,

[http://pike.ida.liu.se/generated/pikefarm/7.8/46_46/verifylog.txt]
| Doing tests in testsuite (11196 tests)
|
| test: failed to load "/home/[...]/pike/7.8.20/lib/modules/GSSAPI.so": load_module("/home/[...]/pike/7.8.20/lib/modules/GSSAPI.so") failed: libgssapi_krb5.so.2: failed to map segment from shared object: Cannot allocate memory
|
|
| test: failed to load "/home/[...]/pike/7.8.20/lib/modules/_Image_JPEG.so": load_module("/home/[...]/pike/7.8.20/lib/modules/_Image_JPEG.so") failed: /home/[...]/pike/7.8.20/lib/modules/_Image_JPEG.so: failed to map segment from shared object: Cannot allocate memory
|
| ...
|
| test: failed to load "/home/[...]/pike/7.8.20/lib/modules/___GTK2.so": load_module("/home/[...]/pike/7.8.20/lib/modules/___GTK2.so") failed: /home/[...]/pike/7.8.20/lib/modules/___GTK2.so: failed to map segment from shared object: Cannot allocate memory
|
| Fatal: out of memory.

Doesn't seem very good. I didn't see this when I ran make verify manually. How do I debug it?

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

7:40 a.m.

xenofarm/client.sh sets a couple of limits. Try running verify with the same limits (data segment and virtual memory size should be the relevant ones).
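One way to reproduce such limits for a manual run is to set them in a small launcher before starting the tests. The sketch below uses the POSIX getrlimit/setrlimit calls and is not taken from client.sh; `mb`, `set_soft_limit` and `apply_test_limits` are hypothetical names, and the sizes are parameters because the actual values have to be read from the script.

```c
#include <assert.h>
#include <sys/resource.h>

/* Convert megabytes to bytes as an rlim_t. */
static rlim_t mb(unsigned long n)
{
  return (rlim_t)n * 1024 * 1024;
}

/* Set the soft limit for one resource, clamped to the hard limit.
   Returns 0 on success, -1 on failure (errno is set by the system call). */
static int set_soft_limit(int resource, rlim_t bytes)
{
  struct rlimit rl;
  if (getrlimit(resource, &rl) != 0) return -1;
  if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max) bytes = rl.rlim_max;
  rl.rlim_cur = bytes;
  return setrlimit(resource, &rl);
}

/* Hypothetical helper: apply data segment and virtual memory limits.
   Child processes started afterwards inherit them. */
static int apply_test_limits(unsigned long data_mb, unsigned long vm_mb)
{
  assert(data_mb > 0 && vm_mb > 0);
  if (set_soft_limit(RLIMIT_DATA, mb(data_mb)) != 0) return -1;
  return set_soft_limit(RLIMIT_AS, mb(vm_mb));
}
```

A launcher would call apply_test_limits() and then exec the verify run, so that module loading hits the same ceiling as on the build farm.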

Mirar ＠ Pike developers forum

7:40 a.m.

Hm. Is it safe to assume the test pike will not grow over 210MB?

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

7:45 a.m.

I have set 2100MB for the virtual mem limit.

Jonas Walldén ＠ Pike developers forum

8:40 a.m.

Ok, found and fixed that one as well. But of course fate deals me yet another one:

#0 debug_fatal (fmt=0x10039ae30 "really_free_memloc got invalid pointer %p\n") at /home/jonasw/pike/7.8/src/error.c:632
#1 0x000000010014df83 in really_free_memloc (d=0x114727ac0) at /home/jonasw/pike/7.8/src/pike_memory.c:1358
#2 0x000000010014fa88 in really_free_memhdr (d=0x107d4a940) at /home/jonasw/pike/7.8/src/pike_memory.c:1608
#3 0x000000010014fdc7 in remove_memhdr (ptr=<value temporarily unavailable, due to optimizations>) at /home/jonasw/pike/7.8/src/pike_memory.c:1608
#4 0x0000000100151eac in dmalloc_unregister (p=0x107d4a940, already_gone=-1626928576) at /home/jonasw/pike/7.8/src/pike_memory.c:2089
#5 0x0000000100017f04 in alloc_catch_context () at /home/jonasw/pike/7.8/src/interpret.c:1082
#6 0x0000000100046485 in eval_instruction_without_debug (pc=0x1005a7800 "") at interpret_functions.h:1287
[...]

Anyone else using dmalloc and stressing the 7.8 code?

Martin Stjernholm, Roxen IS ＠ Pike developers forum

23 Aug

8:55 p.m.

Yes, I do that fairly regularly. Not in the last week, though.

Jonas Walldén ＠ Pike developers forum

29 Aug

4:40 a.m.

Time for a follow-up. This bug seems most likely caused by dmalloc itself not being threadsafe in its handling of internal structures. A hacked version that Grubba and I put together got rid of the problem temporarily but I leave it to the dmalloc experts to develop a long-term solution.

pike-devel@lists.lysator.liu.se
