    int main()
    {
        GTK2.setup_gtk();
        object btn = GTK2.Button("Raise an exception");
        object win = GTK2.Window(0)->add(btn)->show_all();
        win->signal_connect("destroy", lambda() {exit(0);});
        btn->signal_connect("clicked", lambda() {error("Baboom!\n");});
        return -1;
    }
Run this, click the button, then close the window. Pike will segfault.
Running a --with-debug Pike under gdb shows that the actual segfault occurs in backend_callback (defined in GTK2/source/global.pre), where it calls into the low end main loop for an iteration. Not immediately helpful.
I suspect that something's getting corrupted - maybe in the Pike stack, maybe something gets freed in GTK that should have another reference - when the exception happens. How can I go about tracking this down? Where would I put probes to watch for unexpected changes?
ChrisA
Guessing that since win and btn go out of scope at the end of main(), maybe they get cleaned up? I usually set things that need to persist into the backend past main as global variables, or at least I have in the past.
On Wednesday, March 2, 2016 3:49 PM, Chris Angelico rosuav@gmail.com wrote:
Run this, click the button, then close the window. Pike will segfault.
You could try to move those to global, see if it still happens. If not, that's probably where the problem lies.
On Thu, Mar 3, 2016 at 9:06 AM, Lance Dillon riffraff169@yahoo.com wrote:
Guessing that since win and btn go out of scope at the end of main(), maybe they get cleaned up? I usually set things that need to persist into the backend past main as global variables, or at least I have in the past.
That part shouldn't be a problem. In the original code that this is cut down from, those objects were indeed persisted elsewhere; and the GTK signal retains references anyway.
Odd discovery: Now that I'm on my laptop, I can't trigger the segfault. That might mean the corruption's still there but just doesn't happen to crash anything, or it might mean there's a real and significant difference. Both systems are using fairly recent builds of Pike 8.1, and updating the laptop to the very latest didn't start the segfaults. Strange. But it means I can't do the quick and obvious verification you mention until I get home.
ChrisA
Back on my main system, and the tiny tweak of hoisting the declarations to global doesn't change the crash. And compiling the latest Pike 8.1 doesn't change anything, either.
ChrisA
Also happens for me with:

    [riffraff@hobbes src]$ pike --version
    Pike v8.0 release 1 Copyright © 1994-2013 Linköping University

Compiling 8.1 right now, but that is weird.
However:
    int main()
    {
        GTK2.setup_gtk();
        object btn = GTK2.Button("Raise an exception");
        object win = GTK2.Window(0)->add(btn)->show_all();
        win->signal_connect("destroy", lambda() {exit(0);}, 0);
        btn->signal_connect("clicked", lambda() {error("Baboom!\n");});
        return -1;
    }

Adding another parameter of 0 to signal_connect() fixes it.

    int signal_connect(string signal, function callback, mixed|void callback_arg, string|void detail, int|void connect_before)
    {
        ...
        get_all_args("signal_connect",args,"%s%*%*.%s%d",&a,&tmp1,&tmp2,&detail,&connect_before);

Matches detail:

        ...
        if (detail) det=g_quark_try_string(detail); else det=0;
        id=g_signal_connect_closure_by_id(G_OBJECT(THIS->obj),b->signal_id,det,gc,!connect_before);
        pgtk2_pop_n_elems(args);

So maybe something with detail not existing, so det gets an invalid quark at 0? Perhaps should change to:

    if (detail) det=g_quark_try_string(detail); else det=g_quark_try_string(0);

Or:

    det=g_quark_try_string(detail?detail:0);

I say 0 because if detail is 0, then g_quark_try_string succeeds. I believe this might be where the error lies. Could try it and see.
BTW, that is in gobject.pre.
Wait!!!! I read it wrong.
Sorry, that was all wrong. That is callback_args, which is a required parameter.
    get_all_args("signal_connect",args,"%s%*%*.%s%d",&a,&tmp1,&tmp2,&detail,&connect_before);
    assign_svalue_no_free(&b->cb,tmp1);
    assign_svalue_no_free(&b->args,tmp2);
So when it tries to free the signal, it tries to free b->args, which is invalid when no callback_arg passed in.
So probably always needs a callback arg, even if only 0. In fact, if empty, should probably just default to 0, just to make it easier on code later.
On Thu, Mar 3, 2016 at 12:50 PM, Lance Dillon riffraff169@yahoo.com wrote:
So probably always needs a callback arg, even if only 0. In fact, if empty, should probably just default to 0, just to make it easier on code later.
That's what already happens, actually - see just above the get_all_args:
if (args==2) { push_int(0); args++; }
If you provide only the signal and the function, it quietly pushes a zero onto the stack and pretends you included that.
ChrisA
Ah well, good, I must have missed that the first time (apparently).
Anyway, I'll put in some debugs (printfs, right) and do some tracing tomorrow and see what I can narrow down, unless someone else manages to find it before then.
Weird, if I don't click the button it works, but if I do click the button, I get the segfault...
I tried to add the extra parameter to both signal_connect's, but that didn't do it. I'll try to troubleshoot more in the morning, I can't really work on it anymore tonight.
On Thu, Mar 3, 2016 at 12:45 PM, Lance Dillon riffraff169@yahoo.com wrote:
Adding another parameter of 0 to signal_connect() fixes it.
Not on my system, so I suspect this is another example of pushing the bug around until it stops actually segfaulting, but not actually curing it. On mine, this one still segfaults.
ChrisA
Try running that test with valgrind; it should tell you more reliably what the issue is. You will either have to compile Pike --with-valgrind or use --smc-check=all so that valgrind is able to deal with the generated machine code. Another useful valgrind option is --track-origins=yes, which will try to tell you e.g. where undefined values came from and give more info about the origin of things on the heap.
arne
Hmm. Even without triggering the exception, valgrind produces a boatload of response data, finishing with:
==15612== ERROR SUMMARY: 775 errors from 287 contexts (suppressed: 0 from 0)
But after triggering the exception, this keeps coming up:
    ==15647== Conditional jump or move depends on uninitialised value(s)
    ==15647==    at 0x7D0B0F9: ???
    ==15647==    by 0x6234048: ???
    ==15647==    by 0x75B5FBF: ???
    ==15647==    by 0x6760FFF: ???
    ==15647==    by 0x65DA11F: ???
    ==15647==    by 0x6233EC7: ???
    ==15647==    by 0x6335B2F: ???
    ==15647==    by 0x7D0B104: ???
    ==15647==    by 0x41D526: eval_instruction (interpret.c:1685)
    ==15647==    by 0x41D526: catching_eval_instruction (interpret.c:2722)
    ==15647==    by 0x41F06F: inter_return_opcode_F_CATCH (interpret.c:1269)
    ==15647==    by 0x7D09227: ???
    ==15647==    by 0xFFF000037: ???
The second number, 0x6234048, keeps increasing - this error *spins*. Very interesting. Will investigate further. Thanks Arne.
ChrisA
Try compiling --without-machine-code and --with-valgrind, this will get rid of most of the false positives.
arne
On Thu, Mar 3, 2016 at 8:53 PM, Arne Goedeke el@laramies.com wrote:
Try compiling --without-machine-code and --with-valgrind, this will get rid of most of the false positives.
Thanks - that gets rid of most of the noise.
This seems to be what's happening:
    $ sudo apt-get install libglib2.0-0-dbg libgtk2.0-0-dbg
    $ make CONFIGUREARGS='--with-valgrind --with-debug --without-machine-code'
    $ bin/pike --valgrind=--track-origins=yes ../signalcrash

    Baboom!
    /home/rosuav/signalcrash.pike:8: /home/rosuav/signalcrash()->__lambda_65649_1_line_8()
    -:1: Pike.Backend(0)->`()(3600.0)
    ==25116== Invalid read of size 8
    ==25116==    at 0xC01A7F0: signal_emit_unlocked_R (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.4600.2)
    ==25116==    by 0xC023D2B: g_signal_emit_valist (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.4600.2)
    ==25116==    by 0xC02405E: g_signal_emit (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.4600.2)
    ==25116==    by 0xC00D473: g_object_dispatch_properties_changed (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.4600.2)
    ==25116==    by 0xC00F900: g_object_notify (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.4600.2)
    ==25116==    by 0xA803371: gtk_widget_send_focus_change (gtkwidget.c:11445)
    ==25116==    by 0xA8035F2: do_focus_change (gtkwindow.c:5304)
    ==25116==    by 0xA80DA48: _gtk_window_set_has_toplevel_focus (gtkwindow.c:8474)
    ==25116==    by 0xA80DC5E: gtk_window_focus_out_event (gtkwindow.c:5336)
    ==25116==    by 0xA6DF9BB: _gtk_marshal_BOOLEAN__BOXED (gtkmarshalers.c:86)
    ==25116==    by 0xC008F44: g_closure_invoke (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.4600.2)
    ==25116==    by 0xC01B53D: signal_emit_unlocked_R (in /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0.4600.2)
    ==25116==  Address 0x59 is not stack'd, malloc'd or (recently) free'd
An address of 0x59 looks like an error of some sort. It seems to be fairly consistently using that value, which corresponds to 89 decimal or 'Y' in ASCII.
Unfortunately, valgrind is only able to find addresses inside GTK, so I guess the next step would be to somehow trace this number 0x59?
ChrisA
Usually very low addresses are due to some struct pointer being NULL, so that accessing a member will result in (almost) null dereference. So maybe pike sets something to NULL when cleaning up and GTK still tries to access that later. I have not looked at anything here, just a wild guess.
arne
Try compiling Pike with -fsanitize=address; there's a fairly good chance that would catch exactly when things start going wrong if we are talking about any kind of over/under-flow.
pike-devel@lists.lysator.liu.se