I’m running builds of 8.0 to make sure we don’t have any major test failures, and I’ve run into a few problems so far. I’ll post each one in a separate email to keep them manageable. Any assistance would be much appreciated. I can supply whatever information is needed, up to and including a login on the systems in question.
First up: socktest.pike hangs on macOS 10.12 and later. 10.11 and earlier do not have this problem, and I haven’t tried running an older binary on a newer OS. The call to gc() in finish() never returns. According to LLDB:
(lldb) thread list
Process 4746 stopped
* thread #1: tid = 0xa7cc2d, 0x00007fff781f922a libsystem_kernel.dylib`mach_msg_trap + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
(lldb) thread backtrace
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff781f922a libsystem_kernel.dylib`mach_msg_trap + 10
    frame #1: 0x00007fff781f976c libsystem_kernel.dylib`mach_msg + 60
    frame #2: 0x00007fff781fd05e libsystem_kernel.dylib`clock_get_time + 85
    frame #3: 0x0000000105974667 pike`mach_clock_get_time at rusage.c:633:7
    frame #4: 0x00000001058e713b pike`do_gc(ignored_UNUSED=<unavailable>, explicit_call=<unavailable>) at gc.c:3507:24
    frame #5: 0x00000001059ee723 pike`f_gc(args=<unavailable>) at builtin_functions.c:5126:11
    frame #6: 0x00000001063607b7
    frame #7: 0x00000001058a365a pike`mega_apply [inlined] eval_instruction(pc=<unavailable>) at interpret.c:1711:5
    frame #8: 0x00000001058a3658 pike`mega_apply(type=<unavailable>, args=<unavailable>, arg1=<unavailable>, arg2=<unavailable>) at interpret.c:2695
    frame #9: 0x000000010589c72d pike`apply_svalue(s=<unavailable>, args=<unavailable>) at interpret.c:3158:5
    frame #10: 0x0000000105a31769 pike`got_fd_event(box=0x0000000105de5008, event=937461904) at file.c:368:5
    frame #11: 0x00000001058c901f pike`backend_call_active_callbacks(fd_list=0x00007ffeea373ae8, me_UNUSED=<unavailable>) at backend.cmod:2349:6
    frame #12: 0x00000001058c4839 pike`pdb_low_backend_once(pdb=0x00007fef37e086d0, timeout=0x00007ffeea373fa8) at backend.cmod:4137:11
    frame #13: 0x00000001058c4aec pike`f_PollDeviceBackend_cq__backtick_28_29(args=1) at backend.cmod:4315:5
    frame #14: 0x00000001058a1cbc pike`low_mega_apply(type=APPLY_SVALUE, args=1, arg1=<unavailable>, arg2=<unavailable>) at apply_low.h:221:2
    frame #15: 0x00000001058a2753 pike`jump_opcode_F_CALL_FUNCTION_AND_POP at interpret_functions.h:2452:1
    frame #16: 0x00000001061d4348
Some other information I discovered looking into this:
I tried adding a sleep() in the child process so that I could examine it with dtrace, but the sleep() never returned. sleep() on its own seems to work fine, e.g. with pike -e 'sleep(5);'. If I disable the fork(), the test runs successfully.
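The following test case (test2.pike) demonstrates the problem. The child prints "child" and then hangs in sleep(), so neither "slept" nor the parent’s "child exited." is ever printed. Replacing sleep() with gc() hangs the same way.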
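int main() {
    object pid;
    // fork() returns a Process object in the parent and 0 in the child.
    if (mixed err = catch { pid = fork(); }) {
        werror("fork() failed\n");
        return 1;
    } else if (pid) {
        // Parent: wait for the child to exit.
        int res = pid->wait();
        werror("child exited.\n");
        return 0;
    }
    // Child: on macOS 10.12+ this sleep() never returns (gc() hangs too).
    werror("child\n");
    sleep(2);
    werror("slept\n");
    return 0;
}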
bin/pike test2.pike
child
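For comparison outside Pike, the same fork/wait pattern can be sketched in plain C. This is only an illustration under stated assumptions: fork_clock_check() is a hypothetical helper, and it reads CLOCK_MONOTONIC through clock_gettime() rather than the Mach clock service that rusage.c calls, so on 10.12+ it checks whether the POSIX clock works in a forked child, not the original hang itself.

```c
/* Plain-C analogue of test2.pike (hypothetical helper, not part of Pike).
 * The parent forks and waits; the child reads CLOCK_MONOTONIC around a
 * short sleep, mirroring the fork-then-sleep pattern that hangs in Pike. */
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime/nanosleep on glibc */
#endif
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Returns 0 if the child read the clock, slept, and exited cleanly. */
int fork_clock_check(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: time a 100 ms nanosleep() with the monotonic clock. */
        struct timespec before, after;
        struct timespec nap = { 0, 100000000L };
        fprintf(stderr, "child\n");
        if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
            _exit(2);
        nanosleep(&nap, NULL);
        if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
            _exit(2);
        fprintf(stderr, "slept\n");
        /* Succeed only if the monotonic clock moved forward. */
        int advanced = after.tv_sec > before.tv_sec ||
                       (after.tv_sec == before.tv_sec &&
                        after.tv_nsec > before.tv_nsec);
        /* _exit() avoids flushing stdio buffers inherited from the parent. */
        _exit(advanced ? 0 : 3);
    }
    /* Parent: wait for the child, as pid->wait() does in the Pike test. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    fprintf(stderr, "child exited.\n");
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it from a trivial main() (for example `return fork_clock_check();`) should print child, slept and child exited. if clock_gettime() behaves normally after fork(); a hang here would point at the OS rather than at Pike.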
I don’t quite understand why clock_get_time() would hang like that unless the Mach clock service has some problem across fork(), though it wouldn’t surprise me if it did. Interestingly, clock_gettime() is available in 10.12 and newer. According to the man page, it is POSIX compliant and provides CLOCK_MONOTONIC, which is what is used on some other systems. The catch is that _POSIX_MONOTONIC_CLOCK is defined as -1, which under POSIX means the option is unsupported, so it seems to contradict the man page. I’m not sure whether it makes sense to try clock_gettime() instead, or to re-initialize the clock service after the fork.
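If clock_gettime() were tried instead, one option is to probe it at run time rather than trusting _POSIX_MONOTONIC_CLOCK. This is a minimal sketch, assuming that CLOCK_MONOTONIC being defined in the headers is a good enough compile-time hint; have_monotonic_clock() and monotonic_ns() are hypothetical names, not existing Pike functions.

```c
/* Hypothetical run-time probe for a monotonic clock (not Pike's actual
 * configure logic). POSIX treats _POSIX_MONOTONIC_CLOCK == -1 as
 * "unsupported", yet macOS 10.12+ defines it as -1 while documenting
 * CLOCK_MONOTONIC, so this asks clock_gettime() directly instead. */
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime on glibc */
#endif
#include <time.h>

/* Returns 1 if clock_gettime(CLOCK_MONOTONIC) succeeds at run time. */
int have_monotonic_clock(void)
{
#ifdef CLOCK_MONOTONIC
    struct timespec ts;
    /* Ignore _POSIX_MONOTONIC_CLOCK and ask the OS directly. */
    return clock_gettime(CLOCK_MONOTONIC, &ts) == 0;
#else
    return 0;  /* the headers do not even name the clock */
#endif
}

/* Monotonic time in nanoseconds, or -1 if no monotonic clock is usable. */
long long monotonic_ns(void)
{
#ifdef CLOCK_MONOTONIC
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
#endif
    return -1;
}
```

On a 10.12+ machine have_monotonic_clock() would be expected to return 1 despite the -1 macro. A build that uses a 10.12+ SDK but targets older macOS releases would presumably need an additional availability check, since clock_gettime() does not exist on 10.11 and earlier.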
Bill