I was trying the nettle-benchmark program and found that it hangs at startup burning 100% CPU.
Debugging shows this is when measuring benchmark overhead. With a quick printf of the "ncalls" variable in time_function(), I can see that it overflows:
    time_function ncalls=100 elapsed=0.000010
    time_function ncalls=1000 elapsed=0.000003
    time_function ncalls=10000 elapsed=0.000002
    time_function ncalls=100000 elapsed=0.000002
    time_function ncalls=1000000 elapsed=0.000002
    time_function ncalls=10000000 elapsed=0.000002
    time_function ncalls=100000000 elapsed=0.000002
    time_function ncalls=1000000000 elapsed=0.000002
    time_function ncalls=1410065408 elapsed=0.000002
    time_function ncalls=1215752192 elapsed=0.000002
    time_function ncalls=-727379968 elapsed=0.000002
    time_function ncalls=1316134912 elapsed=0.000002
    time_function ncalls=276447232 elapsed=0.000002
    time_function ncalls=-1530494976 elapsed=0.000002
    time_function ncalls=1874919424 elapsed=0.000002
    time_function ncalls=1569325056 elapsed=0.000002
    time_function ncalls=-1486618624 elapsed=0.000002
    time_function ncalls=-1981284352 elapsed=0.000002
    time_function ncalls=1661992960 elapsed=0.000002
    time_function ncalls=-559939584 elapsed=0.000002
    time_function ncalls=-1304428544 elapsed=0.000002
    time_function ncalls=-159383552 elapsed=0.000002
    time_function ncalls=-1593835520 elapsed=0.000002
    time_function ncalls=1241513984 elapsed=0.000002
    time_function ncalls=-469762048 elapsed=0.000002
    time_function ncalls=-402653184 elapsed=0.000002
    time_function ncalls=268435456 elapsed=0.000002
    time_function ncalls=-1610612736 elapsed=0.000002
    time_function ncalls=1073741824 elapsed=0.000002
    time_function ncalls=-2147483648 elapsed=0.000002
    time_function ncalls=0 elapsed=0.000002
    time_function ncalls=0 elapsed=0.000002
    time_function ncalls=0 elapsed=0.000002
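The values in this trace are consistent with a 32-bit counter that is multiplied by 10 on every round and wraps around. A minimal sketch in C that reproduces the sequence, assuming that growth rule (the helper names wrapping_times_ten and steps_until_zero are hypothetical and not part of nettle-benchmark):

```c
#include <stdint.h>

/* Multiply by 10 with 32-bit wraparound. The multiplication is done in
   unsigned arithmetic, where wrapping is well defined, and the result is
   then mapped back to the signed two's-complement value. */
int32_t
wrapping_times_ten(int32_t n)
{
  uint32_t u = (uint32_t) n * 10u;

  if (u <= (uint32_t) INT32_MAX)
    return (int32_t) u;
  return (int32_t) ((int64_t) u - 4294967296LL);
}

/* Count how many multiplications by 10 it takes for the counter to
   reach 0. The cap of 64 only guards against a start value that never
   reaches 0 within a reasonable number of steps. */
int
steps_until_zero(int32_t start)
{
  int32_t n = start;
  int steps = 0;

  while (n != 0 && steps < 64)
    {
      n = wrapping_times_ten(n);
      steps++;
    }
  return steps;
}
```

Starting from 100, every multiplication by 10 adds one factor of 2, so after 30 rounds the value is a multiple of 2^32 and the counter stays at 0 from then on, which matches the point where the trace gets stuck.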
The elapsed time is the same regardless of ncalls, so I'm thinking that the compiler has been clever and optimized bench_nothing() into literally nothing. If I modify it to
    static void
    bench_nothing(void *arg UNUSED)
    {
      static int i = 0;
      i++;
      return;
    }
then things work, but of course we're not benchmarking "nothing" anymore.
This is on Fedora 32 with gcc-10.1.1-1.fc32.x86_64
Regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes:
> The elapsed time is the same regardless of ncalls, so I'm thinking that the compiler has been clever and optimized bench_nothing() into literally nothing. If I modify it to
>
>     static void
>     bench_nothing(void *arg UNUSED)
>     {
>       static int i = 0;
>       i++;
>       return;
>     }
>
> then things work, but of course we're not benchmarking "nothing" anymore.
Maybe simplest to just delete this part of the benchmark? I don't think it's that useful.
Regards, /Niels
On Fri, May 29, 2020 at 02:59:26PM +0200, Niels Möller wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
>> The elapsed time is the same regardless of ncalls, so I'm thinking that the compiler has been clever and optimized bench_nothing() into literally nothing. If I modify it to
>>
>>     static void
>>     bench_nothing(void *arg UNUSED)
>>     {
>>       static int i = 0;
>>       i++;
>>       return;
>>     }
>>
>> then things work, but of course we're not benchmarking "nothing" anymore.
>
> Maybe simplest to just delete this part of the benchmark? I don't think it's that useful.
After more debugging I found this is due to GCC 10 enabling the -finline-functions option at -O2.
Using -fno-inline-functions fixes the problem.
Alternatively, adding __attribute__((noinline)) to "time_function" fixes it. NB: noinline on "bench_nothing" does NOT fix it.
Alternatively, adding assert(ncalls != 0); in the loop in time_function fixes it, because it happens to stop GCC from inlining it. That's largely luck, but it probably makes sense to add that assert() regardless, as this loop is inherently susceptible to this wraparound problem as written.
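To make the two workarounds concrete, here is a rough sketch of a timing loop that carries both the noinline attribute and the assert. It is not the actual nettle-benchmark code: the name time_function_sketch, the BENCH_INTERVAL value, the use of clock(), the grow-by-10 rule and the count_call workload are all assumptions made for illustration.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical threshold: grow ncalls until one batch takes this long. */
#define BENCH_INTERVAL 0.01

/* noinline (a GCC/Clang extension) asks the compiler not to inline this
   function into its callers. */
__attribute__((noinline))
double
time_function_sketch(void (*f)(void *arg), void *arg)
{
  int ncalls = 10;

  for (;;)
    {
      clock_t start = clock();
      double elapsed;
      int i;

      /* Aborts instead of spinning forever if the counter has become 0. */
      assert(ncalls != 0);

      for (i = 0; i < ncalls; i++)
        f(arg);

      elapsed = (double) (clock() - start) / CLOCKS_PER_SEC;
      if (elapsed > BENCH_INTERVAL)
        return elapsed / ncalls;

      /* Unchecked growth: signed overflow here is undefined behavior in C,
         which is exactly the weakness the assert is meant to expose. */
      ncalls *= 10;
    }
}

/* Example workload with a visible side effect (a volatile increment),
   so the calls cannot be optimized away. */
void
count_call(void *arg)
{
  volatile unsigned long *counter = arg;
  (*counter)++;
}
```

Calling time_function_sketch(count_call, &counter) with an unsigned long counter returns the measured time per call. The assert only turns the hang into an abort; it does not by itself make the measurement of an empty function meaningful.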
Regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes:
> Alternatively adding __attribute__((noinline)) to "time_function" fixes it - nb noinline on "bench_nothing" does NOT fix it.
I've now deleted the problematic code.
I would guess the reason the compiler thinks it can optimize away the entire loop in time_function is that after inlining, it finds that the loop body, f(arg), has no side effects. It will hopefully not do the same with any other functions in the benchmark.
> Alternatively adding assert(ncalls != 0); in the loop in time_function fixes it, because it happens to stop GCC from inlining it. That's largely luck, but it probably makes sense to have that assert() added regardless as this loop is inherently susceptible to this wraparound problem as written.
It would be more robust to check if ncalls is about to overflow, and in that case let time_function return 0.0. I'll consider that if any similar problems reappear.
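A rough sketch of what such a check could look like, assuming a counter of type int that grows by a factor of 10. The names grow_ncalls and time_function_checked, the BENCH_INTERVAL value and the use of clock() are hypothetical and not taken from the nettle sources:

```c
#include <limits.h>
#include <time.h>

/* Hypothetical threshold: grow ncalls until one batch takes this long. */
#define BENCH_INTERVAL 0.01

/* Store ncalls * factor in *next and return 1 if the product fits in an
   int; return 0 and leave *next untouched if it would overflow. */
int
grow_ncalls(int ncalls, int factor, int *next)
{
  if (ncalls <= 0 || factor <= 0 || ncalls > INT_MAX / factor)
    return 0;
  *next = ncalls * factor;
  return 1;
}

double
time_function_checked(void (*f)(void *arg), void *arg)
{
  int ncalls = 10;

  for (;;)
    {
      clock_t start = clock();
      double elapsed;
      int i;

      for (i = 0; i < ncalls; i++)
        f(arg);

      elapsed = (double) (clock() - start) / CLOCKS_PER_SEC;
      if (elapsed > BENCH_INTERVAL)
        return elapsed / ncalls;

      /* Still too fast to measure and no room left to grow: report 0.0
         instead of letting the counter wrap around. */
      if (!grow_ncalls(ncalls, 10, &ncalls))
        return 0.0;
    }
}
```

Checking against INT_MAX / factor before multiplying avoids ever performing the overflowing multiplication, so the loop terminates even when the compiler has reduced the timed call to nothing.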
Regards, /Niels
nettle-bugs@lists.lysator.liu.se