Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
The CPU reports itself as Intel(R) Xeon(R) CPU X5670 @ 2.93GHz (the system has 24 such cpus). The output of nettle-benchmark on that machine follows.
x86-64 assembly: [nikos@koninck examples]$ ./nettle-benchmark -f 1.3e9 memxor
To get the printed cycle numbers to make sense, you have to pass the correct clock frequency to the -f option, which is -f 2.93e9 in your case.
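To illustrate why the frequency matters, here is a minimal sketch of the conversion from throughput to cycles per byte. The helper name cycles_per_byte is hypothetical and this is not code from nettle-benchmark; it only shows the arithmetic, assuming the benchmark measures bytes per second of wall-clock time.

```c
/* Hypothetical helper, not taken from nettle-benchmark: convert a measured
   throughput (bytes per second) into cycles per byte, given the clock
   frequency in Hz as one would pass it to -f. */
double
cycles_per_byte(double bytes_per_second, double frequency_hz)
{
  return frequency_hz / bytes_per_second;
}
```

With a throughput of 2.93e9 bytes/s, passing 2.93e9 as the frequency gives 1 cycle per byte, while passing 1.3e9 gives about 0.44, so a too-low frequency understates the cycle count by the same factor.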
However, the results from my benchmark and yours differ.
Right, we'll have to figure out why. I'm puzzled.
How do you benchmark? What is ncalls in time_function()?
time_function loops around the benchmarked function ncalls times, and reads the clock before and after the loop. And then, if the elapsed time was too short, it increases ncalls and starts over.
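Roughly, the loop described above could be sketched as follows. This is not the actual nettle code: the name time_function_sketch, the 0.1 s threshold, the doubling of ncalls, and the use of clock() are all assumptions made for illustration, and bench_bump is a hypothetical stand-in for the function being benchmarked.

```c
#include <stddef.h>
#include <time.h>

/* Assumed threshold: runs shorter than this are considered too short. */
#define MIN_ELAPSED 0.1 /* seconds */

/* Hypothetical workload; volatile keeps the compiler from removing it. */
static volatile unsigned long bench_counter;

void
bench_bump(void *arg)
{
  (void) arg;
  bench_counter++;
}

/* Returns the average CPU time per call of f(arg), in seconds. */
double
time_function_sketch(void (*f)(void *arg), void *arg)
{
  unsigned long ncalls = 10;

  for (;;)
    {
      unsigned long i;
      double elapsed;
      clock_t start = clock();

      for (i = 0; i < ncalls; i++)
        f(arg);

      elapsed = (double) (clock() - start) / CLOCKS_PER_SEC;
      if (elapsed >= MIN_ELAPSED)
        return elapsed / ncalls;

      /* Too short to measure reliably: increase ncalls and start over. */
      ncalls *= 2;
    }
}
```

A call such as time_function_sketch(bench_bump, NULL) returns the per-call time; only the final, long enough run contributes to the result.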
My benchmark is simplistic, it counts speed, number of memxors in a fixed amount of time.
I guess that should be good enough. I'm not so familiar with SIGALRM, but I don't see anything obviously wrong with it.
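For reference, a fixed-time benchmark of that kind might look like the sketch below. It is a guess at the general shape, not the benchmark under discussion: count_xors, on_alarm and xor_bytes are hypothetical names, and xor_bytes is a plain byte loop standing in for memxor. It assumes a POSIX system with alarm().

```c
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Set by the signal handler when the time is up. */
static volatile sig_atomic_t time_is_up;

static void
on_alarm(int signo)
{
  (void) signo;
  time_is_up = 1;
}

/* Stand-in for memxor: XOR n bytes of src into dst. */
static void
xor_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* Counts how many XOR passes complete before SIGALRM arrives. */
unsigned long
count_xors(unsigned seconds, unsigned char *dst,
           const unsigned char *src, size_t n)
{
  unsigned long count = 0;

  time_is_up = 0;
  signal(SIGALRM, on_alarm);
  alarm(seconds);

  while (!time_is_up)
    {
      xor_bytes(dst, src, n);
      count++;
    }
  return count;
}
```

The flag is checked once per pass, so the measured interval can overshoot by at most one call, which should be negligible for small buffers.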
That's what I have seen as well. I keep the small amount of manual unrolling for the benefit of other machines and/or compilers (but I'm not sure where it really matters).
My personal preference would have been cleaner code.
Well, for the unaligned case, the unrolling is also a natural way to avoid moving values between s1 and s0, which I think is nice.
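To make the point concrete, here is a simplified sketch of the idea, not the actual memxor code. It works on whole 64-bit words with a shift count 0 < shr < 64, and assumes src has n+1 readable words; the function names are hypothetical. The simple loop needs the move s0 = s1 on every iteration, while the version unrolled by two lets s0 and s1 swap roles instead.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] ^= (src[i] >> shr) | (src[i+1] << (64 - shr)), for 0 <= i < n.
   Requires 0 < shr < 64 and n+1 readable words in src. */
void
xor_shifted_simple(uint64_t *dst, const uint64_t *src, size_t n, unsigned shr)
{
  unsigned shl = 64 - shr;
  uint64_t s0 = src[0], s1;
  size_t i;

  for (i = 0; i < n; i++)
    {
      s1 = src[i + 1];
      dst[i] ^= (s0 >> shr) | (s1 << shl);
      s0 = s1; /* the move that unrolling avoids */
    }
}

/* Same result, unrolled by two so that s0 and s1 alternate roles. */
void
xor_shifted_unrolled(uint64_t *dst, const uint64_t *src, size_t n, unsigned shr)
{
  unsigned shl = 64 - shr;
  uint64_t s0 = src[0], s1;
  size_t i = 0;

  if (n & 1)
    {
      /* Handle one word up front so an even number remains. */
      s1 = src[1];
      dst[0] ^= (s0 >> shr) | (s1 << shl);
      s0 = s1;
      i = 1;
    }
  for (; i < n; i += 2)
    {
      s1 = src[i + 1];
      dst[i] ^= (s0 >> shr) | (s1 << shl);
      s0 = src[i + 2];
      dst[i + 1] ^= (s1 >> shr) | (s0 << shl);
    }
}
```

Whether a compiler would eliminate the move in the simple loop on its own probably depends on the compiler and target, which is part of why the manual unrolling is kept.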
Regards, /Niels