nisse at lysator.liu.se
Mon Sep 12 19:03:03 CEST 2011
Nikos Mavrogiannopoulos <n.mavrogiannopoulos at gmail.com> writes:
> The CPU reports itself as Intel(R) Xeon(R) CPU X5670 @ 2.93GHz (the
> system has 24 such cpus). The output of nettle-benchmark on that
> machine follows.
> x86-64 assembly:
> [nikos@koninck examples]$ ./nettle-benchmark -f 1.3e9 memxor
To get the printed cycle numbers to make sense, you have to pass the
correct clock frequency to the -f option. -f 2.93e9 in your case.
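As a rough illustration of why the frequency matters: assuming the cycle counts are simply the measured time multiplied by the frequency given to -f, numbers printed with the wrong frequency can be rescaled afterwards. The helper name below is hypothetical and not part of nettle-benchmark.

```c
/* Hypothetical helper, not part of nettle-benchmark.
   Assumes cycles = elapsed_time * frequency, so a count computed with
   the wrong frequency scales linearly with the correct one. */
static double
rescale_cycles(double reported_cycles, double assumed_hz, double actual_hz)
{
  return reported_cycles * (actual_hz / assumed_hz);
}
```

For example, a figure printed as 1.3 cycles under -f 1.3e9 would correspond to about 2.93 cycles at 2.93e9 Hz, under the assumption above.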
> However the results we see from my and your benchmark vary.
Right, we'll have to figure out why. I'm puzzled.
> How do you benchmark? What is ncalls in time_function()?
time_function loops around the benchmarked function ncalls times, and
reads the clock before and after the loop. And then, if the elapsed time
was too short, it increases ncalls and starts over.
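A minimal sketch of that loop, for illustration only. It is not the actual nettle code: the names, the use of clock(), the 0.1 s threshold and the doubling of ncalls are all assumptions.

```c
#include <time.h>

/* Assumed minimum measurement interval, in seconds. */
#define MIN_ELAPSED 0.1

/* Hypothetical workload; the volatile counter keeps the call from
   being optimized away. */
static volatile unsigned long sink;

static void
busy_work(void *arg)
{
  (void) arg;
  sink++;
}

/* Returns the estimated time per call of f(arg), in seconds. */
static double
time_function_sketch(void (*f)(void *arg), void *arg)
{
  unsigned ncalls = 10;

  for (;;)
    {
      clock_t start = clock();
      double elapsed;
      unsigned i;

      for (i = 0; i < ncalls; i++)
        f(arg);

      elapsed = (double) (clock() - start) / CLOCKS_PER_SEC;
      if (elapsed >= MIN_ELAPSED)
        return elapsed / ncalls;

      /* Too short to measure reliably: use more calls and start over. */
      ncalls *= 2;
    }
}
```

Calling time_function_sketch(busy_work, NULL) gives the time per call; multiplying by the clock frequency would give cycles per call.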
> My benchmark is simplistic, it counts speed, number of memxors in a
> fixed amount of time.
I guess that should be good enough. I'm not so familiar with SIGALRM,
but I don't see anything obviously wrong with it.
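For comparison, a fixed-time benchmark of that kind might look roughly like the sketch below. This is not the benchmark under discussion; the function names and the byte-wise XOR stand-in are assumptions, and only the standard POSIX signal() and alarm() calls are used.

```c
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Set by the signal handler when the time is up. */
static volatile sig_atomic_t time_up;

static void
on_alarm(int sig)
{
  (void) sig;
  time_up = 1;
}

/* Simple byte-wise XOR, a stand-in for the function under test. */
static void
xor_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* Counts how many calls complete before SIGALRM arrives. */
static unsigned long
count_calls(unsigned seconds, unsigned char *dst,
            const unsigned char *src, size_t n)
{
  unsigned long count = 0;

  time_up = 0;
  signal(SIGALRM, on_alarm);
  alarm(seconds);

  while (!time_up)
    {
      xor_bytes(dst, src, n);
      count++;
    }
  return count;
}
```

The speed is then the returned count times the block size, divided by the number of seconds. Note that the flag check is included in the measured loop, which matters more for small block sizes.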
>> That's what I have seen as well. I keep the small amount of manual
>> unrolling for the benefit of other machines and/or compilers (but I'm
>> not sure where it really matters).
> My personal preference would have been cleaner code.
Well, for the unaligned case, the unrolling is also a natural way to
avoid moving values between s1 and s0, which I think is nice.
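To illustrate the point, here is a sketch of a two-way unrolled loop for the unaligned case. It is not the nettle implementation: the function name, the forward direction, the 64-bit word size and the shift directions (which correspond to a little-endian byte layout) are assumptions. With the unrolling, s0 and s1 simply swap roles in the two halves of the loop body, so no s1 = s0 style copy is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* XORs n words into dst. The source data starts `offset` bytes into
   the aligned array src, with 0 < offset < 8, so each source word
   straddles two aligned words. Reads src[0] through src[n]. */
static void
xor_unaligned_sketch(uint64_t *dst, const uint64_t *src,
                     unsigned offset, size_t n)
{
  unsigned shl = 8 * offset;
  unsigned shr = 64 - shl;
  uint64_t s0, s1;
  size_t i;

  s0 = src[0];
  for (i = 0; i + 2 <= n; i += 2)
    {
      /* First half: s0 holds the low part, s1 the high part. */
      s1 = src[i + 1];
      dst[i] ^= (s0 >> shl) | (s1 << shr);

      /* Second half: roles are swapped, no copying between s0 and s1. */
      s0 = src[i + 2];
      dst[i + 1] ^= (s1 >> shl) | (s0 << shr);
    }

  /* Odd word left over. */
  if (i < n)
    {
      s1 = src[i + 1];
      dst[i] ^= (s0 >> shl) | (s1 << shr);
    }
}
```

A non-unrolled version would need to load one new word per iteration and then move it into the other variable before the next iteration.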
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.