Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
On Tue, Apr 16, 2013 at 1:08 PM, Niels Möller nisse@lysator.liu.se wrote:
And I'm not sure how much difference to performance it would really make. I guess it's not worth doing unless there's a large demonstrated gain in performance.
The results will be very CPU-specific. If you have any benchmark or test code, I could test on i7 and amd 64 cpus.
No, I don't have any good benchmark. But maybe it matters mostly for code which is close to memory bandwidth limits.
Speaking of benchmarks, I've written some more umac assembly (not yet in the public repo, I'll try to get it in later today).
x86_64 (Intel i5, 3.4 GHz):
Algorithm   mode      Mbyte/s
sha256      update     286.04
sha512      update     433.52
umac32      update   17837.65
umac64      update    8364.80
umac96      update    6447.72
umac128     update    5270.74
ARM (Cortex-A9, 1 GHz):
Algorithm   mode      Mbyte/s
sha256      update      31.69
sha512      update      30.38
umac32      update     937.02
umac64      update     464.81
umac96      update     383.02
umac128     update     350.13
So umac128 seems to be an order of magnitude faster than sha2, at least on machines with decent multiplication performance.
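The ratio behind that claim can be checked with a few lines of Python. This is only a sketch over the figures quoted above; the `RESULTS` dictionary and the `speedup` helper are names made up for illustration and are not part of any benchmark tool.

```python
# Throughput figures (Mbyte/s) copied from the benchmark tables above.
# RESULTS and speedup() are hypothetical names used only for this sketch.
RESULTS = {
    "x86_64 (Intel i5, 3.4 GHz)": {
        "sha256": 286.04,
        "sha512": 433.52,
        "umac32": 17837.65,
        "umac64": 8364.80,
        "umac96": 6447.72,
        "umac128": 5270.74,
    },
    "ARM (Cortex-A9, 1 GHz)": {
        "sha256": 31.69,
        "sha512": 30.38,
        "umac32": 937.02,
        "umac64": 464.81,
        "umac96": 383.02,
        "umac128": 350.13,
    },
}


def speedup(platform, fast, slow):
    """Return how many times faster `fast` is than `slow` on `platform`."""
    table = RESULTS[platform]
    return table[fast] / table[slow]


# Compare umac128 against both sha2 variants on each machine.
for platform in RESULTS:
    for sha in ("sha256", "sha512"):
        ratio = speedup(platform, "umac128", sha)
        print(f"{platform}: umac128 is {ratio:.1f}x faster than {sha}")
```

With these numbers the ratios fall between roughly 11x and 18x, so "an order of magnitude" holds on both machines.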
Regards,
/Niels