nisse@lysator.liu.se (Niels Möller) writes:
Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> writes:
For benchmarking purposes, provide wrappers around OpenSSL's AES-GCM implementation. Note: the digest callback will work only for encryption, due to OpenSSL internals.
And regarding the numbers, I think gcm_hash is the bottleneck for GCM. Nettle's x86_64 assembly does the GF(2^128) operations with one table lookup plus shifting per byte. One could do something faster and more clever with pclmul (the PCLMULQDQ instruction), https://software.intel.com/en-us/articles/intel-carry-less-multiplication-in....
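For reference, the GF(2^128) multiplication inside gcm_hash can be written bit by bit, following Algorithm 1 of NIST SP 800-38D. The sketch below is that textbook bit-serial version, not Nettle's table-driven assembly; roughly speaking, a per-byte table lookup replaces eight of these bit steps at a time. The function name gf128_mul is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: out = x * y in GF(2^128), using GCM's bit order
   (field bit 0 is the most significant bit of byte 0).
   Bit-serial reference algorithm, NIST SP 800-38D Algorithm 1. */
static void gf128_mul(const uint8_t x[16], const uint8_t y[16], uint8_t out[16])
{
    uint8_t z[16] = {0};
    uint8_t v[16];
    memcpy(v, y, 16);

    for (int i = 0; i < 128; i++) {
        /* If bit i of x is set, accumulate V into Z. */
        if (x[i / 8] & (0x80 >> (i % 8))) {
            for (int j = 0; j < 16; j++)
                z[j] ^= v[j];
        }
        /* V = V * (the field element "x"): a one-bit right shift in GCM
           order, reduced by R = 11100001 || 0^120 if a bit falls off. */
        int lsb = v[15] & 1;
        for (int j = 15; j > 0; j--)
            v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
        v[0] >>= 1;
        if (lsb)
            v[0] ^= 0xe1;
    }
    memcpy(out, z, 16);
}
```

Each of the 128 iterations handles a single bit of x. PCLMULQDQ instead computes a 64x64-bit carry-less product in hardware, so a full 128x128-bit product takes three or four such multiplies (three with Karatsuba) plus a reduction step.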
I remember that in the "embedded nettle" project a few years ago, I looked at ARM NEON instructions for carry-less multiplication. It seemed quite complicated, since NEON offered carry-less multiplication only on 8-bit values (with nice SIMD parallelism).
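To illustrate why 8-bit carry-less multiplies make this awkward, here is a portable C sketch (not actual NEON code) that builds a 32x32-bit carry-less product out of 16 byte-by-byte products. clmul8 stands in for one lane of an 8-bit polynomial multiply such as NEON's vmull.p8; all function names are hypothetical. In this schoolbook form, a full 128x128-bit GHASH product would need 16 × 16 = 256 byte products before reduction, even if the SIMD lanes compute several of them at once.

```c
#include <stdint.h>

/* Reference carry-less multiply, 32x32 -> 64 bits, one bit at a time. */
static uint64_t clmul32_ref(uint32_t a, uint32_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1)
            r ^= (uint64_t)a << i;
    return r;
}

/* Stand-in for a single 8x8 -> 16-bit polynomial multiply lane. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++)
        if ((b >> i) & 1)
            r ^= (uint16_t)(a << i);
    return r;
}

/* 32x32 carry-less product assembled from 16 byte products.
   Carry-less multiplication distributes over XOR, so the partial
   products are simply shifted into place and XORed together. */
static uint64_t clmul32_from_bytes(uint32_t a, uint32_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            uint8_t ai = (uint8_t)(a >> (8 * i));
            uint8_t bj = (uint8_t)(b >> (8 * j));
            r ^= (uint64_t)clmul8(ai, bj) << (8 * (i + j));
        }
    return r;
}
```

For example, clmul32_ref(3, 3) is 5, since (x + 1)^2 = x^2 + 1 when there are no carries.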
On my machine, Nettle's gcm_aes128 encryption runs at roughly 8 cycles per byte, compared to 1.25 for aes128 in plain ECB mode. The corresponding OpenSSL numbers are 0.75 and 0.65 cycles per byte, respectively.
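Taking these figures at face value, and assuming (as a rough model) that the gap between GCM and plain ECB is mostly GCM-specific work dominated by gcm_hash, the overhead works out to roughly:

```latex
\begin{aligned}
\text{Nettle:}  \quad & 8 - 1.25 = 6.75 \ \text{cycles/byte} \quad (\approx 84\% \text{ of the GCM time}) \\
\text{OpenSSL:} \quad & 0.75 - 0.65 = 0.10 \ \text{cycles/byte} \quad (\approx 13\% \text{ of the GCM time})
\end{aligned}
```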
Regards, /Niels