Jeffrey Walton <noloader@gmail.com> writes:
Looks good on a Celeron J3455, which is a [low-end] Goldmont machine with the instructions:
[...]
goldmont:nettle$ LD_LIBRARY_PATH=.lib:/usr/local/lib64/ ./examples/nettle-benchmark
sha1_compress: 84.60 cycles
85 cycles is a lot less than the 136 cycles I observed in my testing. The function is 131 instructions long, so that's approximately 1.5 instructions per cycle.
sha1 update 1194.33
openssl sha1 update 1321.71
And this is an 11% difference (compared to 8% in my benchmarks). That makes sense: if the main crunching takes fewer cycles, the per-block function call overhead is relatively larger.
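As a quick check of the arithmetic in the two paragraphs above, the sketch below recomputes both ratios with awk. The helper names ipc and pct_faster are made up for this illustration, and it assumes the two sha1 update figures are throughput numbers, where higher means faster.

```shell
#!/bin/sh
# ipc INSTRUCTIONS CYCLES: average instructions per cycle (hypothetical helper)
ipc() {
    awk -v i="$1" -v c="$2" 'BEGIN { printf "%.2f\n", i / c }'
}

# pct_faster FAST SLOW: how many percent larger FAST is than SLOW
# (hypothetical helper; assumes both values are throughput, higher is better)
pct_faster() {
    awk -v f="$1" -v s="$2" 'BEGIN { printf "%.1f\n", (f / s - 1) * 100 }'
}

ipc 131 84.60                # 131-instruction sha1_compress at 84.60 cycles
pct_faster 1321.71 1194.33   # openssl sha1 update vs nettle sha1 update
```

The first call prints 1.55 and the second 10.7, which round to the "approximately 1.5 instructions per cycle" and "11%" figures quoted above.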
A small suggestion may be to update Section 8 Installation (https://www.lysator.liu.se/~nisse/nettle/nettle.html). It was not obvious to me how to enable the hardware acceleration.
There's an --enable-x86-aesni configure option which should enable the aesni code unconditionally in non-fat builds, and similarly an --enable-arm-neon. But it seems I forgot to add a corresponding --enable-x86-sha-ni.
But --enable-fat is the most common way to enable the support. I'm considering enabling it by default in the next release.
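For anyone else looking for this, a minimal sketch of the two build routes mentioned above. It assumes it is run from the top of an unpacked nettle source tree; the second variant is commented out since only one configuration applies per build, and there is no SHA-NI counterpart to it at this point.

```shell
#!/bin/sh
# Sketch only: run from the top of an unpacked nettle source tree.

# Route 1: fat build. All optimized variants are compiled in, and the
# library selects the appropriate code at run time based on the CPU.
./configure --enable-fat
make

# Route 2: non-fat build with the AES-NI code enabled unconditionally,
# so the resulting library assumes the CPU supports AES-NI.
# ./configure --enable-x86-aesni
# make
```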
Regards, /Niels