Jeffrey Walton noloader@gmail.com writes:
Looks good on a Celeron J3455, which is a [low-end] Goldmont machine with the instructions:
[...]
goldmont:nettle$ LD_LIBRARY_PATH=.lib:/usr/local/lib64/ ./examples/nettle-benchmark
sha1_compress: 84.60 cycles
85 cycles is a lot less than the 136 cycles I observed in my testing. The function is 131 instructions long, so it's approximately 1.5 instructions per cycle.
sha1 update 1194.33
openssl sha1 update 1321.71
And this is an 11% difference (compared to 8% in my benchmarks). That makes sense: if the main crunching takes fewer cycles, the per-block function call overhead is relatively larger.
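As a quick sanity check of the two figures above, the arithmetic can be redone with a small shell sketch. The helper names ratio and percent_diff are made up for this illustration, and the percentage assumes the two "sha1 update" numbers are throughput figures, so the larger one is the faster one.

```shell
#!/bin/sh
# Hypothetical helpers, not part of nettle or nettle-benchmark.

# Print $1 / $2 rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Print by how many percent $1 exceeds $2, rounded to one decimal.
percent_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# 131 instructions over 84.60 cycles: instructions per cycle.
ratio 131 84.60                 # prints 1.55

# openssl sha1 update relative to nettle sha1 update (assumed throughput).
percent_diff 1321.71 1194.33    # prints 10.7
```

Both results agree with the rounded numbers quoted above: roughly 1.5 instructions per cycle and a gap of about 11%.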
A small suggestion may be to update Section 8 Installation (https://www.lysator.liu.se/~nisse/nettle/nettle.html). It was not obvious to me how to enable the hardware acceleration.
There's an --enable-x86-aesni configure option which should enable the aesni code unconditionally in non-fat builds. And an --enable-arm-neon. But it seems I forgot to add a corresponding --enable-x86-sha-ni.
But --enable-fat is the most common way to enable the support. I'm considering enabling it by default in the next release.
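For reference, a minimal sketch of the two build variants mentioned above, assuming it is run from the top of an unpacked nettle source tree. Only the options named in this message are used. The guard on ./configure is an addition for this example, so the snippet does nothing when run elsewhere.

```shell
#!/bin/sh
# Sketch only: assumes the current directory is a nettle source tree.
if [ -x ./configure ]; then
    # Fat build: variants are compiled in and selected at run time.
    ./configure --enable-fat

    # Non-fat alternative on x86: enable the aesni code unconditionally.
    # ./configure --enable-x86-aesni
fi
```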
Regards, /Niels
On Thu, Feb 8, 2018 at 5:15 PM, Niels Möller nisse@lysator.liu.se wrote:
Jeffrey Walton noloader@gmail.com writes:
Looks good on a Celeron J3455, which is a [low-end] Goldmont machine with the instructions:
[...]
goldmont:nettle$ LD_LIBRARY_PATH=.lib:/usr/local/lib64/ ./examples/nettle-benchmark
sha1_compress: 84.60 cycles
85 cycles is a lot less than the 136 cycles I observed in my testing. The function is 131 instructions long, so it's approximately 1.5 instructions per cycle.
sha1 update 1194.33
openssl sha1 update 1321.71
And this is an 11% difference (compared to 8% in my benchmarks). That makes sense: if the main crunching takes fewer cycles, the per-block function call overhead is relatively larger.
I think this might be explained by root access: I can put the Celeron in performance mode using https://github.com/weidai11/cryptopp/blob/master/TestScripts/governor.sh (based on a script by Andy Polyakov):
$ sudo ./governor.sh perf
Current CPU governor scaling settings:
CPU 0: powersave
CPU 1: powersave
CPU 2: powersave
CPU 3: powersave
New CPU governor scaling settings:
CPU 0: performance
CPU 1: performance
CPU 2: performance
CPU 3: performance
The benchmarks are then performed under the new governor setting, which I believe runs the cores at maximum frequency.
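To confirm which governor is active before running a benchmark, something like the sketch below should work without root. It assumes the usual Linux cpufreq sysfs layout. The function name show_governors and its optional directory argument are made up for this example, and this is not what governor.sh itself does.

```shell
#!/bin/sh
# Hypothetical helper: print the scaling governor of each CPU.
# $1 optionally overrides the sysfs root (default is the usual Linux path).
show_governors() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue      # cpufreq not exposed, or no match
        cpu="${f#"$root"/}"          # drop the root prefix
        cpu="${cpu%%/*}"             # keep only "cpuN"
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}

show_governors
```

On a machine where cpufreq is not exposed (some virtual machines, for instance) this prints nothing.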
A small suggestion may be to update Section 8 Installation (https://www.lysator.liu.se/~nisse/nettle/nettle.html). It was not obvious to me how to enable the hardware acceleration.
There's an --enable-x86-aesni configure option which should enable the aesni code unconditionally in non-fat builds. And an --enable-arm-neon. But it seems I forgot to add a corresponding --enable-x86-sha-ni.
But --enable-fat is the most common way to enable the support. I'm considering enabling it by default in the next release.
+1.
Jeff
nettle-bugs@lists.lysator.liu.se