Forwarded to the list.
---------- Forwarded message ----------
From: Jeffrey Walton <noloader@gmail.com>
To: "Niels Möller" <nisse@lysator.liu.se>
Cc: nettle-bugs@lists.lysator.liu.se
Bcc:
Date: Thu, 8 Feb 2018 16:34:43 -0500
Subject: Re: x86 sha_ni

On Thu, Feb 8, 2018 at 12:18 PM, Niels Möller <nisse@lysator.liu.se> wrote:
> nisse@lysator.liu.se (Niels Möller) writes:
>
>> Below replacement for sha1-compress.asm seems to run at roughly 2 cycles/byte when I benchmark it on an "AMD Ryzen 7 1700X" CPU in the gcc compile farm. It is still slightly slower than openssl; to squeeze out a few more cycles, it might help to change the sha1_compress interface to let it process more than one 64-byte block at a time.
>>
>> I hope to be able to wire it up via fat-x86_64.c reasonably soon. In the meantime, if anyone wants to try it out, just change the sha1-compress.asm symlink to point to this file.
>
> Enabled via fat-x86_64 now, and pushed to a branch named x86_64-sha_ni-sha1.
Looks good on a Celeron J3455, which is a [low-end] Goldmont machine with the instructions:
goldmont:nettle$ autoreconf -f -i ...
goldmont:nettle$ ./configure --enable-fat ...
goldmont:nettle$ make && make check ...
goldmont:nettle$ LD_LIBRARY_PATH=.lib:/usr/local/lib64/ ./examples/nettle-benchmark
sha1_compress: 84.60 cycles
salsa20_core: 282.80 cycles
sha3_permute: 1542.60 cycles (64.27 / round)

benchmark call overhead: 0.001604 us

Algorithm       mode      Mbyte/s
...
md2             update       6.90
md4             update     568.11
md5             update     384.08
openssl md5     update     443.76
sha1            update    1194.33
openssl sha1    update    1321.71
sha224          update     110.31
sha256          update     110.10
sha384          update     174.32
sha512          update     173.99
sha512-224      update     174.35
sha512-256      update     174.16
sha3_224        update     136.77
sha3_256        update     129.46
sha3_384        update      99.23
sha3_512        update      69.25
ripemd160       update     161.00
gosthash94      update      39.48
umac32          update    6560.05
umac64          update    3130.26
umac96          update    2457.21
umac128         update    1936.56
poly1305-aes    update     914.79
...
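To put those numbers next to the "roughly 2 cycles/byte" figure from the Ryzen run, here is a small sketch of the arithmetic. It assumes that one sha1_compress call handles exactly one 64-byte block, as the interface discussion above suggests; the function names are mine, not anything from nettle.

```python
# Figures copied from the nettle-benchmark output on the Celeron J3455.
SHA1_COMPRESS_CYCLES = 84.60   # cycles per sha1_compress call
SHA1_BLOCK_BYTES = 64          # assumption: one call processes one 64-byte block
NETTLE_SHA1_MBYTE_S = 1194.33  # "sha1 update"
OPENSSL_SHA1_MBYTE_S = 1321.71 # "openssl sha1 update"


def cycles_per_byte(cycles_per_call, bytes_per_call):
    """Convert a per-call cycle count into cycles per byte."""
    return cycles_per_call / bytes_per_call


def relative_throughput(measured, reference):
    """Return measured throughput as a fraction of the reference."""
    return measured / reference


if __name__ == "__main__":
    cpb = cycles_per_byte(SHA1_COMPRESS_CYCLES, SHA1_BLOCK_BYTES)
    ratio = relative_throughput(NETTLE_SHA1_MBYTE_S, OPENSSL_SHA1_MBYTE_S)
    print(f"sha1_compress: {cpb:.2f} cycles/byte")
    print(f"nettle sha1 vs openssl sha1: {ratio:.1%}")
```

Under that assumption the compress function alone comes out at about 1.32 cycles/byte here, and the sha1 update throughput is about 90% of openssl's on the same machine.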
A small suggestion may be to update Section 8 Installation (https://www.lysator.liu.se/~nisse/nettle/nettle.html). It was not obvious to me how to enable the hardware acceleration. A quick sentence on how to enable AES-NI and SHA would make it obvious for future readers. (Thanks for the offline help).
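In case it helps anyone else trying this, below is a rough sketch for checking whether a Linux machine advertises the instructions before building with ./configure --enable-fat. It relies on the "aes" and "sha_ni" flag names that the Linux kernel prints in /proc/cpuinfo; the helper names are hypothetical and it is not part of nettle.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def accel_report(cpuinfo_text):
    """Report whether the AES-NI and SHA extension flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"aes-ni": "aes" in flags, "sha": "sha_ni" in flags}


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only; other systems expose this differently
    if os.path.exists(path):
        with open(path) as f:
            print(accel_report(f.read()))
    else:
        print("no /proc/cpuinfo on this system")
```

This only says what the CPU offers. Whether the library actually uses the instructions still depends on how it was configured and built.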
Jeff