nettle-bugs February 2018

nettle-bugs@lists.lysator.liu.se

8 participants
14 discussions

Re: x86 sha_ni
by nisse＠lysator.liu.se 08 Feb '18


Jeffrey Walton <noloader(a)gmail.com> writes:

> Looks good on a Celeron J3455, which is a [low-end] Goldmont machine
> with the instructions:
[...]
> goldmont:nettle$ LD_LIBRARY_PATH=.lib:/usr/local/lib64/
> ./examples/nettle-benchmark
> sha1_compress: 84.60 cycles

85 cycles is a lot less than the 136 cycles I observed in my testing. The function is 131 instructions long, so it's approximately 1.5 instructions per cycle.

> sha1 update 1194.33
> openssl sha1 update 1321.71

And this is an 11% difference (compared to 8% in my benchmarks). That makes sense: if the main crunching takes fewer cycles, then the per-block function call overhead is relatively larger.

> A small suggestion may be to update Section 8 Installation
> (https://www.lysator.liu.se/~nisse/nettle/nettle.html). It was not
> obvious to me how to enable the hardware acceleration.

There's an --enable-x86-aesni configure option which should enable the aesni code unconditionally in non-fat builds. And an --enable-arm-neon. But it seems I forgot to add a corresponding --enable-x86-sha-ni.

But --enable-fat is the most common way to enable the support. I'm considering enabling it by default in the next release.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
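For readers wondering whether a given machine has the instructions discussed in this message, the following is a minimal sketch of a CPUID query, assuming GCC or Clang (it relies on their `<cpuid.h>` header). The function names `example_cpu_has_sha_ni` and `example_cpu_has_aesni` are hypothetical and are not part of nettle; this is not the detection code nettle's fat build uses, only an illustration of the kind of run-time check such a build has to make.

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; assumed to be available */
#endif

/* Hypothetical helper: returns 1 if CPUID reports the SHA extensions
   (leaf 7, subleaf 0, EBX bit 29), 0 otherwise or on non-x86 builds. */
int
example_cpu_has_sha_ni (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;

  /* Leaf 7 must be supported before it can be queried. */
  if (__get_cpuid_max (0, NULL) < 7)
    return 0;

  __cpuid_count (7, 0, eax, ebx, ecx, edx);
  return (ebx >> 29) & 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: returns 1 if CPUID reports AES-NI
   (leaf 1, ECX bit 25), 0 otherwise or on non-x86 builds. */
int
example_cpu_has_aesni (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;

  return (ecx >> 25) & 1;
#else
  return 0;
#endif
}

/* Prints both results, one per line. */
void
example_print_cpu_features (void)
{
  printf ("sha_ni: %d\n", example_cpu_has_sha_ni ());
  printf ("aesni:  %d\n", example_cpu_has_aesni ());
}
```

Calling `example_print_cpu_features()` from a small `main` shows whether the accelerated SHA1 and AES code paths could be used on the machine at all, which is a useful first check before comparing benchmark numbers.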


[Jeffrey Walton] Fwd: x86 sha_ni
by nisse＠lysator.liu.se 08 Feb '18


Forwarded to the list.

---------- Forwarded message ----------
From: Jeffrey Walton <noloader(a)gmail.com>
To: "Niels Möller" <nisse(a)lysator.liu.se>
Cc: nettle-bugs(a)lists.lysator.liu.se
Bcc:
Date: Thu, 8 Feb 2018 16:34:43 -0500
Subject: Re: x86 sha_ni

On Thu, Feb 8, 2018 at 12:18 PM, Niels Möller <nisse(a)lysator.liu.se> wrote:
> nisse(a)lysator.liu.se (Niels Möller) writes:
>
>> Below replacement for sha1-compress.asm seems to run on roughly 2
>> cycles/byte when I benchmark it on an "AMD Ryzen 7 1700X" cpu in the gcc
>> compile farm. Still slightly slower than openssl, to squeeze out a few
>> more cycles, it might help to change the sha1_compress interface to let
>> it process more than one 64-byte block at a time.
>>
>> I hope to be able to wire it up via fat-x86_64.c reasonably soon. In the
>> mean time, if anyone wants to try it out, just change the
>> sha1-compress.asm symlink to point to this file.
>
> Enabled via fat-x86_64 now, and pushed to a branch named
> x86_64-sha_ni-sha1.

Looks good on a Celeron J3455, which is a [low-end] Goldmont machine with the instructions:

goldmont:nettle$ autoreconf -f -i
...
goldmont:nettle$ ./configure --enable-fat
...
goldmont:nettle$ make && make check
...
goldmont:nettle$ LD_LIBRARY_PATH=.lib:/usr/local/lib64/ ./examples/nettle-benchmark

sha1_compress: 84.60 cycles
salsa20_core: 282.80 cycles
sha3_permute: 1542.60 cycles (64.27 / round)

benchmark call overhead: 0.001604 us

   Algorithm    mode    Mbyte/s
...
         md2  update       6.90
         md4  update     568.11
         md5  update     384.08
 openssl md5  update     443.76
        sha1  update    1194.33
openssl sha1  update    1321.71
      sha224  update     110.31
      sha256  update     110.10
      sha384  update     174.32
      sha512  update     173.99
  sha512-224  update     174.35
  sha512-256  update     174.16
    sha3_224  update     136.77
    sha3_256  update     129.46
    sha3_384  update      99.23
    sha3_512  update      69.25
   ripemd160  update     161.00
  gosthash94  update      39.48
      umac32  update    6560.05
      umac64  update    3130.26
      umac96  update    2457.21
     umac128  update    1936.56
poly1305-aes  update     914.79
...

A small suggestion may be to update Section 8 Installation (https://www.lysator.liu.se/~nisse/nettle/nettle.html). It was not obvious to me how to enable the hardware acceleration. A quick sentence on how to enable AES-NI and SHA would make it obvious for future readers.

(Thanks for the offline help).

Jeff

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


easier version checks
by Nikos Mavrogiannopoulos 08 Feb '18


What about extending the macros in version.h with a simple to use combined version number?
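One way such a combined number could look is sketched below. It assumes the header exposes separate major and minor macros, which is what nettle's version.h provides as `NETTLE_VERSION_MAJOR` and `NETTLE_VERSION_MINOR`. To keep the sketch compilable without nettle installed, it uses stand-in values, and every `EXAMPLE_` name is hypothetical rather than something nettle defines.

```c
#include <stdio.h>

/* Stand-ins for the major/minor macros from version.h; the values
   3 and 4 are illustrative only. */
#define EXAMPLE_VERSION_MAJOR 3
#define EXAMPLE_VERSION_MINOR 4

/* Hypothetical packing: major in bits 16..23, minor in bits 8..15,
   so that versions compare correctly as plain integers. */
#define EXAMPLE_MAKE_VERSION(major, minor) (((major) << 16) | ((minor) << 8))

#define EXAMPLE_VERSION_NUMBER \
  EXAMPLE_MAKE_VERSION (EXAMPLE_VERSION_MAJOR, EXAMPLE_VERSION_MINOR)

/* A single preprocessor comparison replaces the usual
   "major > X || (major == X && minor >= Y)" test. */
#if EXAMPLE_VERSION_NUMBER >= EXAMPLE_MAKE_VERSION (3, 4)
# define EXAMPLE_HAVE_NEW_API 1
#else
# define EXAMPLE_HAVE_NEW_API 0
#endif

/* Run-time form of the same check: returns 1 when the stand-in
   version is at least major.minor. */
int
example_version_at_least (int major, int minor)
{
  return EXAMPLE_VERSION_NUMBER >= EXAMPLE_MAKE_VERSION (major, minor);
}

/* Prints the packed number in hex alongside the feature flag. */
void
example_print_version (void)
{
  printf ("version number: 0x%06x, new api: %d\n",
          (unsigned) EXAMPLE_VERSION_NUMBER, EXAMPLE_HAVE_NEW_API);
}
```

The low 8 bits are left free here so a patch level could be added later without changing how existing comparisons behave; that layout is a design assumption of this sketch, not something proposed in the message.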


Performance of AESNI impl vs other crypto libraries
by Daniel P. Berrange 01 Feb '18


I wrote a crude/simple test program to compare the performance of AES-128-CBC across openssl, gcrypt, nettle and gnutls, and was surprised to find that nettle is consistently ~25% slower than the other libraries for its AESNI implementation.

On my Core i7-6820HQ I get

  nettle:  850 MB/s
  gcrypt:  1172 MB/s
  gnutls:  1230 MB/s
  openssl: 1153 MB/s

with versions

  nettle-3.3-2.fc26.x86_64
  libgcrypt-1.7.8-1.fc26.x86_64
  gnutls-3.5.14-1.fc26.x86_64
  openssl-1.1.0f-7.fc26.x86_64

And on Xeon E5-2609 I get

  nettle:  325 MB/s
  gcrypt:  403 MB/s
  gnutls:  414 MB/s
  openssl: 414 MB/s

with versions

  nettle-3.3-1.fc25.x86_64
  libgcrypt-1.7.8-1.fc25.x86_64
  gnutls-3.5.14-1.fc25.x86_64
  openssl-1.0.2k-1.fc25.x86_64

Naively I would have expected them all to be pretty much equal given that they're delegating to the same hardware routines. Has anyone else done comparative benchmarks of nettle's impl against others & seen the same kind of results?

I'll attach my test program to this mail, so if I made a mistake in usage there feel free to point it out.

FWIW, I also found there is some weird interaction between nettle and glibc-2.23. If I have that glibc version and run with NETTLE_FAT_VERBOSE=1 it claims it is picking the AESNI impl, but the performance figures clearly show it is actually running the pure software impl because they're 100 MB/s instead of 325 MB/s. I upgraded to glibc 2.24 and this weirdness went away, so I've not investigated that further.

Regards,
Daniel

--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
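The attached test program is not reproduced in this archive. As a rough illustration of how MB/s figures of this kind can be obtained, here is a minimal throughput harness in standard C. It is a sketch under stated assumptions: the workload is a placeholder XOR loop rather than AES-128-CBC, 1 MB is taken as 10^6 bytes, `clock()` (CPU time) is used for timing, and all `example_` names are hypothetical. To measure a real library, the placeholder would be replaced by a wrapper around that library's CBC encrypt call.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Callback shape for the code under test: process LENGTH bytes
   from SRC into DST. */
typedef void example_process_func (void *ctx, size_t length,
                                   uint8_t *dst, const uint8_t *src);

/* Placeholder workload (NOT AES): XOR every byte with a one-byte key
   that CTX points to. */
void
example_xor_process (void *ctx, size_t length,
                     uint8_t *dst, const uint8_t *src)
{
  uint8_t key = *(const uint8_t *) ctx;
  size_t i;

  for (i = 0; i < length; i++)
    dst[i] = src[i] ^ key;
}

/* Calls F on a BLOCK_SIZE-byte buffer ITERATIONS times, in place, and
   returns throughput in MB/s (10^6 bytes per second of CPU time).
   Returns -1.0 if allocation fails or the run was too short to time. */
double
example_throughput_mb_per_s (example_process_func *f, void *ctx,
                             size_t block_size, unsigned iterations)
{
  uint8_t *buf = calloc (block_size, 1);
  volatile uint8_t sink;
  clock_t start, end;
  double seconds;
  unsigned i;

  if (!buf)
    return -1.0;

  start = clock ();
  for (i = 0; i < iterations; i++)
    f (ctx, block_size, buf, buf);
  end = clock ();

  /* Read the result so the work cannot be discarded as unused. */
  sink = buf[0];
  (void) sink;
  free (buf);

  seconds = (double) (end - start) / CLOCKS_PER_SEC;
  if (seconds <= 0.0)
    return -1.0;

  return ((double) block_size * iterations) / 1e6 / seconds;
}
```

When comparing libraries this way, it helps to run each candidate with the same block size and iteration count and to repeat the measurement several times, since a single short run is sensitive to frequency scaling and scheduling noise.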

