Re: [AArch64] Optimize GHASH

6 Feb 2021


Michael Weiser <michael.weiser@gmx.de> writes:
...
> The arm64 branch builds and passes the testsuite on aarch64 and
> aarch64_be with gcc 10.2 and clang 11.0.1 with and without the optimized
> assembly routines on my pine64 boards. This is with the .arch directive
> instead of modifying CFLAGS and the new configure option name
> --enable-arm64-crypto.

Thanks for testing! (My own testing was done with a cross-compiler and
user-level qemu.)
...
> Out of curiosity I've also collected some benchmark numbers for
> gcm_aes256. (Is that a correct and sensible algorithm for that purpose?)

I think that's appropriate for benchmarking gcm_hash, but the "update"
numbers are the ones that reflect gcm_hash performance.
...
> The speedup from using pmull seems to be around 35% for encrypt/decrypt.
> Interestingly, LE is about a cycle per block faster than BE even though
> it should have quite a few more rev64s to execute than BE. Could this be
> masked by memory accesses, pipelining or scheduling?

For the encrypt/decrypt operations, you also run AES (in CTR mode),
which works with little-endian data.
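
To make the rev64 point concrete, here is a minimal Python sketch of what
a per-lane byte reversal does to a 16-byte block. It only models the byte
shuffling; the helper name rev64 and the sample block are illustrative
assumptions, and it says nothing about where the assembly actually places
these instructions.

```python
import struct

def rev64(lanes):
    """Byte-reverse each 64-bit lane, as AArch64 rev64 does on byte elements."""
    return tuple(int.from_bytes(lane.to_bytes(8, "little"), "big") for lane in lanes)

block = bytes(range(16))                 # sample 16-byte block (hypothetical data)
le_lanes = struct.unpack("<2Q", block)   # two 64-bit lanes as loaded on little-endian
be_lanes = struct.unpack(">2Q", block)   # the same bytes read as big-endian lanes

# After the per-lane byte reversal, the little-endian load matches the
# big-endian view of the block.
assert rev64(le_lanes) == be_lanes
```

On a big-endian load the lanes already have this layout, which is why one
would expect fewer reversals there.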
...
> How is the massive speedup in update to be interpreted and that BE here
> is indeed quite a bit faster than LE? Do I understand correctly that on
> update only GCM is run on unencrypted data for authentication purposes
> so that this number really indicates the pure GCM pmull speedup?

That's right, the "update" numbers run only the authentication part of
gcm, i.e., gcm_hash. This is useful for benchmarking gcm_hash, but
probably not so relevant for real-world applications, since I'd expect
it's rare to pass large amounts of "associated data" to gcm.
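
For readers unfamiliar with what the authentication part computes, the
following is a minimal bit-at-a-time Python sketch of the GF(2^128)
multiplication and block hashing defined for GCM. It is a reference-style
illustration, not nettle's code: the names gf_mult and ghash are
assumptions, and the final length block that full GCM appends is left out.
The pmull instruction speeds up the carry-less multiplication that the
inner loop here does one bit at a time.

```python
# Reduction constant for x^128 + x^7 + x^2 + x + 1 in GCM's bit order,
# where the most significant bit of the integer is the coefficient of x^0.
R = 0xE1 << 120

def gf_mult(x, y):
    """Multiply two 128-bit field elements, one bit of x per iteration."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        # Multiply v by x: shift right, reduce if a bit fell off the end.
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def ghash(h, data):
    """Fold zero-padded 16-byte blocks into y = (y ^ block) * h."""
    y = 0
    for off in range(0, len(data), 16):
        block = data[off:off + 16].ljust(16, b"\0")
        y = gf_mult(y ^ int.from_bytes(block, "big"), h)
    return y
```

Each block costs one field multiplication and no AES call, which is why
the update numbers isolate the hashing cost.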
...
> What's also curious is that the system's openssl 1.1.1i is consistently
> reported an order of magnitude faster than nettle. I guess the major
> factor is that there's no optimized AES for aarch64 yet in nettle which
> openssl seems to have.

That would be my guess too. And if we look at the update numbers only,
the new code appears a bit faster than openssl.
...
> Just out of curiosity: I assume there's no aesni-pmull-like GCM
> implementation for x86_64?

That's right. There's some assembly code, but using the same algorithm
as the C implementation, based on table lookups.
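
As a rough illustration of the table-lookup idea, the Python sketch below
precomputes the product of the hash key with every possible byte at every
byte position, so that one multiplication becomes 16 lookups XORed
together. This relies only on the multiplication being linear over XOR.
The table layout and the names build_tables and table_mult are assumptions
chosen for simplicity; the layout actually used in nettle is not described
here and is likely more compact.

```python
R = 0xE1 << 120  # reduction constant in GCM's bit order

def gf_mult(x, y):
    """Bit-at-a-time GF(2^128) multiplication, used to fill the tables."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def build_tables(h):
    """tables[pos][b] = (byte value b placed at byte position pos) * h."""
    return [[gf_mult(b << (8 * (15 - pos)), h) for b in range(256)]
            for pos in range(16)]

def table_mult(x, tables):
    """Multiply x by the key behind `tables` using 16 lookups."""
    z = 0
    for pos, b in enumerate(x.to_bytes(16, "big")):
        z ^= tables[pos][b]
    return z
```

The tables depend only on the key, so they are built once per key and
reused for every block.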
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
