Re: [Aarch64] Optimize SHA1 Compress

1 Jun 2021


Maamoun TK <maamoun.tk@googlemail.com> writes:
...
...
Great speedup! Any idea why openssl is still slightly faster?
Sure, the OpenSSL implementation uses a loop inside the SHA1 update function,
which eliminates the constant initialization and state loading/storing for
each block, while nettle does that for every block iteration.
I see, that can make a difference if the actual compressing is fast
enough.
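
To illustrate the difference being discussed, here is a minimal C sketch of
the two call structures. Everything in it is hypothetical: toy_mix is a
stand-in for the real SHA1 rounds, and the toy_update_* names are not taken
from nettle or OpenSSL. It only shows where the state loads and stores land
in each style.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64

/* Hypothetical stand-in for the real rounds: mixes one byte into (a, b). */
static inline void
toy_mix(uint32_t *a, uint32_t *b, uint8_t x)
{
  *a = ((*a << 5) | (*a >> 27)) + *b + x;
  *b = ((*b << 30) | (*b >> 2)) ^ *a;
}

/* Per-block style: state is loaded from and stored to memory on every call. */
static void
toy_compress(uint32_t state[2], const uint8_t *block)
{
  uint32_t a = state[0], b = state[1];
  size_t i;
  for (i = 0; i < BLOCK_SIZE; i++)
    toy_mix(&a, &b, block[i]);
  state[0] += a;
  state[1] += b;
}

static void
toy_update_per_block(uint32_t state[2], const uint8_t *data, size_t nblocks)
{
  for (; nblocks > 0; nblocks--, data += BLOCK_SIZE)
    toy_compress(state, data);
}

/* Loop-inside style: state is loaded once, kept in locals across all
   blocks, and stored once at the end. */
static void
toy_update_multi_block(uint32_t state[2], const uint8_t *data, size_t nblocks)
{
  uint32_t s0 = state[0], s1 = state[1];
  for (; nblocks > 0; nblocks--, data += BLOCK_SIZE)
    {
      uint32_t a = s0, b = s1;
      size_t i;
      for (i = 0; i < BLOCK_SIZE; i++)
        toy_mix(&a, &b, data[i]);
      s0 += a;
      s1 += b;
    }
  state[0] = s0;
  state[1] = s1;
}
```

Both variants compute the same result; the second just moves the per-call
setup out of the block loop, which matters more the cheaper the rounds are.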
...
Modifying the message words in-place will change the value used by
'sha1su0' and 'sha1su1' instructions. According to ARM® A64 Instruction Set
Architecture:
SHA1SU0 <Vd>.4S, <Vn>.4S, <Vm>.4S
<Vd> Is the name of the SIMD&FP source and destination register
.
.
SHA1SU1 <Vd>.4S, <Vn>.4S
<Vd> Is the name of the SIMD&FP source and destination register
.
.
So using TMP variable is necessary here. I can't think of any replacement,
let me know how the other implementations handle this case.
I'm afraid I have no concrete suggestion, I would need to read up on the
aarch64 instructions. Implementations that do only a single round at a
time (e.g., the C implementation) use a 16-word circular buffer for the
message expansion state, and update one of the words per round. If I
read the latest patch correctly, you also don't keep any state besides
the MSGx registers?
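
For reference, the circular-buffer approach can be sketched in plain C as
below. This is an illustrative sketch of a one-round-at-a-time SHA1 block
function, not nettle's actual code; the name sha1_compress_sketch and the
ROTL32 macro are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define ROTL32(n, x) (((x) << (n)) | ((x) >> (32 - (n))))

/* Process one 64-byte block. The message schedule lives in a 16-word
   circular buffer w[]; from round 16 on, one word is updated in place
   per round. */
static void
sha1_compress_sketch(uint32_t state[5], const uint8_t block[64])
{
  uint32_t w[16];
  uint32_t a, b, c, d, e;
  unsigned i;

  /* Load the block as 16 big-endian words. */
  for (i = 0; i < 16; i++)
    w[i] = ((uint32_t) block[4*i] << 24) | ((uint32_t) block[4*i+1] << 16)
      | ((uint32_t) block[4*i+2] << 8) | (uint32_t) block[4*i+3];

  a = state[0]; b = state[1]; c = state[2]; d = state[3]; e = state[4];

  for (i = 0; i < 80; i++)
    {
      uint32_t f, k, t;

      if (i >= 16)
        {
          /* W[i] = rotl1(W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16]),
             with all indices reduced mod 16. */
          t = w[(i + 13) & 15] ^ w[(i + 8) & 15] ^ w[(i + 2) & 15] ^ w[i & 15];
          w[i & 15] = ROTL32(1, t);
        }

      if (i < 20)
        { f = (b & c) | (~b & d); k = 0x5A827999; }
      else if (i < 40)
        { f = b ^ c ^ d; k = 0x6ED9EBA1; }
      else if (i < 60)
        { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
      else
        { f = b ^ c ^ d; k = 0xCA62C1D6; }

      t = ROTL32(5, a) + f + e + k + w[i & 15];
      e = d; d = c; c = ROTL32(30, b); b = a; a = t;
    }

  state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e;
}
```

The only message-expansion state carried between rounds is w[], which plays
the same role as the MSGx registers in the vectorized version.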
...
It would be nice to either make the TMP registers more temporary (i.e.,
...
no round depends on the value in these registers from previous rounds)
and keep needed state only on the MSG variables. Or rename them to give
a better hint on how they're used.
Done! It yields a slight performance increase, btw.
Nice.
...
We can load all the constants (including duplicate values) from memory with
one instruction. The issue is how to get the data address properly for
every supported ABI!
...
the easiest solution is to define
the data in the .text section to make sure the address is near enough to be
loaded with certain instruction. Do you want to do that?
Using .text would probably work, even if it's in some sense more correct to put
the constants in rodata segment. But let's leave as is for now.
...
We had an intensive discussion about that in the GCM patch. In short,
this patch should work well for both endianness modes.
Sounds good.
I've pushed the combined patches to a branch arm64-sha1. Would you like
to update the fat build setup, before merging to master?
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
