Maamoun TK <maamoun.tk@googlemail.com> writes:
Great speedup! Any idea why openssl is still slightly faster?
Sure. The OpenSSL implementation loops over the blocks inside its SHA1 update function, so the constant initialization and the loading/storing of the state happen once per call, while nettle repeats them for every block.
I see, that can make a difference if the actual compression is fast enough.
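To make the structural difference concrete, here is a portable C sketch (not the OpenSSL or nettle code) of a compression function that takes a block count. The state words are loaded into locals once, the constants are set up outside the loop, and the state is written back once at the end. The name `sha1_compress_n` and its signature are hypothetical; in the SIMD version the saving would come from keeping the state and constant vectors in registers across blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* The four SHA-1 round constants, set up once, outside the block loop. */
static const uint32_t sha1_k[4] = {
  0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6
};

/* Hypothetical multi-block interface: load state once, compress every
   64-byte block in the loop, store state once at the end. */
void sha1_compress_n(uint32_t state[5], size_t blocks, const uint8_t *data)
{
  uint32_t h0 = state[0], h1 = state[1], h2 = state[2];
  uint32_t h3 = state[3], h4 = state[4];

  for (; blocks > 0; blocks--, data += 64)
    {
      uint32_t w[80];
      uint32_t a = h0, b = h1, c = h2, d = h3, e = h4;
      unsigned t;

      /* SHA-1 reads each block as sixteen big-endian 32-bit words. */
      for (t = 0; t < 16; t++)
        w[t] = ((uint32_t) data[4*t] << 24) | ((uint32_t) data[4*t + 1] << 16)
               | ((uint32_t) data[4*t + 2] << 8) | (uint32_t) data[4*t + 3];
      for (t = 16; t < 80; t++)
        w[t] = rol32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

      for (t = 0; t < 80; t++)
        {
          uint32_t f, tmp;
          if (t < 20)
            f = (b & c) | (~b & d);            /* choose */
          else if (t < 40 || t >= 60)
            f = b ^ c ^ d;                     /* parity */
          else
            f = (b & c) | (b & d) | (c & d);   /* majority */
          tmp = rol32(a, 5) + f + e + sha1_k[t / 20] + w[t];
          e = d;
          d = c;
          c = rol32(b, 30);
          b = a;
          a = tmp;
        }

      /* Feed-forward into the locals; memory is not touched per block. */
      h0 += a; h1 += b; h2 += c; h3 += d; h4 += e;
    }

  state[0] = h0; state[1] = h1; state[2] = h2; state[3] = h3; state[4] = h4;
}
```

Calling it once with `blocks = 2` gives the same state as two single-block calls; only the per-call overhead differs.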
Modifying the message words in place would change the values used by the 'sha1su0' and 'sha1su1' instructions. According to the ARM® A64 Instruction Set Architecture:

  SHA1SU0 <Vd>.4S, <Vn>.4S, <Vm>.4S
  <Vd> Is the name of the SIMD&FP source and destination register ...

  SHA1SU1 <Vd>.4S, <Vn>.4S
  <Vd> Is the name of the SIMD&FP source and destination register ...
So using a TMP variable is necessary here. I can't think of any alternative; let me know how other implementations handle this case.
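A C model may make the register usage easier to follow. This is my own reading of the ARM pseudocode for SHA1SU0 and SHA1SU1, written as an illustration rather than a reference; the `v128` type and `sha1_schedule_step` are hypothetical. The point it shows is that `Vd` is both an input and the output: after the two instructions, the register that held W[t-16..t-13] holds W[t..t+3].

```c
#include <stdint.h>

/* A 128-bit vector register as four 32-bit lanes; s[0] is element 0
   (the least significant 32 bits) of the .4S view. */
typedef struct { uint32_t s[4]; } v128;

static uint32_t rol32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* SHA1SU0 Vd, Vn, Vm: Vd = {d2, d3, n0, n1} ^ Vd ^ Vm (Vd read and written). */
static void sha1su0(v128 *vd, v128 vn, v128 vm)
{
  v128 d = *vd;
  vd->s[0] = d.s[2] ^ d.s[0] ^ vm.s[0];
  vd->s[1] = d.s[3] ^ d.s[1] ^ vm.s[1];
  vd->s[2] = vn.s[0] ^ d.s[2] ^ vm.s[2];
  vd->s[3] = vn.s[1] ^ d.s[3] ^ vm.s[3];
}

/* SHA1SU1 Vd, Vn: T = Vd ^ (Vn >> 32 bits); rotate each lane by 1, and
   fold rol(T0, 2) into the top lane (Vd read and written again). */
static void sha1su1(v128 *vd, v128 vn)
{
  uint32_t t0 = vd->s[0] ^ vn.s[1];
  uint32_t t1 = vd->s[1] ^ vn.s[2];
  uint32_t t2 = vd->s[2] ^ vn.s[3];
  uint32_t t3 = vd->s[3];              /* the shifted-in top lane is zero */
  vd->s[0] = rol32(t0, 1);
  vd->s[1] = rol32(t1, 1);
  vd->s[2] = rol32(t2, 1);
  vd->s[3] = rol32(t3, 1) ^ rol32(t0, 2);
}

/* One schedule step: msg0..msg3 hold W[t-16..t-1]. Afterwards msg0 holds
   W[t..t+3]; its old contents W[t-16..t-13] are overwritten. */
static void sha1_schedule_step(v128 *msg0, v128 msg1, v128 msg2, v128 msg3)
{
  sha1su0(msg0, msg1, msg2);
  sha1su1(msg0, msg3);
}
```

Because `msg0` is clobbered, any value a round still needs from it (for example W+K for the current rounds) has to be computed into another register first.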
I'm afraid I have no concrete suggestion; I would need to read up on the aarch64 instructions. Implementations that do only a single round at a time (e.g., the C implementation) use a 16-word circular buffer for the message expansion state, and update one of the words per round. If I read the latest patch correctly, you also don't keep any state besides the MSGx registers?
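The single-round scheme described above can be sketched as follows. This is an illustrative version, not nettle's actual C code, and `sha1_expand_round` is a hypothetical name. Slot `t & 15` still holds W[t-16] when round t starts, so it is overwritten in place with W[t].

```c
#include <stdint.h>

static uint32_t rol32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* Message expansion with a 16-word circular buffer, one word per round.
   On entry to round 0, buf[0..15] holds W[0..15]. Calls must be made in
   order t = 0, 1, ..., 79; each call returns W[t]. */
static uint32_t sha1_expand_round(uint32_t buf[16], unsigned t)
{
  if (t >= 16)
    /* Inputs W[t-3], W[t-8], W[t-14], W[t-16] sit at (t-k) mod 16,
       i.e. (t+13), (t+8), (t+2) and t, all masked with 15. */
    buf[t & 15] = rol32(buf[(t + 13) & 15] ^ buf[(t + 8) & 15]
                        ^ buf[(t + 2) & 15] ^ buf[t & 15], 1);
  return buf[t & 15];
}
```

The whole expansion state is the 16 words in `buf`, the scalar analogue of keeping all state in the four MSGx vector registers.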
It would be nice to either make the TMP registers more temporary (i.e., no round depends on the value left in these registers by previous rounds) and keep the needed state only in the MSG variables, or rename them to give a better hint of how they're used.
Done! It yields a slight performance increase, btw.
Nice.
We can load all the constants (including duplicate values) from memory with one instruction. The issue is how to get the data address properly for every supported ABI!
The easiest solution is to define the data in the .text section, so that the address is close enough to be reached by a PC-relative load instruction. Do you want to do that?
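For reference, a table that a single load could fetch might look like this: the four SHA-1 round constants, each duplicated into the four 32-bit lanes of a 128-bit vector, 64 bytes in total. The layout and the names `sha1_k_vectors` and `sha1_k_lane` are assumptions for illustration. On AArch64, one `ld1 {v16.4s, v17.4s, v18.4s, v19.4s}, [xN]` could load it once the address is in `xN`, which is exactly the ABI-dependent part discussed above.

```c
#include <stdint.h>

/* Hypothetical table: each round constant repeated in all four lanes,
   so the 64 bytes map onto four 128-bit registers. */
static const uint32_t sha1_k_vectors[16] = {
  0x5A827999, 0x5A827999, 0x5A827999, 0x5A827999,  /* rounds  0..19 */
  0x6ED9EBA1, 0x6ED9EBA1, 0x6ED9EBA1, 0x6ED9EBA1,  /* rounds 20..39 */
  0x8F1BBCDC, 0x8F1BBCDC, 0x8F1BBCDC, 0x8F1BBCDC,  /* rounds 40..59 */
  0xCA62C1D6, 0xCA62C1D6, 0xCA62C1D6, 0xCA62C1D6   /* rounds 60..79 */
};

/* Lane `lane` (0..3) of the constant vector used for round t (0..79). */
static uint32_t sha1_k_lane(unsigned t, unsigned lane)
{
  return sha1_k_vectors[4 * (t / 20) + lane];
}
```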
Using .text would probably work, even if it's in some sense more correct to put the constants in the rodata section. But let's leave it as is for now.
We had an extensive discussion about that in the GCM patch thread. In short, this patch should work well for both endianness modes.
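The property the assembly has to preserve in both modes is that SHA-1 reads the message as big-endian 32-bit words regardless of host byte order. A minimal C reference for that behaviour (the helper names `load_be32` and `sha1_load_block` are hypothetical) builds each word from bytes, so it gives the same result on little- and big-endian hosts:

```c
#include <stdint.h>

/* Read a big-endian 32-bit word byte by byte; independent of host order. */
static uint32_t load_be32(const uint8_t *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Fill W[0..15] from one 64-byte block. */
static void sha1_load_block(uint32_t w[16], const uint8_t block[64])
{
  for (unsigned i = 0; i < 16; i++)
    w[i] = load_be32(block + 4 * i);
}
```

A vector implementation can be checked against this: whatever loads and byte swaps it uses in each mode, the resulting words must match.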
Sounds good.
I've pushed the combined patches to a branch arm64-sha1. Would you like to update the fat build setup before merging to master?
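The general shape of runtime dispatch in a fat build is a function pointer chosen once at startup from a CPU feature flag. The sketch below only illustrates that pattern; every name in it is hypothetical, the two implementations are stubs that record which one ran, and `have_sha1` is an assumed input (on Linux it would typically come from `getauxval(AT_HWCAP) & HWCAP_SHA1`).

```c
#include <stdint.h>

/* Signature shared by all implementations of the compression function. */
typedef void sha1_compress_func(uint32_t *state, const uint8_t *input);

/* Stand-ins for the portable C code and the arm64 SHA1 assembly;
   they only record which implementation was called. */
static int last_impl;
static void sha1_compress_c(uint32_t *state, const uint8_t *input)
{ (void) state; (void) input; last_impl = 1; }
static void sha1_compress_arm64(uint32_t *state, const uint8_t *input)
{ (void) state; (void) input; last_impl = 2; }

/* Pointer used by the generic code; defaults to the portable version. */
static sha1_compress_func *sha1_compress_vec = sha1_compress_c;

/* Run once at initialization with the detected CPU feature. */
static void sha1_fat_init(int have_sha1)
{
  sha1_compress_vec = have_sha1 ? sha1_compress_arm64 : sha1_compress_c;
}
```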
Regards, /Niels