On Fri, Oct 22, 2021 at 10:45 AM Niels Möller <nisse@lysator.liu.se> wrote:
Maamoun TK <maamoun.tk@googlemail.com> writes:
I've added a new patch that optimizes the SHA3 permute function for the s390x architecture:
https://git.lysator.liu.se/nettle/nettle/-/merge_requests/36
More details about the patch are in the merge request description.
Really nice speedup, and interesting that it's significantly faster than your previous version using the special sha3 instructions.
Yes, the special SHA3 instruction of the s390x architecture doesn't fit well with nettle's SHA3 permute function: it performs extra steps that other nettle functions already handle, which makes it slower than a regular vectorized implementation.
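To make the permute/absorb split concrete, here is a minimal Python sketch of how a SHA3 implementation can be layered. It is illustrative only and is not nettle's code or the s390x assembly: the function names loosely mirror nettle's, and the assumption that the "extra steps" are the absorb step (XORing the input block into the state) is mine, not something stated in this thread. A routine that only needs the permutation gains nothing from an operation that also does the absorb.

```python
import hashlib

MASK64 = (1 << 64) - 1
RATE = 136  # SHA3-256 rate in bytes (1600 - 2*256 bits)

def rol64(a, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64

def sha3_permute(state):
    """Keccak-f[1600] in place on 25 lanes, lane (x, y) at index x + 5*y."""
    r = 1  # LFSR state used to derive the round constants
    for _ in range(24):
        # theta: mix each column's parity into the neighbouring columns
        c = [state[x] ^ state[x + 5] ^ state[x + 10] ^ state[x + 15]
             ^ state[x + 20] for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        for i in range(25):
            state[i] ^= d[i % 5]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = state[x + 5 * y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, state[x + 5 * y] = (
                state[x + 5 * y], rol64(current, (t + 1) * (t + 2) // 2))
        # chi: non-linear step, applied row by row
        for base in range(0, 25, 5):
            row = state[base:base + 5]
            for x in range(5):
                state[base + x] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                state[0] ^= 1 << ((1 << j) - 1)

def sha3_absorb(state, block):
    """Hypothetical helper: XOR one rate-sized block into the state, then permute."""
    for i in range(len(block) // 8):
        state[i] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
    sha3_permute(state)

def sha3_256(message):
    """SHA3-256 built from the two layers above (padding handled here)."""
    state = [0] * 25
    padded = bytearray(message)
    padded.append(0x06)            # SHA-3 domain suffix plus first pad bit
    while len(padded) % RATE != 0:
        padded.append(0)
    padded[-1] |= 0x80             # final pad bit
    for off in range(0, len(padded), RATE):
        sha3_absorb(state, padded[off:off + RATE])
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])

# Cross-check against the standard library implementation.
print(sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest())
```

In this layering only sha3_permute would be replaced by an architecture-specific routine, while the XOR and padding stay in the generic code.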
I'm sorry the existing implementations are quite hard to follow, with irregular data movements and rather unstructured comments. It must have been a bit challenging to decipher the x86_64 version. Do you have any ideas on how to improve documentation and comments?
I've improved the documentation and comments in the implementation; the new documentation illustrates the structure of the main permute elements in more detail. The update also makes better use of the instruction set, which yields faster performance. Let me know if you see any further potential for improvement!
regards, Mamone