Maamoun TK <maamoun.tk@googlemail.com> writes:
I've added a new patch that optimizes the SHA3 permute function for the S390x architecture:
https://git.lysator.liu.se/nettle/nettle/-/merge_requests/36
More about the patch is in the merge request description.
Really nice speedup, and interesting that it's significantly faster than your previous version using the special sha3 instructions.
I'm sorry the existing implementations are quite hard to follow, with irregular data movements and rather unstructured comments. It must have been a bit challenging to decipher the x86_64 version. Do you have any ideas on how to improve documentation and comments?
Regards,
/Niels