I've updated this set to use the proper conventions for register names, and also adjusted the IV macro according to the suggestions provided.
I can also confirm that I now have a working build environment based on the approach used in the GitLab CI configuration, and that the ppc64 big-endian build does indeed pass the tests.
Amended original cover letter:
This set introduces optimized powerpc64 assembly implementations of SHA256 and SHA512. They have been derived from BSD-2-Clause licensed code authored by IBM, originally released in the IBM POWER Cryptography Reference Implementation project[1], modified to work in Nettle, and contributed under the GPL license.
Development of this new implementation targeted POWER 10, but the code supports the POWER 8 ISA and above. The following commits provide the performance data I recorded on POWER 10, though similar improvements can be found on P8/P9.
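Because the code needs POWER 8 or later, a fat build has to decide at run time whether it may be used. A minimal sketch of that kind of check follows, assuming Linux with glibc's getauxval(). The names sha_impl, has_vec_crypto and select_sha_impl are hypothetical and are not taken from fat-ppc.c; the bit value is the one I understand the Linux kernel headers to use for the vector crypto facility.

```c
#include <elf.h>        /* AT_HWCAP2 */
#include <sys/auxv.h>   /* getauxval() (Linux) */

/* AT_HWCAP2 bit for the powerpc vector crypto facility. Defined here
   (assumed value) so the sketch also compiles on other architectures. */
#ifndef PPC_FEATURE2_VEC_CRYPTO
#define PPC_FEATURE2_VEC_CRYPTO 0x02000000
#endif

/* Hypothetical tags for the two candidate implementations. */
typedef enum { IMPL_GENERIC_C, IMPL_PPC64_P8 } sha_impl;

/* Return 1 if the hwcap2 word advertises the vector crypto facility. */
static int
has_vec_crypto (unsigned long hwcap2)
{
  return (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) != 0;
}

/* Pick the P8 assembly only when the facility is advertised. */
static sha_impl
select_sha_impl (unsigned long hwcap2)
{
  return has_vec_crypto (hwcap2) ? IMPL_PPC64_P8 : IMPL_GENERIC_C;
}

/* Query the running kernel. The result is only meaningful on powerpc,
   since other architectures assign different meanings to these bits. */
static sha_impl
select_sha_impl_for_this_cpu (void)
{
  return select_sha_impl (getauxval (AT_HWCAP2));
}
```

Keeping the bit test separate from the getauxval() call makes the selection logic easy to exercise with fixed values on any host.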
I have tested this patch set on POWER 8 and POWER 10 hardware running little-endian Linux distributions, and via qemu-user for big-endian ppc64.
Eric Richter (2):
  powerpc64: Add optimized assembly for sha256-compress-n
  powerpc64: Add optimized assembly for sha512-compress-n

 fat-ppc.c                             |  22 ++
 powerpc64/fat/sha256-compress-n-2.asm |  36 +++
 powerpc64/fat/sha512-compress-2.asm   |  36 +++
 powerpc64/p8/sha256-compress-n.asm    | 323 +++++++++++++++++++++++++
 powerpc64/p8/sha512-compress.asm      | 327 ++++++++++++++++++++++++++
 5 files changed, 744 insertions(+)
 create mode 100644 powerpc64/fat/sha256-compress-n-2.asm
 create mode 100644 powerpc64/fat/sha512-compress-2.asm
 create mode 100644 powerpc64/p8/sha256-compress-n.asm
 create mode 100644 powerpc64/p8/sha512-compress.asm