On Thu, 2024-03-28 at 21:04 +0100, Niels Möller wrote:
Eric Richter <erichte@linux.ibm.com> writes:
This set introduces an optimized powerpc64 assembly implementation for SHA256 and SHA512. It has been derived from BSD-2-Clause licensed code authored by IBM, originally released in the IBM POWER Cryptography Reference Implementation project[1], modified to work in Nettle, and contributed under the GPL license.
Development of this new implementation targeted POWER 10; however, it supports the POWER 8 ISA and above. The following commits provide the performance data I recorded on POWER 10, though similar improvements can be found on P8/P9.
Thanks, I've had a first quick look. Nice speedup, and it looks pretty good. I wasn't aware of the vshasigma instructions.
One comment on the Nettle ppc conventions: I prefer to use register names rather than just register numbers; that helps me avoid some confusion when some instructions take v1 registers and others take vs1 registers. Preferably by configuring with ASM_FLAGS=-mregnames during development. For assemblers that don't like register names (seems to be the default), machine.m4 arranges for translation from v1 --> 1, etc.
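To illustrate the translation described above, here is a minimal Python sketch of the idea. It is not Nettle's actual machine.m4 logic; the function name, the regular expression, and the sample instruction lines are assumptions made for this example only.

```python
import re

# Hypothetical pattern: a "vs", "v", or "r" prefix followed by a register number.
# "vs" is listed first so that "vs1" is not read as "v" plus leftover text.
_REGISTER = re.compile(r"\b(?:vs|v|r)(\d+)\b")


def strip_register_names(line: str) -> str:
    """Replace named registers (v1, vs1, r1) with their bare numbers."""
    return _REGISTER.sub(r"\1", line)


if __name__ == "__main__":
    # Illustrative input lines, not taken from the patch set.
    print(strip_register_names("vshasigmaw v1, v2, 0, 0"))
    print(strip_register_names("lxvd2x vs1, 0, r4"))
```

Note that v1 and vs1 both collapse to the bare number 1 after translation, which is why the named form is easier to review.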
Ah, thanks for letting me know, I am queuing up a version that fixes this.
I do have a macro, though, that calculates which register number holds a given chunk of input data based on an index. In other words, I use registers v16-v31 to hold the input data, and the macro just adds 16 to the index to get the corresponding register. Right now it operates on raw register numbers. Should I adjust this macro to make it clearer that it is operating on vector registers, or should I look into changing how that is done?
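The index-to-register mapping described above can be sketched as follows. This is a Python model of the arithmetic only, not the m4 macro from the patch; the names `INPUT_BASE`, `input_reg`, and the `use_names` switch are hypothetical. The `use_names` option shows one possible way to make it explicit that the result is a vector register.

```python
INPUT_BASE = 16      # input data lives in v16-v31, per the message above
NUM_INPUT_REGS = 16  # assumed: one register per index 0..15


def input_reg(index: int, use_names: bool = True) -> str:
    """Map an input-chunk index to the register that holds it."""
    if not 0 <= index < NUM_INPUT_REGS:
        raise ValueError(f"index {index} is outside 0..{NUM_INPUT_REGS - 1}")
    number = INPUT_BASE + index
    # With names, the vector register class is explicit (v19);
    # without, the result is a raw register number (19).
    return f"v{number}" if use_names else str(number)


if __name__ == "__main__":
    print(input_reg(0))                    # first input register
    print(input_reg(3, use_names=False))   # raw-number form
```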
As an aside: I have tested this patch set on POWER 8 and POWER 10 hardware running little-endian Linux distributions; however, I have not yet been able to test on a big-endian distro. I can confirm, however, that the original source in IPCRI does compile and pass tests for both little and big endian via qemu-user, so barring human error in deriving the version for Nettle, it is expected to be functional.
There are big-endian tests in the CI pipeline (hosted on the mirror repo at https://gitlab.com/gnutls/nettle), using cross-compiling + qemu-user. And I also have a similar setup locally.
Thanks! I'm looking into replicating this locally as well for easier future testing, and I'll send a v2 with the updated registers once I confirm big-endian tests pass. Should I also open an MR to trigger the CI?
Thanks, - Eric
Regards, /Niels
-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.