Amitay Isaacs <amitay@ozlabs.org> writes:
> On POWER9, the new code gives ~20% speedup for ecc_secp256r1_redc in isolation, and ~1% speedup for ecdsa sign and verify over the earlier assembly version.
Thanks! Merged to master-updates for CI testing.
I think it should be possible to reduce the number of registers needed and avoid callee-save registers entirely: load the values now held in U4-U7 one at a time, a bit closer to where they are needed, and replace F3 with $1 in the FOLD and FOLDC macros.
Regards, /Niels