Hi Niels,
On Mon, 2021-12-06 at 22:29 +0100, Niels Möller wrote:
nisse@lysator.liu.se (Niels Möller) writes:
I think the approach should apply to other 64-bit archs (should probably work also on x86_64, where it's sometimes tricky to avoid x86_64 instructions clobbering the carry flag when it should be preserved, but probably not so difficult in this case).
x86_64 version below. I could also trim register usage, so it no longer needs to save and restore any registers. On my machine, this gives a speedup of 17% for ecc_secp256r1_redc in isolation, a 3% speedup for ecdsa sign, and a 7% speedup for ecdsa verify.
On POWER9, the new code gives ~20% speedup for ecc_secp256r1_redc in isolation, and ~1% speedup for ecdsa sign and verify over the earlier assembly version.
Amitay.