Hi Niels,
On Tue, 2022-01-04 at 20:54 +0100, Niels Möller wrote:
> nisse@lysator.liu.se (Niels Möller) writes:
> > nisse@lysator.liu.se (Niels Möller) writes:
> > > I think it should be possible to reduce the number of needed registers, and completely avoid using callee-save registers (load the values now in U4-U7 one at a time, a bit closer to the place where they are needed), and replace F3 with $1 in the FOLD and FOLDC macros.
> > Attaching a variant to do this. Passes tests with qemu, but I haven't benchmarked it on any real hardware.
> Would you like to test and benchmark this on relevant real hardware, before I merge this version?
> Code still below, and committed to the branch ppc-secp256-tweaks.
Compared to the current version in the master branch, this version definitely improves the performance of the reduction code.
On POWER9, the reduction code shows a 7% speedup when tested separately.
The improvement in P256 sign/verify is marginal. Here are the numbers from hogweed-benchmark on POWER9.
name    size   sign/ms  verify/ms
ecdsa    256   11.1013     3.5713  (master)
ecdsa    256   11.1527     3.6011  (this patch)
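To put "marginal" into numbers, here is a quick sketch that computes the relative change from the figures above. It is not part of the patch. The helper name relative_gain is made up for illustration, and it assumes the sign/ms and verify/ms columns are operations per millisecond (higher is better).

```python
def relative_gain(before, after):
    """Relative change in throughput (ops/ms), as a percentage."""
    return (after - before) / before * 100.0


# Figures copied from the hogweed-benchmark output above
# (assumed to be operations per millisecond, so higher is better).
master = {"sign": 11.1013, "verify": 3.5713}
patch = {"sign": 11.1527, "verify": 3.6011}

for op in ("sign", "verify"):
    print(f"{op}: {relative_gain(master[op], patch[op]):+.2f}%")
```

Under that assumption this works out to roughly +0.46% for sign and +0.83% for verify, well below the 7% seen for the reduction code in isolation.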
Amitay.