Hi,
Nettle includes a function for side-channel silent modular inversion, which asymptotically is O(n^2), like binary gcd but slower by a pretty large constant factor.
For prime moduli, inversion can also be done via a^{-1} = a^{p-2} (mod p). That's asymptotically O(n^3) (if the underlying multiplies are done with the basic O(n^2) algorithm), but it may nevertheless be faster than the other method for numbers of the size used for Nettle's elliptic curves.
The code for curve25519 and curve448 has been using powering to invert for a long time. I've now spent some time writing specific powering code for the five secp curves as well. I've found fairly efficient addition chains where powering for a prime of n bits needs n-1 squarings and about a dozen multiplies. (I don't know what the *optimal* addition chains are, if you know of tools for that, let me know).
Code is on the branch optimize-ecc-invert. However, there's some risk the new code is slower on some platforms, in particular platforms with slow multiplication.
The main benchmark is
./examples/hogweed-benchmark ecdsa
To get numbers for just the changed function, one can run
./examples-ecc-benchmark
and look at the modinv column. Please compare performance between this branch and master, on the platforms that are important to you.
And to get relevant numbers, make sure to build using a recent GMP library; performance when built with mini-gmp is not so important.
I will merge these changes to master in a week or two, if no problems show up
I've benchmarked on one recent and one older x86_64 machine, and on raspberry-pi version 1 (the slowest ARM machine I had easy access to). I've seen improvements in ecdsa signing performance for all the secp curves, ranging from 5% up to 20% depending on curve and platform.
Not all inversions are rewritten. I haven't changed the modq inversion which is needed for ecdsa, and I haven't changed the code for the two supported gost curves.
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
The code for curve25519 and curve448 has been using powering to invert for a long time. I've now spent some time writing specific powering code for the five secp curves as well. I've found fairly efficient addition chains where powering for a prime of n bits needs n-1 squarings and about a dozen multiplies. (I don't know what the *optimal* addition chains are, if you know of tools for that, let me know).
[...]
I will merge these changes to master in a week or two, if no problems show up
Before doing this merge, I've made some changes to the modulo p reduce functions (mod and redc, with both C and assembly implementations). They can now store the final result at a different location than the clobbered input area. Then, the ecc_mod_mul and ecc_mod_sqr functions are also changed to have separates result area, different from the (larger) scratch area. This makes the allocation puzzle when using the ecc_mod_* functions a lot simpler, resulting in reduced scratch need for lots of functions, and elimination of a few copy operations.
Merging those changes is now on the master-updates branch. When this is in, the new code on the optimize-ecc-invert branch can likely be simplified a bit before merging.
Regards, /Niels
nettle-bugs@lists.lysator.liu.se