On 12/15/2012 01:20 PM, Niels Möller wrote:
[I add Ilya to the discussion in case he wants to add something, because he's more familiar with the elliptic implementation.]
Not sure which code to reuse. I also wrote a proof-of-concept ecc implementation for Yubico last year (targeted at 8-bit and 16-bit microcontrollers), which is LGPL licensed.
What I submitted was about curves mod p (I think the patch was about arbitrary curves, but had been tested only with curves that had a=-3 - the nist curves).
For now, I think I'll do only standard mod p curves ("secp192r1", "secp224r1", "secp256r1", there are also other names for these curves, I don't know which names are the most established ones).
Also secp384r1 and secp521r1 are needed for higher security levels.
This code has been further improved by Ilya in the last google summer of code by adding wmNAF multiplication and other optimizations in the code base.
What's wmNAF? Optimizations I'm aware of:
It is discussed in http://www.bmoeller.de/pdf/fastexp-icisc2002.pdf (over a prime field), but you can also check http://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication#wNAF_method
It is an optimization in scalar multiplication.
- Multiplication for an arbitrary point: Use a standard window-based exponentiation algorithm. Not sure if it makes sense to aim for data-independent timing (like GMP mpz_pomw_sec).
This was what I had in the first patch. wmNAF is more efficient.
- Multiplication for the generator point: Use the "comb" method for fixed-base exponentation (see Handbook of Applied Cryptography). Gives a large speedup for generating ECDSA signatures, at the cost of some constant tables.
- Representation for multiplication. In the code I've written I've used
homogeneous cooordinates, not sure if maybe Jacobi coordinates would be more efficient? Do you know? When using compile-time constant tables, take advantage of normalization in the tabulated values (the homogeneous coordinate Z always 1).
In the affine space Z is 1 in all types of coordinates. Using jacobian coordinates adds some (minor) complexity but provides several (performance) advantages in the existing algorithms.
- At least for the primes used for the 192-bit and 224-bit curve, Montgomery representation is not needed, since the structure of the primes (top 128 bits all ones) makes standard euclidean modulo very efficient. For the 256-bit curve, only the top 32-bits are all ones, so on 64-bit machines one may want to use montgomery, or some other special trick.
Gnutls' code was based on libtomcrypt which used to montgomery transformation but we don't use it. I'm not sure about the benefits since I've not measured it.
However, I'd strongly suggest to check what we already have in gnutls because it is quite heavily optimized (is even faster than openssl's implementation), and there is no point to try to reinvent it. The current code is written closely to nettle's coding standards.
regards, Nikos