I've started to read up on ECC again (and I've also spent some time reading up on ARM architecture, but that's for a different message).
As I wrote earlier, for a start I'll limit the support to standard NIST curves. So first question is naming; what's the most established names for these curves? It seems gnutls uses, e.g., "SECP256R1", I guess we can just follow that (rather than, e.g, "P-256", as it's called in the DSA spec fips-186-3)?
My current plan is as follows:
1. The top-level functions for ECDSA and the like will use mpz_t to represent point coordinates, exponents, etc, just like the current DSA code.
2. Internally, I think I want to use GMP's lower-level mpn interface, for two reasons: (i) that should make it possible, with some effort, to use ecc functions without any allocation inside Nettle. (ii) For performance, I'd like to do specific mod p functions for the various curves, and this is done best on top of functions such as mpn_add_n and mpn_addmul_1. After all, the primes p used for the curves are carefully chosen to have special structure, to enable optimizations.
3. Representation of a curve will for now be an opaque struct. The header will just declare
struct ecc_curve; extern const struct ecc_curve nettle secp256r1;
Tables for point multiplication will be compiled into the library. They will be somewhat machine dependent, depending on the GMP limb size, so they have to be generated at build time.
4. I think I'll use Jacobian coordinates. These have a penalty for addition of arbitrary points, but we can try to use multiplication algorithms where one of the points being added always is a constant, which can be represented in normalized form. Then there's no penalty over plain homogeneous coordinates (this is the operation called "madd" in gnutls sources).
I've been intrigued by the recent (2007) breakthrough of Edwards curves, but as far as I understand, that can't easily be applied to the standard NIST curves. See http://cr.yp.to/newelliptic/newelliptic.html.
5. For point multiplication with the fixed generator, I think I'll use the L=1 special case of Pippenger's algorithm, described in Daniel Bernstein's paper http://cr.yp.to/papers/pippenger.pdf. I think this is more or less the same as the "comb" method or "Lim-Lee"-method, but predating any bogus patents on the algorithm.
(I can try to summarize the algorithm later, but it's pretty simple once one gets past the notation).
I think it's fairly easy to silence with regards to timing and cache side-channels. One nice feature is that even with pretty large tables, not all of the table index bits are data-dependent (extracted from the exponent). Then it should be practical to read all possible table entries and use masking to copy the selected one, see http://gmplib.org:8000/gmp/file/eee6b8e54765/mpn/generic/tabselect.c.
This operation is the working horse for ECDSA signing, and for the first half of ECDH.
I'm not sure how large tables it makes sense to create for each curve, but I guess somewehere between a few KByte to at most 100 Kbyte. This could be made configurable at compile time, but I'd expect almost all users to stick to the default tables.
6. For general point multiplication, I tend towards using a simple non-sliding window, with a small point-specific table computed at run time. If the table is small enough, it might make sense to normalize the entries. As far as I understand, one can't beat the plain binary algorithm by a large factor; the algorithms I'm aware of only reduce the number of adds, but not the number of dups.
This operation is the working horse for the second half of ECDH.
7. For verifying ECDSA signatures, one needs to operate on two points, computing e_1 G + e_2 Y, where G is the generator (and hence offers possibilities for precomputation), and Y is the public key.
I haven't yet looked into this in detail, but it seems clear this should be done as a single operation, with only one chain of point duplications. Preferably, it should be implemented in such a way that the tables for G (from 5 above) can be reused.
Comments? Does this make sense, or is it anything which ought to be done differently?
I have had a quick look at the gnutls code. wmNAF makes me a bit uneasy when I think about side channels; there seems to be a lot of branches depending on exponent bits.
In the point addition, it seems hard to avoid special cases when adding a point to itself or to its negation, or adding the zero point. Which will cause sidechannel leaks. Not sure what to do about that, but there are some papers on "unified" ecc operations, or maybe one can do something analogous to rsa blinding.
As far as I understand, for point multiplication in both ECDSA and ECDH, it's only the scalar that ever is secret, the points multiplied are always public. Are there other uses of point multiplication where also the point is secret?
Regards, /Niels