Nikos Mavrogiannopoulos nmav@gnutls.org writes:
- Should ecc-curves.h really declare nettle_curve25519? Its' not needed for any of the documented functions, except for obscurities like doing ecdsa (not eddsa) over the curve. It could be moved to ecc-internal, or be marked as internal in some other way. Perhaps renaming to _nettle_ed25519 would be appropriate.
A symbol that is not used can only cause issues. Using however, _nettle_ed25519 would still export the symbol with the current ld script.
It is used internally, but application code doesn't need it for any of the documented functions (with the obscure exception mentioned above).
Hmm. For now, I think I'd prefer to move the declaration to ecc-internal.h. And maybe rethink how if and how we should do a unified interface for ecc operations on different types of curves.
- curve25519_mul should be changed to have a void return type (an earlier implementation failed for inputs which didn't correspond to points on the curve, but instead were points on its twist). But the current implementation, using the Montgomery ladder, doesn't care and computes a well defined result for all inputs.
No idea about this, do you think that a future re-implementation could need an error code?
Not sure. Maybe if there's a significant speedup. There's some potential when optimizing the twisted edwards point operations, but the coordiante transformation also has a cost, so I think it is unlikely to beat the montgomery ladder by a large margin.
- struct ed25519_private_key and struct ed25519_public_key include compile-time constant limb arrays. At least for the public key, this will imply an ABI break if/when we switch to a base 2^51 representations for GF(2^255 - 19). So maybe switch to dynamic allocation for struct ed25519_public_key, or both structs?
I'm not familiar with the other representations, but could there be a one-size fits all structure?
Maybe. On 64-bit machines, the base 2^51 bit trick expands each coordinate from 4 to 5 words. On second thought, that conversion could be done when the point is used, the cost should be negligible compared to the scalar multiplication.
Let me think aloud.
It's not obvious how much to cache in struct ed25519_public_key. The external public key is in compressed form: The y-coordinate and the lest significant bit of the x coordinate, 256 bits total. It must be decompressed by recovering the x coordinate. This is done by ed25519_sha512_set_public_key, and the result is stored in struct ed25519_public_key. And the function returns failure if decompression fails.
When the key is used in ed25519_sha512_verify, there are additional computations that depend on the public key only. The point is used for a scalar multiplication, where the scalar depends on the message. Then the point is expanded into extended coordinates, and a lookup table with 16 entries (2^ECC_MUL_A_EH_BITS). This is currently not cached.
I imagine it's an unusual case is to verify lots of signatures with the same key. So one could go to the other extreme and not cache anything at all, eliminate struct ed25519_public_key, and let ed25519_sha512_verify take the compressed public key a input.
In the other hand, *signing* several messages using the same key seems like a common use case, e.g., an ssh or tls server signing the keyexchange handshake for each connection/session. The work done by ed25519_sha512_set_private_key includes an expensive scalar multiplication. But this is needed only for computing the corresponding public key, which is added into the hashing of the message. So maybe one could have the following simpler ed25519 interface, representing all keys as strings of ED25519_KEY_SIZE octets:
void ed25519_sha512_public_key (uint8_t *pub, const uint8_t *priv);
void ed25519_sha512_sign (const uint8_t *priv, const uint8_t *pub, size_t length, const uint8_t *msg, uint8_t *signature);
int ed25519_sha512_verify (const uint8_t *pub, size_t length, const uint8_t *msg, const uint8_t *signature);
That the public key is an input to ed25519_sha512_sign means that it needs only a single scalar multiplication, and it's up to the caller to compute and possibly cache the needed public key. One could even allow pub to be NULL, and in that case recompute it (which makes the signing take twice the time).
Regards, /Niels