I've now merged the curve25519 branch to master. Seems to work fine, but I would expect that it's slower than other implementations; there are *lots* of optimizations left to do. I've done only a little benchmarking, mainly for x86/ecc-25519-modp.asm.
I've also read the Ed25519 paper now (http://ed25519.cr.yp.to/ed25519-20110926.pdf), and it suggests several important optimizations, most of which apply to all uses of curve25519.
Things I'd like to do, besides optimizations:
* Switch from the plain Edwards curve to the twist used for Ed25519. Should be a pretty small change.
* Implement Ed25519 signatures.
* Make the ecdsa code work over curve25519. Not that I'd expect anyone to use ecdsa over that curve, but I think it's useful for validating the generality of the ecc interface, and maybe for benchmarking.
* Review the public interface, moving functions which depend on the type of curve out of ecc.h into ecc-internal.h.
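For reference, here is why the switch to the twist should be small. This assumes the plain Edwards curve in question is the one the paper gives as birationally equivalent to Curve25519. The primed coordinate name is only for illustration.

```latex
\begin{align*}
E  &:\quad x^2 + y^2 = 1 + \tfrac{121665}{121666}\, x^2 y^2
   && \text{(plain Edwards form)} \\
E' &:\quad -x'^2 + y^2 = 1 - \tfrac{121665}{121666}\, x'^2 y^2
   && \text{(twisted form used for Ed25519)} \\
   &\quad x = \sqrt{-1}\; x' \pmod{p}, \qquad p = 2^{255} - 19
\end{align*}
```

Since p = 1 (mod 4), -1 is a square mod p, so the two curves differ only by a constant scaling of the x coordinate. The y coordinate is unchanged.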
As far as optimizations go, I think the most important ones to try are
* Use the faster ecc addition formulas specific to the twisted curve.
* Try radix 2^51 for the mod p operations (outlined in the paper), and write assembly functions for doing squaring and multiplication in registers, without storing intermediate results to memory. This should be quite similar to the arithmetic for poly1305.
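To make the radix 2^51 idea concrete, here is a rough C sketch of a multiplication mod p = 2^255 - 19 with five 51-bit limbs. It is not existing nettle code: the names `fe51_mul` and `MASK51` are made up for illustration, and it assumes a compiler providing `unsigned __int128` (GCC or Clang on a 64-bit target). Inputs are assumed to have limbs below 2^52, and the output is only loosely reduced, not canonical.

```c
#include <stdint.h>

typedef unsigned __int128 u128;	/* compiler extension, assumed available */

#define MASK51 ((UINT64_C(1) << 51) - 1)

/* Hypothetical helper: r = a * b mod 2^255 - 19.  Each operand is five
   limbs in radix 2^51, least significant first, each limb < 2^52. */
static void
fe51_mul (uint64_t r[5], const uint64_t a[5], const uint64_t b[5])
{
  /* 2^255 = 19 (mod p), so partial products of weight 2^255 and up
     wrap around to the low limbs, multiplied by 19. */
  uint64_t b1_19 = 19 * b[1], b2_19 = 19 * b[2];
  uint64_t b3_19 = 19 * b[3], b4_19 = 19 * b[4];

  u128 t0 = (u128) a[0] * b[0] + (u128) a[1] * b4_19 + (u128) a[2] * b3_19
    + (u128) a[3] * b2_19 + (u128) a[4] * b1_19;
  u128 t1 = (u128) a[0] * b[1] + (u128) a[1] * b[0] + (u128) a[2] * b4_19
    + (u128) a[3] * b3_19 + (u128) a[4] * b2_19;
  u128 t2 = (u128) a[0] * b[2] + (u128) a[1] * b[1] + (u128) a[2] * b[0]
    + (u128) a[3] * b4_19 + (u128) a[4] * b3_19;
  u128 t3 = (u128) a[0] * b[3] + (u128) a[1] * b[2] + (u128) a[2] * b[1]
    + (u128) a[3] * b[0] + (u128) a[4] * b4_19;
  u128 t4 = (u128) a[0] * b[4] + (u128) a[1] * b[3] + (u128) a[2] * b[2]
    + (u128) a[3] * b[1] + (u128) a[4] * b[0];
  uint64_t c;

  /* Carry chain: keep 51 bits per limb, pass the rest upwards. */
  t1 += (uint64_t) (t0 >> 51); r[0] = (uint64_t) t0 & MASK51;
  t2 += (uint64_t) (t1 >> 51); r[1] = (uint64_t) t1 & MASK51;
  t3 += (uint64_t) (t2 >> 51); r[2] = (uint64_t) t2 & MASK51;
  t4 += (uint64_t) (t3 >> 51); r[3] = (uint64_t) t3 & MASK51;
  c = (uint64_t) (t4 >> 51);   r[4] = (uint64_t) t4 & MASK51;

  /* The carry out of the top limb has weight 2^255, so fold it in as 19*c. */
  r[0] += 19 * c;
  r[1] += r[0] >> 51;
  r[0] &= MASK51;
}
```

An assembly version would keep t0 through t4 in registers for the whole operation, which is the point of the exercise. Squaring can reuse the same structure with the symmetric products doubled instead of computed twice.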
Regards, /Niels