Here's a copy of the status update for January which I just submitted to Internetfonden.
Regards, /Niels
Nettle project funded by Internetfonden
Status update for January 2013
* Summary
Project has started. ARM-based test system is up and running. A first implementation of elliptic curve-based signatures is done.
* Activities
Work on the project was initiated January 9, installing the test hardware: A Pandaboard, based on a dual core Arm Cortex-A9 processor, now running Debian GNU/Linux. I used the instructions on http://www.eewiki.net/display/linuxonarm/PandaBoard for the installation, but after cross compiling kernel etc, I can now use native compiler and debugger for development.
I have confirmed that current versions of GNU Nettle and GNU GMP work fine on this hardware. I have been reading ARM manuals (but not yet done any ARM assembly coding), and reading up on relevant methods for elliptic curve operations.
For the elliptic curves, I have decided to aim for "side-channel silent" algorithms. This means that for each curve, instructions executed and memory access pattern must not depend on the input data. This is to avoid leaking information about secret keys to an attacker monitoring the timing of operations, as well as to an attacker running an untrusted process which can manipulate and observe cache misses.
Most of January have been spent on implementing the elliptic curve machinery needed for creating and verifying ECDSA signatures, over the standard curves secp192r1, secp224r1, secp256r1, secp384r1 and secp521r1.
The work-in-progress code as, well as various working notes, is published as a public git repository, see http://git.lysator.liu.se/nettle/se-nettle-2013.
During the month, roughly 130 working hours have been spent on the project.
* About elliptic curves
Elliptic curve operations are based on Jacobi coordinates (http://www.hyperelliptic.org/EFD/g1p/auto-shortw-jacobian-3.html).
Multiplication involving the group generator use a special case of Pippenger's algorithm (described in Daniel J. Bernstein's draft paper http://cr.yp.to/papers.html#pippenger). This algorithm has later been rediscovered and is covered by a couple of, most likely invalid, patents (also explained by Daniel J. Bernstein). With current parameter choices, we use precomputed tables of modest size, roughly 16 KByte for each supported curve, which are compiled into the library.
I have not yet decided on the algorithm to use for multiplication involving an arbitrary point. The current implementation uses a side-channel silent variant of the most basic binary multiplication; it can be improved by using window-based algorithm, similar to what GMP uses for the function mpz_powm_sec.
The implementation includes a testsuite, and it is tested both on x86_64 (a 64-bit PC platform) and on the Pandaboard (32-bit ARM platform). It is also benchmarked on both those platforms. To give some quantitative numbers, I compare performance to 2048-bit RSA, and measure number of signing and verifications operations that can be done per ms.
Intel i5, 3.4 GHz:
name size sign / ms verify / ms rsa 1024 6.3228 102.9886 rsa 2048 0.9559 29.5987 dsa 1024 11.1867 5.7500 ecdsa 192 17.9380 3.7946 ecdsa 224 10.0423 2.0285 ecdsa 256 9.3953 2.2191 ecdsa 384 3.5025 0.7830 ecdsa 521 2.1572 0.5291
ARM Cortex A9, 1 GHz:
name size sign / ms verify / ms rsa 1024 0.2627 4.5487 rsa 2048 0.0392 1.2488 dsa 1024 0.4691 0.2377 ecdsa 192 1.2888 0.2883 ecdsa 224 0.7856 0.1790 ecdsa 256 0.5900 0.1280 ecdsa 384 0.2423 0.0545 ecdsa 521 0.1327 0.0296
When looking at the verify operations, ECDSA is not competitive to RSA, and maybe never will. RSA has the advantage of using a small public exponent; in these benchmarks, I use e = 65537.
For signing operations, let us focus on ECDSA 224, which is believed to give slightly higher security than RSA 2048 (see http://www.keylength.com/en/3/). Then ECDSA signatures, with the current code, is a factor 10 faster on ARM, and a factor 20 faster on x86_64.
When comparing the benchmark numbers, one should keep in mind that performance is not everything. In some applications, the smaller size of keys and signatures for ECDSA is also of high importance.
* Remaining tasks
For elliptic curve signatures, one large remaining task is designing, implementing, and documenting the public programming interface. This is part of the integration work required, before the elliptic curve implementation can be incorporated in the Nettle library.
For optimizations, performance of the verify operations can be improved by algorithmic improvements. Performance of all operations can be further improved by platform-specific optimizations, implementing a few key functions in assembly.
It's also possible to write alternative elliptic curve primitives, without the requirement of side-channel silence. These could be used to speed up operations which don't have any secret inputs, in particular, verification of signatures. I give that a lower priority than the other potential optimizations, though.
For the other sub-project, to do ARM-specific optimizations for cryptographic primitives unrelated to elliptic curves, no work has been done during January, beyond a general study of the ARM architecture. Most important functions are the AES block cipher and the hash functions in the SHA family: SHA1, SHA256, SHA512, and possibly also the recently standardized SHA3 hash function. Also the memxor function is a candidate to be the first ARM-assembly function in Nettle.
nettle-bugs@lists.lysator.liu.se