The first item is a personal one. From January 7, I'll start working for Southpole Consulting AB, a small Stockholm-based consulting company mainly doing embedded (GNU/)Linux development.
Second item is more directly related to Nettle. I will get funding from Internetfonden, for "Adapting GNU Nettle for embedded systems". This will mean two things: Support for elliptic curve signatures (and possibly some other ECC-related things), and optimizations for the ARM architecture. The funding is for 420 hours of work, most of which will be spent during the spring, and the project will be carried out under the umbrella of Southpole Consulting.
A question for the list: Which variants of the ARM family are most important for Nettle applications? I'm not yet very familiar with the ARM world, but the following are some alternatives for testing and benchmarking:
* The current most high-end processor, Cortex-A15, where an affordable development system seems to be a recent "chromebook". Not sure exactly which model, but I guess it's this one: http://www.amazon.co.uk/Samsung-Chromebook-Wifi-Latest-Model/dp/B009RF0AQ8/r...
* The previous "most high-end" processor, Cortex-A9. An affordable development system is the PandaBoard. http://www.omappedia.com/wiki/PandaBoard_FAQ
* The Raspberry Pi computer, featuring an older (obsolete?) ARM1176JZF-S, "ARMv6" architecture. http://www.raspberrypi.org/faqs
* The low-end Cortex-M0, competing with less than $1 microcontrollers. Also "ARMv6" architecture. An affordable development system seems to be LPCXPRESSO board. http://www.embeddedartists.com/products/lpcxpresso/lpc11U14_xpr.php
And if anybody here has some interesting ARM hardware to donate to the project, this is the right time.
Best regards, /Niels
On Fri, 14 Dec 2012, Niels Möller wrote:
The first item is a personal one. From January 7, I'll start working for Southpole Consulting AB, a small Stockholm-based consulting company mainly doing embedded (GNU/)Linux development.
Congrats! You can say hi to Benjamin from me :-)
Second item is more directly related to Nettle. I will get funding from Internetfonden, for "Adapting GNU Nettle for embedded systems". This will mean two things: Support for elliptic curve signatures (and possibly some other ECC-related things), and optimizations for the ARM architecture. The funding is for 420 hours of work, most of which will be spent during the spring, and the project will be carried out under the umbrella of Southpole Consulting.
A question for the list: Which variants of the ARM family are most important for Nettle applications? I'm not yet very familiar with the ARM world, but the following are some alternatives for testing and benchmarking:
- The current most high-end processor, Cortex-A15, where an affordable development system seems to be a recent "chromebook". Not sure exactly which model, but I guess it's this one: http://www.amazon.co.uk/Samsung-Chromebook-Wifi-Latest-Model/dp/B009RF0AQ8/r...
- The previous "most high-end" processor, Cortex-A9. An affordable development system is the PandaBoard. http://www.omappedia.com/wiki/PandaBoard_FAQ
- The Raspberry Pi computer, featuring an older (obsolete?) ARM1176JZF-S, "ARMv6" architecture. http://www.raspberrypi.org/faqs
- The low-end Cortex-M0, competing with less than $1 microcontrollers. Also "ARMv6" architecture. An affordable development system seems to be LPCXPRESSO board. http://www.embeddedartists.com/products/lpcxpresso/lpc11U14_xpr.php
And if anybody here has some interesting ARM hardware to donate to the project, this is the right time.
As far as I know, for use in smartphones and similar, most current ones run ARMv7, on Cortex-A8/A9. In general, if targeting say android, your baseline ABIs will be either ARMv5TE or ARMv7 (with optional NEON support), but finding a good ARMv5 development platform might not be all that easy. I'm not too experienced myself with writing ARM assembly, but in general I'd guess your target arch simply depends on at what level the necessary instructions are introduced - I guess some basic stuff might benefit from just being written in general ARM assembly for ARMv5, while other things can benefit more from new instructions in v6 or v7.
I'm not sure how well suited the NEON instruction set is for the crypto things - if it is, it'll be one important target. Since it's an optional part in ARMv7 in general (the iOS ARMv7 baseline includes NEON, since they can limit the number of chipsets they run on, while Android ARMv7 doesn't include it - in particular, nvidia tegra2 based devices lack it), ideally one would be able to enable or disable it using a runtime check.
This is the case for 3rd party app developers at least, for device manufacturers it's enough to be able to enable/disable it at build time.
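A minimal sketch of such a runtime check, assuming a Linux-based system where the kernel lists CPU capabilities on the "Features" line of /proc/cpuinfo (as 32-bit ARM kernels do). The function names and the two sample strings are made up for illustration; they are not taken from Nettle or from any real device.

```python
def parse_arm_features(cpuinfo_text):
    """Collect the flags from every 'Features' line of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features


def has_neon(cpuinfo_text):
    """True if the (hypothetical) cpuinfo text advertises NEON."""
    return "neon" in parse_arm_features(cpuinfo_text)


# Illustrative sample text only, not captured from real hardware.
SAMPLE_WITH_NEON = (
    "Processor\t: ARMv7 Processor rev 10 (v7l)\n"
    "Features\t: swp half thumb fastmult vfp edsp neon vfpv3\n"
)
SAMPLE_WITHOUT_NEON = (
    "Processor\t: ARMv7 Processor rev 0 (v7l)\n"
    "Features\t: swp half thumb fastmult vfp edsp vfpv3 vfpv3d16\n"
)
```

On a real system one would call has_neon(open("/proc/cpuinfo").read()) once at startup and pick the NEON or the plain code path accordingly.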
// Martin
On Fri, 14 Dec 2012, Martin Storsjö wrote:
As far as I know, for use in smartphones and similar, most current ones run ARMv7, on Cortex-A8/A9.
I forgot to add - one noteworthy detail is that Cortex-A9 has out-of-order execution, which the A8 and earlier lack. So for hand-scheduling of instructions, you'll want an A8 or older (a beagleboard/beaglebone is a good choice for that, especially if you want to hand-schedule NEON instructions).
// Martin
On 12/14/2012 05:16 PM, Niels Möller wrote:
The first item is a personal one. From January 7, I'll start working for Southpole Consulting AB, a small Stockholm-based consulting company mainly doing embedded (GNU/)Linux development.
Second item is more directly related to Nettle. I will get funding from Internetfonden, for "Adapting GNU Nettle for embedded systems". This will mean two things: Support for elliptic curve signatures (and possibly some other ECC-related things), and optimizations for the ARM architecture. The funding is for 420 hours of work, most of which will be spent during the spring, and the project will be carried out under the umbrella of Southpole Consulting.
Congratulations! About the ECC part, if you plan to base it on what I submitted last year, some clarifications. What I submitted was about curves mod p (I think the patch was about arbitrary curves, but had been tested only with curves that had a=-3 - the nist curves). This code has been further improved by Ilya in the last google summer of code by adding wmNAF multiplication and other optimizations in the code base. The current code is on gnutls' lib/nettle/ directory. Contrary to the previous patch the current code in gnutls is more coupled with gnutls due to the precalculations needed in wmNAF (wmNAF gave a 10% improvement in ECDH).
What is missing is support for curves over F(2^p).
A question for the list: Which variants of the ARM family are most important for Nettle applications? I'm not yet very familiar with the ARM world, but the following are some alternatives for testing and benchmarking:
The current most high-end processor, Cortex-A15, where an affordable development system seems to be a recent "chromebook". Not sure exactly which model, but I guess it's this one: http://www.amazon.co.uk/Samsung-Chromebook-Wifi-Latest-Model/dp/B009RF0AQ8/r...
The previous "most high-end" processor, Cortex-A9. An affordable development system is the PandaBoard. http://www.omappedia.com/wiki/PandaBoard_FAQ
The Raspberry Pi computer, featuring an older (obsolete?) ARM1176JZF-S, "ARMv6" architecture. http://www.raspberrypi.org/faqs
I wouldn't say that ARMv6 is obsolete. It exists in many embedded devices.
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
Congratulations!
Thanks!
About the ECC part, if you plan to base it on what I submitted last year, some clarifications.
Not sure which code to reuse. I also wrote a proof-of-concept ecc implementation for Yubico last year (targeted at 8-bit and 16-bit microcontrollers), which is LGPL licensed.
What I submitted was about curves mod p (I think the patch was about arbitrary curves, but had been tested only with curves that had a=-3 - the nist curves).
For now, I think I'll do only standard mod p curves ("secp192r1", "secp224r1", "secp256r1", there are also other names for these curves, I don't know which names are the most established ones).
This code has been further improved by Ilya in the last google summer of code by adding wmNAF multiplication and other optimizations in the code base.
What's wmNAF? Optimizations I'm aware of:
* Multiplication for an arbitrary point: Use a standard window-based exponentiation algorithm. Not sure if it makes sense to aim for data-independent timing (like GMP mpz_powm_sec).
* Multiplication for the generator point: Use the "comb" method for fixed-base exponentiation (see Handbook of Applied Cryptography). Gives a large speedup for generating ECDSA signatures, at the cost of some constant tables.
* Representation for multiplication. In the code I've written I've used homogeneous coordinates, not sure if maybe Jacobi coordinates would be more efficient? Do you know? When using compile-time constant tables, take advantage of normalization in the tabulated values (the homogeneous coordinate Z always 1).
* At least for the primes used for the 192-bit and 224-bit curve, Montgomery representation is not needed, since the structure of the primes (top 128 bits all ones) makes standard Euclidean modulo very efficient. For the 256-bit curve, only the top 32 bits are all ones, so on 64-bit machines one may want to use Montgomery, or some other special trick.
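To illustrate the last point, here is a sketch of division-free reduction for the 192-bit curve, whose prime is p = 2^192 - 2^64 - 1. Since 2^192 is congruent to 2^64 + 1 modulo p, the high part of a product can be folded back into the low part with shifts and additions only. Python integers stand in for the limb arrays a C implementation would use, and the name reduce_p192 is hypothetical.

```python
P192 = 2**192 - 2**64 - 1
MASK192 = (1 << 192) - 1


def reduce_p192(x):
    """Reduce a non-negative integer modulo P192 without any division."""
    while x >> 192:
        hi, lo = x >> 192, x & MASK192
        # hi * 2^192 == hi * (2^64 + 1)  (mod P192)
        x = lo + hi + (hi << 64)
    # Now x < 2^192 < 2 * P192, so one conditional subtraction is enough.
    if x >= P192:
        x -= P192
    return x
```

For a product of two reduced values the loop runs only a few times, which is why no Montgomery representation is needed for a prime of this shape.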
What is missing is support for curves over F(2^p).
For a start, I think I'll stick to what's described in RFC 6090, since then it seems very unlikely that I'll get into patent-related troubles.
I wouldn't say that ARMv6 is obsolete. It exists in many embedded devices.
I see.
Regards, /Niels
On 12/15/2012 01:20 PM, Niels Möller wrote:
[I add Ilya to the discussion in case he wants to add something, because he's more familiar with the elliptic implementation.]
Not sure which code to reuse. I also wrote a proof-of-concept ecc implementation for Yubico last year (targeted at 8-bit and 16-bit microcontrollers), which is LGPL licensed.
What I submitted was about curves mod p (I think the patch was about arbitrary curves, but had been tested only with curves that had a=-3 - the nist curves).
For now, I think I'll do only standard mod p curves ("secp192r1", "secp224r1", "secp256r1", there are also other names for these curves, I don't know which names are the most established ones).
Also secp384r1 and secp521r1 are needed for higher security levels.
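On the naming question: the SECG names used above are aliases for the NIST curves, and two of them also have ANSI X9.62 names (RFC 4492, Appendix A, tabulates the equivalences). A small lookup table as a sketch; the helper canonical_name is hypothetical and not part of any library.

```python
# SECG name -> other well-known names for the same curve.
CURVE_ALIASES = {
    "secp192r1": ("NIST P-192", "prime192v1"),
    "secp224r1": ("NIST P-224",),
    "secp256r1": ("NIST P-256", "prime256v1"),
    "secp384r1": ("NIST P-384",),
    "secp521r1": ("NIST P-521",),
}


def canonical_name(name):
    """Map a known alias (case-insensitive) to its SECG name, else None."""
    wanted = name.strip().lower()
    for secg, aliases in CURVE_ALIASES.items():
        if wanted == secg or wanted in (alias.lower() for alias in aliases):
            return secg
    return None
```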
This code has been further improved by Ilya in the last google summer of code by adding wmNAF multiplication and other optimizations in the code base.
What's wmNAF? Optimizations I'm aware of:
It is discussed in http://www.bmoeller.de/pdf/fastexp-icisc2002.pdf (over a prime field), but you can also check http://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication#wNAF_method
It is an optimization in scalar multiplication.
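A rough sketch of the recoding step, following the plain wNAF description on the Wikipedia page linked above. It does not reproduce the modified variant from the paper or the gnutls code; the function name wnaf and the default width are choices made for this example.

```python
def wnaf(k, w=4):
    """Width-w non-adjacent form of k >= 0, least significant digit first.

    Every non-zero digit is odd, lies strictly between -2^(w-1) and
    2^(w-1), and is followed by at least w-1 zero digits.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = k % (1 << w)           # low w bits of k
            if d >= 1 << (w - 1):
                d -= 1 << w            # pick the signed representative
            k -= d                     # k is now divisible by 2^w
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits
```

For instance, wnaf(7, w=2) gives [-1, 0, 0, 1], that is 7 = 8 - 1. In a scalar multiplication each digit costs one doubling, and only the non-zero digits cost an addition with a precomputed odd multiple of the point, which is where the saving over a plain window method comes from.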
- Multiplication for an arbitrary point: Use a standard window-based exponentiation algorithm. Not sure if it makes sense to aim for data-independent timing (like GMP mpz_powm_sec).
This was what I had in the first patch. wmNAF is more efficient.
- Multiplication for the generator point: Use the "comb" method for fixed-base exponentiation (see Handbook of Applied Cryptography). Gives a large speedup for generating ECDSA signatures, at the cost of some constant tables.
- Representation for multiplication. In the code I've written I've used homogeneous coordinates, not sure if maybe Jacobi coordinates would be more efficient? Do you know? When using compile-time constant tables, take advantage of normalization in the tabulated values (the homogeneous coordinate Z always 1).
In the affine space Z is 1 in all types of coordinates. Using jacobian coordinates adds some (minor) complexity but provides several (performance) advantages in the existing algorithms.
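As an illustration of the performance point, here is a sketch of point doubling in Jacobian coordinates (x = X/Z^2, y = Y/Z^3) for a curve with a = -3, next to the affine formula. The Jacobian version needs no field inversion; the affine one needs one per doubling. The tiny curve y^2 = x^3 - 3x + 7 over GF(10007) and the point (2, 3) are toy values chosen for this example only, and all function names are hypothetical.

```python
P = 10007   # toy prime, for illustration only
B = 7       # curve: y^2 = x^3 - 3x + B  (a = -3, like the NIST curves)


def on_curve(x, y):
    return (y * y - (x * x * x - 3 * x + B)) % P == 0


def double_affine(x, y):
    """Affine doubling (y != 0): costs one inversion in the field."""
    lam = (3 * x * x - 3) * pow(2 * y, P - 2, P) % P
    x3 = (lam * lam - 2 * x) % P
    y3 = (lam * (x - x3) - y) % P
    return x3, y3


def double_jacobian(X, Y, Z):
    """Jacobian doubling for a = -3: multiplications and squarings only."""
    delta = Z * Z % P
    gamma = Y * Y % P
    beta = X * gamma % P
    # a = -3 lets 3*X^2 + a*Z^4 factor as 3*(X - Z^2)*(X + Z^2)
    alpha = 3 * (X - delta) * (X + delta) % P
    X3 = (alpha * alpha - 8 * beta) % P
    Z3 = ((Y + Z) ** 2 - gamma - delta) % P      # equals 2*Y*Z
    Y3 = (alpha * (4 * beta - X3) - 8 * gamma * gamma) % P
    return X3, Y3, Z3


def to_affine(X, Y, Z):
    """Convert back with a single inversion at the very end."""
    zinv = pow(Z, P - 2, P)
    return X * zinv * zinv % P, Y * zinv ** 3 % P
```

A whole scalar multiplication can stay in Jacobian form and call to_affine once at the end, so the inversion cost is paid a single time.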
- At least for the primes used for the 192-bit and 224-bit curve, Montgomery representation is not needed, since the structure of the primes (top 128 bits all ones) makes standard Euclidean modulo very efficient. For the 256-bit curve, only the top 32 bits are all ones, so on 64-bit machines one may want to use Montgomery, or some other special trick.
Gnutls' code was based on libtomcrypt, which used the Montgomery transformation, but we don't use it. I'm not sure about the benefits since I've not measured it.
However, I'd strongly suggest checking what we already have in gnutls, because it is quite heavily optimized (it is even faster than openssl's implementation), and there is no point in trying to reinvent it. The current code is written closely to nettle's coding standards.
regards, Nikos
nettle-bugs@lists.lysator.liu.se