Daiki Ueno <ueno@gnu.org> writes:
> Thank you for the suggestions and sorry for the shameless delay.
No problem, I'm also quite slow. I think there are three main pieces left to integrate.
1. Curve operations to support Curve448 (i.e., Diffie-Hellman operations). I have made some progress on my curve448 branch.
2. SHAKE 128/256. I think I had some questions on the interface design.
3. EdDSA 448.
Optimization of the mod p arithmetic isn't that important yet, but I'll nevertheless try to explain how I think about it.
> Due to my ignorance, I probably don't get what you mean, but as far as I read the implementations of other curves in Nettle, some of them seem to use the property of generalized Mersenne numbers described in: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2133 and it's also applicable to the Curve448 prime.
The way I think about it, we have p = 2^448 - 2^224 - 1. This implies that

  2^448 = 2^224 + 1 (mod p)

(since a = b (mod p) means that a - b is a multiple of p, and in this case, the difference between the left- and right-hand sides is exactly p).
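This is easy to sanity-check with arbitrary-precision integers; a quick Python check (illustration only, the implementation itself is C):

```python
# The Curve448 prime and its wrap-around identities.
p = 2**448 - 2**224 - 1

# The difference between the two sides is exactly p, so they are congruent.
assert 2**448 - (2**224 + 1) == p
assert pow(2, 448, p) == 2**224 + 1
# Applying the identity once more gives the reduction for 2^(3k), k = 224:
# 2^672 = 2^224 * 2^448 = 2^224 * (2^224 + 1) = 2^448 + 2^224 = 2^225 + 1 (mod p)
assert pow(2, 672, p) == 2**225 + 1
```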
> So a 896-bit product here:
>
>   a0 + a1 * 2^k + a2 * 2^2k + a3 * 2^3k   (k = 224)
And in this notation, 2^2k = 2^k + 1 (mod p) and 2^3k = 2^2k + 2^k (mod p). If we use these relations to replace the high powers of two, we get precisely your formulas:
> can be reduced to:
>
>   b0 + b1 * 2^k (mod p)
>
> through modular additions of the smaller numbers:
>
>   b0 = a0 + a2 + a3 (mod p)
>   b1 = a1 + a2 + a3 + a3 (mod p)
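These formulas are easy to check numerically with arbitrary-precision integers; a quick sketch in Python (illustration only, Nettle itself is C):

```python
# Numerical check of the b0/b1 reduction formulas for p = 2^448 - 2^224 - 1.
import random

p = 2**448 - 2**224 - 1
k = 224
mask = 2**k - 1

def reduce_chunks(a):
    """Split a (assumed < 2^896) into 224-bit chunks and apply
    b0 = a0 + a2 + a3, b1 = a1 + a2 + a3 + a3."""
    a0 = a & mask
    a1 = (a >> k) & mask
    a2 = (a >> 2 * k) & mask
    a3 = a >> 3 * k
    return a0 + a2 + a3, a1 + a2 + a3 + a3

random.seed(0)
for _ in range(100):
    # A full 896-bit product of two 448-bit numbers.
    a = random.getrandbits(448) * random.getrandbits(448)
    b0, b1 = reduce_chunks(a)
    assert (b0 + (b1 << k)) % p == a % p
```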
I'm not sure what's the best way to organize this. I suggested doing it in two steps, first reduce the high three chunks to two and leave a0 unchanged:
  a1 + a2 2^k + a3 2^2k = (a1 + a3) + (a2 + a3) 2^k = b1 + b2 2^k (mod p)
where we can do the carry propagation and carry wrap-around to get b1 and b2 as 224-bit numbers.
And then repeat the same procedure once more:
  a0 + b1 2^k + b2 2^2k = (a0 + b2) + (b1 + b2) 2^k = c0 + c1 2^k (mod p)
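A sketch of the two-step procedure with arbitrary-precision integers (Python for illustration; the helper name fold3 is made up here, not Nettle's API). The carry out at position 2^2k is folded back in as carry * (2^k + 1), using 2^2k = 2^k + 1 (mod p):

```python
p = 2**448 - 2**224 - 1
k = 224
mask = 2**k - 1

def fold3(x0, x1, x2):
    """Reduce x0 + x1*2^k + x2*2^(2k) to c0 + c1*2^k, congruent mod p,
    with c0 and c1 brought down to 224 bits. Hypothetical helper name."""
    c0, c1 = x0 + x2, x1 + x2   # since x2*2^(2k) = x2*2^k + x2 (mod p)
    while (c0 >> k) or (c1 >> k):
        # Plain carry propagation from c0 into c1 ...
        c1 += c0 >> k
        c0 &= mask
        # ... and wrap-around of the carry out of c1:
        # carry*2^(2k) = carry*(2^k + 1) (mod p)
        carry = c1 >> k
        c1 &= mask
        c0 += carry
        c1 += carry
    return c0, c1

def mod_p_reduce(a):
    """Reduce an 896-bit product a to a 448-bit value congruent mod p."""
    a0, a1 = a & mask, (a >> k) & mask
    a2, a3 = (a >> 2 * k) & mask, a >> 3 * k
    # Step 1: reduce the high three chunks, leaving a0 untouched.
    b1, b2 = fold3(a1, a2, a3)
    # Step 2: the same procedure once more, now bringing in a0.
    c0, c1 = fold3(a0, b1, b2)
    return c0 + (c1 << k)
```

The result is 448 bits but not necessarily fully reduced below p, which is all the internal mod functions need.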
This is going to work out nicely on 32-bit platforms. For 64-bit, performance should be better the more shift operations can be avoided, and your variant might then be faster than the above.
On the other hand, for an assembly implementation keeping values in registers, the two-step way needs fewer registers, since loading the lowest limbs can be postponed until after the higher-order limbs have been eliminated and no longer need any registers.
Looking at your formulas,

  b0 = a0 + a2 + a3 (mod p)
  b1 = a1 + a2 + a3 + a3 (mod p)
I think it should be possible to implement this with only a single 7-limb shift: in the b0 formula, only a3 needs shifting, since a0 and a2 are already at the right position. And in the b1 formula, a1 and a3 are at the right position, but a2 needs a shift. I think I'd try the following:
1. Do the additions a0 += a2, a1 += a3, which I think can be done as a single 7-limb add, with plain carry propagation between the pieces. Carry out can be folded immediately, or saved for later.
2. Do the second a1 += a3 (needs some masking to extract the a3 bits, but no shift). Or maybe skip the masking, and handle the low 32 bits of a3 some other way.
3. Do an in-place 32-bit left shift of a2, a3.
4. Add the shifted terms, a0 += a3, a1 += a2. This gets even better if we make the a2, a3 shift above a rotate, storing the shifted-out high bits of a3 back at the position of the low bits of a2.
5. Do final carry handling.
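The steps above can be sketched at the integer level in Python (illustration only; the 224-bit rotate below stands in for the 32-bit shift of a2, a3 combined with limb-offset addressing, since 224 bits is 3.5 64-bit limbs, and the function name is made up):

```python
p = 2**448 - 2**224 - 1
M448 = 2**448 - 1

def mod_p_steps(t):
    """Reduce an 896-bit product t following the 5-step outline (sketch)."""
    low = t & M448    # a0 + a1*2^224, the low 7 limbs
    high = t >> 448   # a2 + a3*2^224, the high 7 limbs
    # Step 1: one 7-limb add of the high half onto the low half;
    # this does a0 += a2 and a1 += a3 in a single pass.
    acc = low + high
    # Step 2: the second a1 += a3; mask off the a2 bits of the high half.
    acc += (high >> 224) << 224
    # Steps 3+4: rotate the high half by 224 bits, giving a3 + a2*2^224,
    # and add it; this does a0 += a3 and a1 += a2.
    rot = ((high << 224) | (high >> 224)) & M448
    acc += rot
    # Step 5: final carry handling; fold bits at 2^448 and above back in,
    # using 2^448 = 2^224 + 1 (mod p).
    while acc >> 448:
        c = acc >> 448
        acc = (acc & M448) + c + (c << 224)
    return acc   # 448 bits, not necessarily fully reduced below p
```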
There's no need to try to find the optimal reduction strategy right away, though; other missing pieces are more important.
> I tried to implement it (as attached) and got a 20-30% speed-up with mini-gmp.
For good test coverage of these functions, I recommend building with the real GMP (to get access to its mpz_rrandomb function), and then letting

  while NETTLE_TEST_SEED=0 ./ecc-mod-test ; do : ; done

run overnight. And possibly also hacking ecc_mod_test.c to only test the curve of interest.
Regards,
/Niels