Many thanks for your comments.
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
On 01/17/2013 10:00 AM, Niels Möller wrote:
So first question is naming; what's the most established names for these curves?
Any would do. In any case both are used, so aliases from one to another should be good. In the context of TLS the SECP names are used.
I'll stick to secp names for now then. And I think aliases are for the documentation.
- For point multiplication with the fixed generator, I think I'll use the L=1 special case of Pippenger's algorithm, described in Daniel Bernstein's paper http://cr.yp.to/papers/pippenger.pdf. I think this is more or less the same as the "comb" method or "Lim-Lee"-method, but predating any bogus patents on the algorithm.
How does this algorithm compare to others?
If I read the gnutls code right, ecc_mulmod_cache makes the same number of ecc_projective_dbl_point calls as the non-cached version? If that is right, any comb-like algorithm should be able to beat it, since those algorithms reduce the number of point doublings and the number of iterations in the outer loop. Depending on selected table size, of course.
Do you have any ecc benchmarking code I could borrow?
The reason I insist on performance is that it would be hard for gnutls to switch to the new functions if they are significantly slower than the current ones,
I understand that. But there's going to be some cost to make operations side-channel silent (a lot more on what I mean by that below).
- For general point multiplication, I tend towards using a simple
non-sliding window, with a small point-specific table computed at run time. If the table is small enough, it might make sense to normalize the entries. As far as I understand, one can't beat the plain binary algorithm by a large factor; the algorithms I'm aware of only reduce the number of adds, but not the number of dups.
I really cannot answer with any facts since I need to check the details, but I've not seen other efficient implementations using a non-sliding window for that.
gmp's mpz_powm_sec uses a non-sliding window (and other alternative algorithms to make running time depend only on input size, not on actual values; the most severe slowdown is for division, used when converting inputs to Montgomery representation).
It seems to be 20% slower than mpz_powm for sizes comparable to ecc parameters (I tried with limb size 16):
~/hack/gmp/tune$ ./speed -s16 -r mpz_powm mpz_powm_sec
overhead 0.000000003 secs, precision 10000 units of 6.25e-10 secs, CPU freq 1600.00 MHz
        mpz_powm mpz_powm_sec
16   #0.000414344       1.2084
Not all of the slowdown is from the non-sliding window.
Also for that part you don't need constant time. [...] (in ECDH you also don't care unless it is fixed key ECDH).
In ECDH, any data dependent timing does leak some information about your secret exponent. What you're saying, if I understand you correctly, is that since you usually use each secret exponent only once, an attacker will in most cases not be able to collect enough information to do any damage.
Anyway, algorithm choice for this operation is not set in stone, and if we do version which doesn't promise constant time, there's more freedom in choosing algorithm.
Additions aren't that negligible. Also for that part you don't need constant time. In ECDSA you only multiply the fixed generator with the key (in ECDH you also don't care unless it is fixed key ECDH).
- For verifying ECDSA signatures, one needs to operate on two points, computing e_1 G + e_2 Y, where G is the generator (and hence offers possibilities for precomputation), and Y is the public key.
I would put that at lower priority than signing. Servers (where performance matters) rarely verify ECDSA signatures.
Noted (although I'm not sure I agree it's less important than signing performance).
I have had a quick look at the gnutls code. wmNAF makes me a bit uneasy when I think about side channels; there seems to be a lot of branches depending on exponent bits.
Why is that? Did you see any issues on the timing resistant version of the function?
Let me first explain what I mean when I talk about a "side-channel silent" function. That means that if we call the function with operands which are of the same size, but otherwise different, the function should execute exactly the same sequence of instructions in both cases, and access memory in exactly the same pattern.
If one then assumes that the underlying machine instructions have data-independent timing (true for most current cpus), we leak no side information from timing or cache behaviour. We may still leak information through power analysis, if, e.g., executing a multiplication instruction consumes a different amount of energy depending on the input bit patterns.
Maybe this is unnecessarily strict (I haven't really thought about what the side-channel attack models are for particular cryptographic uses of ecc), but I think this is what one should aim for. And I think it's possible to achieve, at a moderate cost in performance.
I'm not intimately familiar with wmnaf, but it looks like the function ecc_wMNAF has a lot of branches depending on the exponent bits. I also suspect (but please correct me if I'm wrong) that the number of generated digits is not determined by the exponent size only, it also depends on the actual bits?
Next, in the main ecc_mulmod functions, the timing-resistant version calls the madd primitive the same number of times independent of the values of the digits. But there are some small remaining leaks (by the above pretty high standard):
1. If wmnaf_len depends on the exponent bit pattern, then so does the number of iterations through the loop.
2. Branches depending on whether digit is > 0, < 0 or == 0. Those will have slightly different timing. Since digit == 0 is unlikely, that branch will be badly predicted. And it also affects instruction cache.
3. For the data cache, accesses to the wmnaf table are obviously data dependent. Also, data is written to different locations in memory (R or T) depending on whether or not the current digit is non-zero.
In the point addition, it seems hard to avoid special cases when adding a point to itself or to its negation, or adding the zero point.
Do you mean the infinity points or the point (0,0,0)?
The infinity point, which is the unit element of the group. I denote it 0 below (P + 0 = P for all P).
The core operation of a typical point multiplication algorithm is
R += T[digit];
where R is where the result is accumulated, and T is a table indexed by current digit (plain window, wmnaf, pippenger, whatever). Now there are a couple of cases where this operation will not end up in the common-case point-addition code:
1. R = 0. If the scalar n is in the range 0 < n < group order, this should happen only for the very first iterations of the loop.
2. R = - T[digit] (so R = 0 on output). For the above range of n, should also not happen (except as R = T[digit] = 0 initially in the loop).
3. T[digit] = 0. This is the digit == 0 case. For best average performance, one wants to handle this case specially and not do any addition at all (and much of the "exponent recoding" tricks are aimed at making this a more common case).
But for constant time, one wants to do an addition just as for any other value. I have to double-check how Jacobian coordinates work, but I think the usual formulas for ecc point addition don't work in this case.
4. R = T[digit]. The point addition code must check for this case, and have a special case branching to point doubling instead. I'm not sure if this can happen; it clearly depends on which point multiplication algorithm is used. E.g., if the tabulated values in T are i*P for 0 <= i < M, and we know that R = j*P for some j in the range M <= j < prime group order, then this case can be excluded.
And in any case, I imagine it will be extremely unlikely for a random scalar.
Regards, /Niels