Hi,
Attached is an example elliptic curve multiplication that will produce a wrong result in nettle.
It's a multiplication of these coordinates 23000000000000000000000000000000000000000000000000110011C2DD0000000000000000000 46BE3FEF75FCA4BD52CE28EC3F1483A05EE154965B05282F9029E14277409908C0EBAAD2CA5449FFA61FEC78473816BC with this scalar 23000000000000C1DD3FF800E83E2CACA1010A21
The example code will do the calculation with both openssl and nettle and will produce different results (I have verified the result with nss, which produces the same result as openssl).
Compile with gcc nettle-nistp384-miscalc.c -lhogweed -lgmp -lcrypto
Hanno Böck hanno@hboeck.de writes:
Attached is an example elliptic curve multiplication that will produce a wrong result in nettle.
It's a multiplication of these coordinates 23000000000000000000000000000000000000000000000000110011C2DD0000000000000000000 46BE3FEF75FCA4BD52CE28EC3F1483A05EE154965B05282F9029E14277409908C0EBAAD2CA5449FFA61FEC78473816BC with this scalar 23000000000000C1DD3FF800E83E2CACA1010A21
Thanks. I'll look into this.
On which platform do you see the problem?
Regards, /Niels
On Fri, 11 Dec 2015 11:52:25 +0100 nisse@lysator.liu.se (Niels Möller) wrote:
On which platform do you see the problem?
64 bit x86. Just tested in a Debian 32 bit VM and there the problem doesn't appear, so it seems this is a 64 bit specific problem.
Hanno Böck hanno@hboeck.de writes:
It's a multiplication of these coordinates 23000000000000000000000000000000000000000000000000110011C2DD0000000000000000000 46BE3FEF75FCA4BD52CE28EC3F1483A05EE154965B05282F9029E14277409908C0EBAAD2CA5449FFA61FEC78473816BC with this scalar 23000000000000C1DD3FF800E83E2CACA1010A21
I've tracked this down to a miscomputation in the x86_64 assembly implementation of ecc_384_mod. If I add a testcase for the problematic value, the failure looks like
m->mod p failed: bit_size = 384 a = 4c9000000000000000000000000000000000000000000000004a604db486e000000000000000000000000000000000000000121025be29575adb2c8ffffffffffffffffffffffffffffffffffffffffffffffffffffffff t = fffffffffb37000004c90121025be29575b258cddb4d1404e0116e00098d59fb29853804d67f6e000000000004c8ffff (bad) ref = fffffffffb37000004c90121025be29575b258cddb4d1404e0116e00098d59fa29853803d67f6e000000000104c8fffe Aborted (core dumped) FAIL: ecc-mod
So most likely an unlikely carry which is mishandled. I'll dig further.
I don't know what tool you use to search for these problems, but if you are able to run it in additional configurations, the following x86_64 nettle configurations would be helpful (assuming 64-bit you're doing now is the compiler's default).
./configure ./configure --disable-assembler ./configure CC='gcc -m32' CXX='g++ -m32'
(If you reuse the nettle build tree, remember make distclean before you reconfigure).
Also testing on (32-bit) ARM would be helpful, to exercise the corresponding ARM assembly code.
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
So most likely an unlikely carry which is mishandled. I'll dig further.
I've checked in a fix. For the curious, the reduction for the secp384r1 prime (aka nist 384) is done left-to-right (i.e., no Montgomery representation) based on the identity
B^6 = B^2 + 2^32(B-1) + 1 (mod p)
where B is my shorthand for 2^64, the bignum base, so B^6 = 2^384. The B-1 factor corresponds to a subtraction like
U5 U4 U3 U2 U1 U0 0 - U5 U4 U3 U2 U1 U0 ----------------------
applied to the most signifiacnt half of the 12-limb (768-bit) input to the reduction.
Clearly, this can never underflow. The result is folded into lower limbs (a shift of 32 bits is needed too, either before or after the subtraction).
Now, one folding eliminates only four limbs out of six needed, so we need folding twice, starting with folding the top two limbs. The old code splits the above subtraction into the two subtractions,
U5 U4 U3 U2 U1 U0 0 - U5 - U4 U3 U2 U1 U0 ------- ----------------
with the high one (on the left) is done first. But now the second one (on the right) *can* underflow. This negative carry is added to other positive carries in the folding process, but the net carry at that position can turn out to be negative, and that's what happened in Hanno's test case.
The code tried to allow for a negative carry by sign extension and stuff in the logic for the carry folding, but apparantly got that wrong, and possibly with other problems too if adding this negative carry in turn causes an underflow.
I reorganised the code to split U4 in upper and lower halves earlier, U4 = H 2^32 + L, and instead do the two subtractions
U5 H<<32 L U3 U2 U1 U0 0 - U5 H<<32 - L U3 U2 U1 U0 ---------------- ----------------
where neither subtraction can underflow. And then there's only positive carries to worry about in the rest of the folding process.
I intend to extend the test program testsuite/ecc-mod-test to be able to do test runs for many hours using random seeding.
Regards, /Niels
nettle-bugs@lists.lysator.liu.se