nisse@lysator.liu.se (Niels Möller) writes:
> So most likely an unlikely carry which is mishandled. I'll dig further.
I've checked in a fix. For the curious, the reduction for the secp384r1 prime (aka NIST P-384) is done left-to-right (i.e., no Montgomery representation), based on the identity
B^6 = B^2 + 2^32(B-1) + 1 (mod p)
where B is my shorthand for 2^64, the bignum base, so B^6 = 2^384. The B-1 factor corresponds to a subtraction like
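The identity is easy to sanity-check with arbitrary-precision integers. A minimal Python sketch, assuming the standard secp384r1 prime p = 2^384 - 2^128 - 2^96 + 2^32 - 1 (the prime itself is not written out in this message):

```python
# Check B^6 = B^2 + 2^32(B-1) + 1 (mod p) for the secp384r1 prime.
# Assumption: p is the standard NIST P-384 prime; it is not given above.
B = 2**64                                  # bignum base, one 64-bit limb
p = 2**384 - 2**128 - 2**96 + 2**32 - 1    # secp384r1 prime

lhs = pow(B, 6, p)                         # 2^384 reduced mod p
rhs = B**2 + 2**32 * (B - 1) + 1           # 2^128 + 2^96 - 2^32 + 1

# rhs is already smaller than p, so it is exactly the residue of B^6.
print(lhs == rhs)                          # True
# Equivalently, p = B^6 - (B^2 + 2^32(B-1) + 1).
print(p == B**6 - rhs)                     # True
```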
    U5 U4 U3 U2 U1 U0  0
  -    U5 U4 U3 U2 U1 U0
  ----------------------
applied to the most significant half of the 12-limb (768-bit) input to the reduction.
Clearly, this can never underflow, since it computes U B - U = U (B-1) >= 0, where U is the 6-limb number U5 U4 U3 U2 U1 U0. The result is folded into lower limbs (a shift of 32 bits is needed too, either before or after the subtraction).
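A rough model of a fold, on plain Python integers rather than limbs, may make the bookkeeping concrete. It is only a sketch: `fold_once` and `reduce_p384` are hypothetical names, the prime is assumed to be the standard secp384r1 one, and it folds everything above limb 5 at once instead of following the limb-by-limb schedule of the actual code. It does show that one fold of a 768-bit input still leaves about eight limbs.

```python
import random

B = 2**64
p = 2**384 - 2**128 - 2**96 + 2**32 - 1   # assumed: standard secp384r1 prime

def fold_once(x):
    """Fold everything above 2^384 down, using B^6 = B^2 + 2^32(B-1) + 1 (mod p).

    Hypothetical helper: x = lo + u*B^6 becomes lo + u*B^2 + 2^32*(u*B - u) + u.
    """
    lo, u = x % B**6, x // B**6
    diff = u * B - u          # the "B-1" subtraction; u*B >= u, so never negative
    assert diff >= 0
    return lo + u * B**2 + (diff << 32) + u

def reduce_p384(x):
    """Hypothetical full reduction of a 768-bit x: two folds, then final subtractions."""
    y = fold_once(x)          # about 8 limbs remain: only 4 of 6 were eliminated
    z = fold_once(y)          # now z < 2^385
    while z >= p:             # at most a couple of subtractions of p
        z -= p
    return z

random.seed(2024)
x = random.getrandbits(768)   # a random 12-limb input
y = fold_once(x)
print(y.bit_length())         # about 512 bits for a random input, still above 384
assert y % p == x % p         # folding preserves the residue
assert reduce_p384(x) == x % p
```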
Now, one folding eliminates only four of the six limbs that need to go (the B^2 term maps the six high limbs onto limbs 2 through 7, so eight limbs remain), so we need to fold twice, starting with folding the top two limbs. The old code splits the above subtraction into the two subtractions,
    U5 U4          U3 U2 U1 U0  0
  -    U5        - U4 U3 U2 U1 U0
  -------        ----------------
with the high one (on the left) done first. But now the second one (on the right) *can* underflow, e.g., when U4 is nonzero and U3, U2, U1, U0 are all zero. This negative carry is added to other positive carries in the folding process, but the net carry at that position can turn out to be negative, and that's what happened in Hanno's test case.
The code tried to allow for a negative carry by sign extension and stuff in the logic for the carry folding, but apparently got that wrong, and possibly had other problems too if adding this negative carry in turn caused an underflow.
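The underflow in the old split can be reproduced with a toy example. Below, a hypothetical `old_split` computes the two partial differences as plain integers, each scaled to its limb position, with limbs stored little-endian (U[0] = U0 through U[5] = U5). With U4 = 1 and the lower limbs zero, the right-hand subtraction goes negative even though the total is still U (B-1).

```python
B = 2**64

def limbs_to_int(limbs):
    """Little-endian 64-bit limbs [U0, U1, ...] to an integer."""
    return sum(u << (64 * i) for i, u in enumerate(limbs))

def old_split(U):
    """Old split of (U5 U4 U3 U2 U1 U0 0) - (U5 U4 U3 U2 U1 U0).

    Returns (high, low): the left subtraction (U5 U4) - U5 at limbs 6..5,
    and the right subtraction (U3 U2 U1 U0 0) - (U4 U3 U2 U1 U0) at limbs 4..0.
    """
    high = (U[5] * B + U[4]) * B**5 - U[5] * B**5
    low = limbs_to_int([0] + U[0:4]) - limbs_to_int(U[0:5])
    return high, low

U = [0, 0, 0, 0, 1, 0]                 # U4 = 1, all other limbs zero
high, low = old_split(U)
print(high >= 0, low < 0)              # True True: the right part underflows
assert high + low == limbs_to_int(U) * (B - 1)   # the total is still correct
```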
I reorganised the code to split U4 into upper and lower 32-bit halves earlier, U4 = H 2^32 + L, and instead do the two subtractions
       U5 H<<32               L U3 U2 U1 U0  0
  -          U5 H<<32      -     L U3 U2 U1 U0
  -------------------      -------------------
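where neither subtraction can underflow: writing (U5 H<<32) and (L U3 U2 U1 U0) for the numbers formed by those limbs, the left one equals (U5 H<<32) B^4 (B-1) and the right one equals (L U3 U2 U1 U0) (B-1), both nonnegative. Then there are only positive carries to worry about in the rest of the folding process.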
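The reorganised split can be checked in the same toy model (hypothetical `new_split`, little-endian limbs). Each part is a nonnegative number times (B-1), so nonnegativity holds by construction; the random loop below is only a spot check of that algebra, not a proof.

```python
import random

B = 2**64

def limbs_to_int(limbs):
    """Little-endian 64-bit limbs [U0, U1, ...] to an integer."""
    return sum(u << (64 * i) for i, u in enumerate(limbs))

def new_split(U):
    """New split of (U5 U4 U3 U2 U1 U0 0) - (U5 U4 U3 U2 U1 U0), with U4 = H*2^32 + L."""
    H, L = U[4] >> 32, U[4] & 0xFFFFFFFF
    X = U[5] * B + (H << 32)             # two-limb number (U5 H<<32)
    Y = limbs_to_int(U[0:4] + [L])       # five-limb number (L U3 U2 U1 U0)
    high = X * B**5 - X * B**4           # left: X at limbs 6..5 minus X at limbs 5..4
    low = Y * B - Y                      # right: (L U3 U2 U1 U0 0) - (L U3 U2 U1 U0)
    return high, low

rng = random.Random(1)
for _ in range(10000):
    # Mix uniform limbs with edge values, where rare carries tend to hide.
    U = [rng.choice([0, 1, B - 1, rng.getrandbits(64)]) for _ in range(6)]
    high, low = new_split(U)
    assert high >= 0 and low >= 0
    assert high + low == limbs_to_int(U) * (B - 1)

high, low = new_split([0, 0, 0, 0, 1, 0])   # the input that broke the old split
print(high, low == B**4 * (B - 1))          # 0 True
```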
I intend to extend the test program testsuite/ecc-mod-test to be able to do test runs for many hours using random seeding.
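A long-running randomized test along these lines might look roughly like the following. It is a Python sketch, not the C test program: `reduce_p384`, `random_input`, `run` and `main` are hypothetical names, and the two-fold model stands in for the code under test. The parts that matter are drawing a fresh seed per run, printing it first so a failure can be replayed, and biasing limbs towards 0 and 2^64 - 1, where rare carries show up.

```python
import os
import random
import sys

B = 2**64
p = 2**384 - 2**128 - 2**96 + 2**32 - 1   # assumed: standard secp384r1 prime

def reduce_p384(x):
    """Stand-in for the reduction under test (hypothetical two-fold model)."""
    for _ in range(2):
        lo, u = x % B**6, x // B**6
        x = lo + u * B**2 + ((u * B - u) << 32) + u
    while x >= p:
        x -= p
    return x

def random_input(rng):
    """A 12-limb input with limbs biased towards edge values."""
    limbs = [rng.choice([0, B - 1, rng.getrandbits(64)]) for _ in range(12)]
    return sum(v << (64 * i) for i, v in enumerate(limbs))

def run(seed, count):
    """Compare the reduction against Python's % for count inputs from one seed."""
    rng = random.Random(seed)
    for i in range(count):
        x = random_input(rng)
        if reduce_p384(x) != x % p:
            sys.exit(f"mismatch at iteration {i}, seed {seed}, x = {x:#x}")

def main(args):
    """args: optional [seed, count]; a fresh seed is drawn if none is given."""
    seed = int(args[0]) if args else int.from_bytes(os.urandom(8), "little")
    count = int(args[1]) if len(args) > 1 else 1000
    print(f"seed: {seed}")     # printed first, so any failing run can be replayed
    run(seed, count)
```

Calling `main(sys.argv[1:])` from a script with a large count gives a run that can go on for hours; passing a printed seed back in regenerates exactly the same inputs.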
Regards, /Niels