Maamoun TK maamoun.tk@googlemail.com writes:
Thanks for the clarification, I just misunderstanded the division with the partial reduction in a previous reply.
Ok, so you mean a polynomial division of b_0(x) by P(x) where P(x) = X^128
- X^127 + X^126 + X^121 + 1
b_0(x)/P(x) = (b_0(x)*(p^-1 mod P(x))) mod P(x) b_0(x)/P(x) = (b_0(x)*(p')) mod P(x) P(x) = X^64 + X^63 + X^62 + X^57 P' = p^-1 mod P(x) = X^63 + X^62 + X^57 so the constant 0xC2
Hmm, I'm not following you (and it seems you are using P(x) to refer to two different polynomials). The operation is a bit similar to Montgomery redc, and it has taken me some time to get used to the initially strange mix of operations mod P(x) and mod x^k (or in the integer case, it's a mix of mod m, with m odd, and mod 2^k).
b_0(x) / x^64 (mod P(x))
can be computed in two different ways, giving the same result:
(i) Compute the inverse u(x) = (x^64){-1} mod P(x), then multiply b_0(x) u(x) (mod P(x))
We get u(x) = x^64 + x^63 + x^62 + x^57, with degree only 64. (This is thanks to the special structure of P(x); an inverse of some arbitrary element mod a 128-degree polynomial would be expected to be larger).
(ii) Add a multiple of P(x) to b_0(x) to make the the lowest 64 coefficients all cancel, then divide by x^64 by simply shifting / subtracting 64 from exponents of remaining terms. And because of the special structure of P(x), P(x) = 1 mod x^64, the right multiple is b_0(x) P(x). So
b_0(x) + P(x) b_0(x)
is a degree 191 polynomial, with the least 64 coefficients all zero. When simplified, it boils down to the same b_0(x) u(x), no additional reduction needed.
I tend to think about reductions (from either end) in terms of cancelling bits, so to me, (ii) is a familiar way to think about it.
let me show you part of the new implementation of _nettle_gcm_init_key in PPC
C --- calculate [H = H << 1 modulo polynomial] --- vupkhsb EMSB,H C extend most significant
bit to first byte vspltisb B1,1 C 0x01010101010101010101010101010101 vspltb EMSB,EMSB,0 C first byte quadword-extend vsl H,H,B1 C H = H << 1 vand EMSB,EMSB,POLY C EMSB &= 0xC2000000000000000000000000000001 vxor ZERO,ZERO,ZERO C 0x00000000000000000000000000000000 vxor H,H,EMSB C H ^= EMSB
xxmrghd VSR(POLY_L),VSR(ZERO),VSR(POLY) C
0x0000000000000000C200000000000000 xxmrghd VSR(POLY_H),VSR(POLY),VSR(ZERO) C 0xC2000000000000000000000000000000
C --- calculate [H^2 = H*H] --- xxswapd VSR(Hm),VSR(H) vpmsumd HP,H,POLY_L vxor L,HP,Hm xxmrghd VSR(M),VSR(H),VSR(HP) xxmrgld VSR(L),VSR(H),VSR(L) vpmsumd R,M,H vpmsumd F,L,H vpmsumd T,F,POLY_H xxswapd VSR(F),VSR(F) vxor R,R,T vxor R,R,F
R still yields the wrong value of H^2, did I miss something in the implementation or did it wrong?
I'm afraid I don't follow all details (and I would suggest trying to get the most basic variant working first, doing only a single block at a time). But one potential issue is the powers of x. If H(x) is the original key, then I think you need to precompute values corresponding to
x H(x) (mod P(x)) x H(x)^2 (mod P(x))
But it looks like you may be computing
(x H(x))^2 / x^128 = x^2 H(x)^2 / x^128
which is different. It's easy to get confused, but I think the precomputation needs a standard mod P(x), i.e., adding a multiple of P(x) cancelling the most significant coefficients, rather than the more efficient reduction cancelling the least significant coefficients.
If you're familiar with montgomery multiplication of integers, there it's possible to consistently use the operation a,b -> ab / 2^k everywhere, but that requires that all inputs are transformed to montgomery form, a --> 2^k a mod m. And in this case, it's desirable to scale the precomputed values appropriately, so no such transformation is needed for the inputs corresponding to message blocks.
To compute more powers of H, one could do the standard reduction only once, essentially transforming H to montgomery form,
H' = H(x) x^128 (mod P(x)),
then further powers of H can use the reduction of least significant coefficients,
(x H(x)) H' / x^128 = x H(x)^2
(x H(x)^2) H' / x^128 = x H(x)^3
etc.
Regards, /Niels