Thanks for the clarification, I just misunderstanded the division with the partial reduction in a previous reply.
Ok, so you mean a polynomial division of b_0(x) by P(x) where P(x) = X^128 + X^127 + X^126 + X^121 + 1 b_0(x)/P(x) = (b_0(x)*(p^-1 mod P(x))) mod P(x) b_0(x)/P(x) = (b_0(x)*(p')) mod P(x) P(x) = X^64 + X^63 + X^62 + X^57 P' = p^-1 mod P(x) = X^63 + X^62 + X^57 so the constant 0xC2
let me show you part of the new implementation of _nettle_gcm_init_key in PPC
C --- calculate [H = H << 1 modulo polynomial] ---
vupkhsb EMSB,H C extend most significant bit to first byte vspltisb B1,1 C 0x01010101010101010101010101010101 vspltb EMSB,EMSB,0 C first byte quadword-extend vsl H,H,B1 C H = H << 1 vand EMSB,EMSB,POLY C EMSB &= 0xC2000000000000000000000000000001 vxor ZERO,ZERO,ZERO C 0x00000000000000000000000000000000 vxor H,H,EMSB C H ^= EMSB
xxmrghd VSR(POLY_L),VSR(ZERO),VSR(POLY) C 0x0000000000000000C200000000000000 xxmrghd VSR(POLY_H),VSR(POLY),VSR(ZERO) C 0xC2000000000000000000000000000000
C --- calculate [H^2 = H*H] ---
xxswapd VSR(Hm),VSR(H) vpmsumd HP,H,POLY_L vxor L,HP,Hm xxmrghd VSR(M),VSR(H),VSR(HP) xxmrgld VSR(L),VSR(H),VSR(L)
vpmsumd R,M,H vpmsumd F,L,H
vpmsumd T,F,POLY_H xxswapd VSR(F),VSR(F) vxor R,R,T vxor R,R,F
R still yields the wrong value of H^2, did I miss something in the implementation or did it wrong?
On Sun, Oct 11, 2020 at 8:14 PM Niels Möller nisse@lysator.liu.se wrote:
Maamoun TK maamoun.tk@googlemail.com writes:
Hi Niels,
I tried to apply your method but can't get it work,
Hmm, do you think I've missed something in the math, or are there other difficulties?
while applying it one question came to my mind.
First, compute b_0(x) / x^64 (mod P(x)), which expands it from 64 bits
to
128,
c_1(x) x^64 + c_0(x) = b_0(x) / x^64 (mod P(x))
Here you are trying to get partially reduced product by computing b_0(x)
/
x^64 (mod P(x)) but since the degree of input is 127, we can use the polynomial defining the finite field with x^64 elements, in this case
P(x)
= X^64+X^4+X^3+X+1 and P' = P^-1 (mod X^64) = X^63+X^61+X^60+1 which is
the
same constant 0xB0 and the function now: c_1(x) x^64 + c_0(x) = ((b_0 mod X^64) * p') mod X^64
For correctness, I think it is important that the computation b_0(x) / x^64 is done modulo the gcm polynomial (originally, x^128 + x^7 + x^2 + x + 1, but after bit reflection, P(x) = x^128 + x^127 + x^126 + x^125 + 1).
I don't see how one can do part of the computation in GF(2^64), or how your degree-64 polynomial relates the the original degree-128 polynomial. If there's some useful embedding of GF(2^64) as a subfield of GF(2^128), please explain?
That said, division by x^64 is fairly cheap, since P(x) = 1 (mod x^64). I think we get
b_0(x) / x^64 (mod P(x)) = b_0(x) (1 + P(x)) / x^64 (mod P(x)
where we can simplify (P(x) + 1) / x^64 to x^64 + x^63 + x^62 + x^58, or
b_0(x) / x^64 (mod P(x)) = b_0(x) (x^64 to x^64 + x^63 + x^62 + x^58)
So no reduction needed, just split the product in high and low part to get c_1 and c_0.
Regards, /Niels
-- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance.