Re: [PATCH] "PowerPC64" GCM support

11 Oct 2020


      Maamoun TK maamoun.tk@googlemail.com writes:
...
Thanks for the clarification, I just misunderstanded the division with the
partial reduction in a previous reply.
Ok, so you mean a polynomial division of  b_0(x) by P(x) where P(x) = X^128

X^127 +  X^126 +  X^121 + 1

b_0(x)/P(x) = (b_0(x)*(p^-1 mod P(x))) mod P(x)
b_0(x)/P(x) = (b_0(x)*(p')) mod P(x)
P(x) = X^64 +  X^63 +  X^62 +  X^57
P' = p^-1 mod P(x) = X^63 +  X^62 +  X^57
so the constant 0xC2
Hmm, I'm not following you (and it seems you are using P(x) to refer to
two different polynomials). The operation is a bit similar to Montgomery
redc, and it has taken me some time to get used to the initially strange
mix of operations mod P(x) and mod x^k (or in the integer case, it's a
mix of mod m, with m odd, and mod 2^k).
b_0(x) / x^64 (mod P(x))
can be computed in two different ways, giving the same result:
(i) Compute the inverse u(x) = (x^64){-1} mod P(x), then multiply
       b_0(x) u(x) (mod P(x))
We get u(x) = x^64 + x^63 + x^62 + x^57, with degree only 64.
       (This is thanks to the special structure of P(x); an inverse of
       some arbitrary element mod a 128-degree polynomial would be
       expected to be larger).
(ii) Add a multiple of P(x) to b_0(x) to make the the lowest 64
       coefficients all cancel, then divide by x^64 by simply shifting /
       subtracting 64 from exponents of remaining terms. And because of
       the special structure of P(x), P(x) = 1 mod x^64, the right
       multiple is b_0(x) P(x). So
b_0(x) + P(x) b_0(x)
is a degree 191 polynomial, with the least 64 coefficients all
       zero. When simplified, it boils down to the same b_0(x) u(x), no
       additional reduction needed.
I tend to think about reductions (from either end) in terms of
cancelling bits, so to me, (ii) is a familiar way to think about it.
...
let me show you part of the new implementation of _nettle_gcm_init_key in
PPC
C --- calculate [H = H << 1 modulo polynomial] ---

vupkhsb        EMSB,H                        C extend most significant

bit to first byte
    vspltisb       B1,1                          C
0x01010101010101010101010101010101
    vspltb         EMSB,EMSB,0                   C first byte
quadword-extend
    vsl            H,H,B1                        C H = H << 1
    vand           EMSB,EMSB,POLY                C EMSB &=
0xC2000000000000000000000000000001
    vxor           ZERO,ZERO,ZERO                C
0x00000000000000000000000000000000
    vxor           H,H,EMSB                      C H ^= EMSB
xxmrghd        VSR(POLY_L),VSR(ZERO),VSR(POLY) C

0x0000000000000000C200000000000000
    xxmrghd        VSR(POLY_H),VSR(POLY),VSR(ZERO) C
0xC2000000000000000000000000000000
C --- calculate [H^2 = H*H] ---

xxswapd        VSR(Hm),VSR(H)
vpmsumd        HP,H,POLY_L
vxor           L,HP,Hm
xxmrghd        VSR(M),VSR(H),VSR(HP)
xxmrgld        VSR(L),VSR(H),VSR(L)

vpmsumd        R,M,H
vpmsumd        F,L,H

vpmsumd        T,F,POLY_H
xxswapd        VSR(F),VSR(F)
vxor           R,R,T
vxor           R,R,F


R still yields the wrong value of H^2, did I miss something in the
implementation or did it wrong?
I'm afraid I don't follow all details (and I would suggest trying to get
the most basic variant working first, doing only a single block at a
time). But one potential issue is the powers of x. If H(x) is the
original key, then I think you need to precompute values corresponding
to
x H(x) (mod P(x))
  x H(x)^2 (mod P(x))
But it looks like you may be computing
(x H(x))^2 / x^128 =  x^2 H(x)^2 / x^128
which is different. It's easy to get confused, but I think the
precomputation needs a standard mod P(x), i.e., adding a multiple of
P(x) cancelling the most significant coefficients, rather than the more
efficient reduction cancelling the least significant coefficients.
If you're familiar with montgomery multiplication of integers, there
it's possible to consistently use the operation a,b -> ab / 2^k
everywhere, but that requires that all inputs are transformed to
montgomery form, a --> 2^k a mod m. And in this case, it's desirable to
scale the precomputed values appropriately, so no such transformation is
needed for the inputs corresponding to message blocks.
To compute more powers of H, one could do the standard reduction only
once, essentially transforming H to montgomery form,
H' = H(x) x^128 (mod P(x)),
then further powers of H can use the reduction of least significant
coefficients,
(x H(x)) H' / x^128 = x H(x)^2
(x H(x)^2) H' / x^128 = x H(x)^3
etc.
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [PATCH] "PowerPC64" GCM support