Re: [PATCH] "PowerPC64" GCM support

5 Oct 2020


      Maamoun TK maamoun.tk@googlemail.com writes:
...
+L4x_loop:
[...]
...

C polynomial multiplication "classical" pre-processing
xxmrghd        VSR(C23h),VSR(C2),VSR(C3)
xxmrgld        VSR(C23l),VSR(C2),VSR(C3)
xxmrghd        VSR(C01h),VSR(C0),VSR(C1)
xxmrgld        VSR(C01l),VSR(C0),VSR(C1)

C polynomial multiplication "classical"
vpmsumd        C3,C3,H1                      C M3 = H^1l*C3h⊕H^1h*C3l
vpmsumd        C2,C2,H2                      C M2 = H^2l*C2h⊕H^2h*C2l
vpmsumd        C1,C1,H3                      C M1 = H^3l*C1h⊕H^3h*C1l
vpmsumd        C0,C0,H4                      C M0 = H^4l*C0h⊕H^4h*C0l
vpmsumd        C23h,C23h,H21h                C H23 = H^2h*C2h⊕H^1h*C3h
vpmsumd        C23l,C23l,H21l                C L23 = H^2l*C2l⊕H^1l*C3l
vpmsumd        C01h,C01h,H43h                C H01 = H^4h*C0h⊕H^4h*C1h
vpmsumd        C01l,C01l,H43l                C L01 = H^4l*C0l⊕H^4l*C1l

So you do 4 blocks with only 10 vpmsumd instructions (the 8 above, and 2
more below for the reduction). That's nice! It would be even nicer if
the H terms could be rearranged to eliminate the pre-processing of the C
values.
And it is all with polynomials bits reversed compared to the spec? I
find the spec,
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38d.pd...,
not crystal clear, but as I understand it, the bit that is least
significant when interpreted as a plain integer corresponds to the
highest power of x, while instructions like vpmsumd use the opposite
(and more natural, imo) convention.
This is how I understand bit reversal. We want
c(x) = a(x) b(x) mod p(x) = a(x) b(x) + q(x) p(x)
where a, b and c all are 128 bits, p is 129 bits, and q is 127 bits
(chosen so the high 127 bits of the 255 bit product cancel). Reverse as
c'(x) = x^127 c(1/x), similarly for a', b',
q'(x) = x^126 q(1/x), p'(x) = x^128 p(1/x)
Then we get
c'(x) = [a'(x) b'(x) + q'(x) p'(x)] / x^127
i.e., q' now cancels the *low* 127 bits of the product. Which can also be
written as
c'(x) = a'(x) b'(x) / x^127 (mod p'(x))
So in this way, bit reversal is a bit similar to montgomery
representation for modular arithmetic on integers. And if 128 and
corresponding bit boundary is more convenient than 127, one can either
shift the product one bit left, or premultiply one of the factors with x
mod p'(x).
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [PATCH] "PowerPC64" GCM support