Re: GCM with ARM Neon (was: Re: [PATCH] "PowerPC64" GCM support)

11 Oct 2020


On Sun, Oct 11, 2020 at 1:42 PM Niels Möller <nisse@lysator.liu.se> wrote:
...
nisse@lysator.liu.se (Niels Möller) writes:
...
So if we have the input in register A (loaded from memory with no
processing besides ensuring proper *byte* order), and precompute two
values, M representing b_1(x) x^64 + c_1(x), and L representing b_0(x)
x^64 + d_1(x), then we get the two halves above with two vpmsumd,

  vpmsumd R, M, A
  vpmsumd F, L, A

When doing more than one block at a time, I think it's easiest to
accumulate the R and F values separately.
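
For readers without a Power machine at hand, here is a rough portable C
model of what a single vpmsumd computes: the XOR of the two 64x64-bit
carryless products of corresponding doublewords. It is only a sketch for
illustration. The names u128, clmul64 and vpmsumd_model are made up here
and are not Nettle functions, and which doubleword is called "hi" or "lo"
is an arbitrary choice (the result does not depend on it).

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value held as two 64-bit halves (hypothetical helper type). */
typedef struct { uint64_t hi, lo; } u128;

/* Carryless multiply of two 64-bit polynomials over GF(2).
   Bit i of an operand is the coefficient of x^i. */
static u128
clmul64 (uint64_t a, uint64_t b)
{
  u128 r = { 0, 0 };
  unsigned i;

  for (i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        r.lo ^= a << i;
        if (i > 0)              /* avoid an undefined shift by 64 */
          r.hi ^= a >> (64 - i);
      }

  /* The product has degree at most 126, so the top bit stays clear. */
  assert ((r.hi >> 63) == 0);
  return r;
}

/* Model of vpmsumd: multiply the doublewords pairwise, XOR the products. */
static u128
vpmsumd_model (u128 x, u128 y)
{
  u128 p = clmul64 (x.hi, y.hi);
  u128 q = clmul64 (x.lo, y.lo);
  u128 r = { p.hi ^ q.hi, p.lo ^ q.lo };
  return r;
}
```

In terms of this model, R = vpmsumd_model (M, A) and F = vpmsumd_model (L, A),
and accumulating over several blocks is just XORing the successive R values
together, and likewise the F values.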
BTW, I wonder if similar organization would make sense for Arm Neon.
Now, Neon doesn't have vpmsumd, the widest carryless multiplication
available is vmull.p8, which is an 8-bit to 15-bit multiply, 8 in
parallel...
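
As a comparison point, a portable C sketch of the vmull.p8 operation
described above: eight independent 8x8-bit carryless multiplies, each
giving a product of at most 15 significant bits stored in a 16-bit lane.
The function names clmul8 and vmull_p8_model are hypothetical and chosen
for this illustration only; real code would use the instruction or the
corresponding intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Carryless multiply of two 8-bit polynomials over GF(2). */
static uint16_t
clmul8 (uint8_t a, uint8_t b)
{
  uint16_t r = 0;
  unsigned i;

  for (i = 0; i < 8; i++)
    if ((b >> i) & 1)
      r ^= (uint16_t) ((unsigned) a << i);

  /* Degree is at most 14, so bit 15 is always clear. */
  assert ((r >> 15) == 0);
  return r;
}

/* Model of vmull.p8: eight lanes multiplied in parallel. */
static void
vmull_p8_model (uint16_t r[8], const uint8_t a[8], const uint8_t b[8])
{
  unsigned i;

  for (i = 0; i < 8; i++)
    r[i] = clmul8 (a[i], b[i]);
}
```

Building a 64x64-bit product out of such lanes takes many of these
multiplies plus shifts and XORs to combine the partial products, which is
why a wider polynomial multiply matters for GCM.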

I may be mistaken, but I believe 64-bit poly multiplies are available.
Or they are available on Aarch64 with Crypto extensions.
I'm not aware of poly multiplies on other ARM arches, like ARMv6 or
ARMv7 with NEON.
Jeff
