---------- Forwarded message ---------
From: Maamoun TK <maamoun.tk(a)googlemail.com>
Date: Thu, Nov 12, 2020 at 7:42 PM
Subject: Re: [PowerPC] GCM optimization
To: Niels Möller <nisse(a)lysator.liu.se>
On Thu, Nov 12, 2020 at 6:40 PM Niels Möller <nisse(a)lysator.liu.se> wrote:
> I gave it a test run on gcc112 in the gcc compile farm, and speedup of
> gcm update seems to be 26 times(!) compared to the C version.
>
That's reasonable; I got a similar speedup on POWER instances that are
more stable than the gcc compile farm.
> Where would that documentation be published? In the Nettle manual, as
> some IBM white paper, or as a more-or-less academic paper, e.g., on
> arxiv? I will not be able to spend much time on writing, but I'd be
> happy to review.
>
I'll start writing the paper once I get more details from IBM. Like the
Intel documents, it will be both academic and practical: I'll go through
the finite field equations to show how we get there, and add a practical
example to show why this method is preferable, along with its expected
speedup. My intention is that other crypto libraries could take advantage
of this document, or that it could be a starting point for further
improvements to the algorithm, so I'm checking whether IBM would publish
or approve such a document, as Intel has done.
> I have a sketch of ARM Neon code doing the equivalent of two vpmsumd,
> with reasonable parallelism. Quite a lot of instructions needed.
>
If you don't have much time, you can send it here and I'll continue from
that point. I'm planning to compare the new method with the usual method,
with and without the Karatsuba algorithm.
> > +C Alignment of gcm_key table elements, which is declared in gcm.h
> > +define(`TableElemAlign', `0x100')
>
> I still find this large constant puzzling. If I try
>
> struct gcm_key key;
> printf("sizeof (key): %zd, sizeof(key.h[0]): %zd\n", sizeof(key),
> sizeof(key.h[0]));
>
> (I added it to the start of test_main in gcm-test.c) and run on the
> gcc112 machine, I get
>
> sizeof (key): 4096, sizeof(key.h[0]): 16
>
> Which is what I'd expect, with elements of size 16 bytes, not 256 bytes.
>
> I haven't yet had the time to read the code carefully.
>
The alignment of each element is 0x100 (256). The table has 16 elements,
and the table size you got, 4096, is consistent with that, because
16*256 = 4096.
regards,
Mamone