On Thu, Nov 12, 2020 at 07:45:14PM +0200, Maamoun TK wrote:
---------- Forwarded message ---------
From: Maamoun TK maamoun.tk@googlemail.com
Date: Thu, Nov 12, 2020 at 7:42 PM
Subject: Re: [PowerPC] GCM optimization
To: Niels Möller nisse@lysator.liu.se
On Thu, Nov 12, 2020 at 6:40 PM Niels Möller nisse@lysator.liu.se wrote:
I gave it a test run on gcc112 in the gcc compile farm, and speedup of gcm update seems to be 26 times(!) compared to the C version.
That's reasonable; I got a similar speedup on POWER instances that are more stable than the gcc compile farm.
Where would that documentation be published? In the Nettle manual, as some IBM white paper, or as a more-or-less academic paper, e.g., on arxiv? I will not be able to spend much time on writing, but I'd be happy to review.
I'll start writing the paper once I get more details from IBM. Like Intel's documents, it will be both academic and practical: I'll dive into the finite field equations to show how we get there, and I'll add a practical example to show why this method is preferable, along with its expected speedup. My intention is that other crypto libraries could take advantage of this document, or that it could be a starting point for further improvements to the algorithm, so I'm checking whether IBM would publish or approve such a document, the same way Intel does.
Hi Mamone,
What do you need from the IBM side? I may be able to help. We'd definitely like to support you and Niels in publishing your results.
I have a sketch of ARM Neon code doing the equivalent of two vpmsumd, with reasonable parallelism. Quite a lot of instructions needed.
If you don't have much time, you can send it here and I'll continue from that point. I'm planning to compare the new method with the usual method, with and without the Karatsuba algorithm.
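For readers following the "with and without Karatsuba" comparison, here is a minimal sketch in portable C of a 128x128-bit carry-less (polynomial over GF(2)) multiply done both ways. It is only an illustration under stated assumptions: the names clmul64, clmul128_schoolbook and clmul128_karatsuba are hypothetical, the bit-by-bit clmul64 stands in for a hardware instruction, limb 0 is the low half, and the GCM bit ordering and the modular reduction are left out. It is not the code from the patch.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type: lo holds bits 0..63, hi bits 64..127. */
typedef struct { uint64_t lo, hi; } clmul_result;

/* 64x64 -> 128-bit carry-less product, one bit of b at a time.
   A real implementation would use a hardware polynomial multiply. */
static clmul_result clmul64(uint64_t a, uint64_t b)
{
  clmul_result r = { 0, 0 };
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        r.lo ^= a << i;
        if (i > 0)                 /* avoid an undefined shift by 64 */
          r.hi ^= a >> (64 - i);
      }
  return r;
}

/* Schoolbook: four 64-bit multiplies. a[0], b[0] are the low limbs. */
static void clmul128_schoolbook(const uint64_t a[2], const uint64_t b[2],
                                uint64_t r[4])
{
  clmul_result ll = clmul64(a[0], b[0]);
  clmul_result lh = clmul64(a[0], b[1]);
  clmul_result hl = clmul64(a[1], b[0]);
  clmul_result hh = clmul64(a[1], b[1]);
  r[0] = ll.lo;
  r[1] = ll.hi ^ lh.lo ^ hl.lo;
  r[2] = hh.lo ^ lh.hi ^ hl.hi;
  r[3] = hh.hi;
}

/* Karatsuba: three 64-bit multiplies. In GF(2)[x],
   (a0+a1)(b0+b1) + a0*b0 + a1*b1 = a0*b1 + a1*b0, with + being XOR. */
static void clmul128_karatsuba(const uint64_t a[2], const uint64_t b[2],
                               uint64_t r[4])
{
  clmul_result ll = clmul64(a[0], b[0]);
  clmul_result hh = clmul64(a[1], b[1]);
  clmul_result mid = clmul64(a[0] ^ a[1], b[0] ^ b[1]);
  mid.lo ^= ll.lo ^ hh.lo;
  mid.hi ^= ll.hi ^ hh.hi;
  r[0] = ll.lo;
  r[1] = ll.hi ^ mid.lo;
  r[2] = hh.lo ^ mid.hi;
  r[3] = hh.hi;
}
```

Both functions produce the same unreduced 256-bit product; Karatsuba trades one multiply for a few extra XORs, and whether that pays off depends on the relative cost of the multiply instruction on the target.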
+C Alignment of gcm_key table elements, which is declared in gcm.h
+define(`TableElemAlign', `0x100')
I still find this large constant puzzling. If I try
  struct gcm_key key;
  printf("sizeof (key): %zd, sizeof(key.h[0]): %zd\n", sizeof(key), sizeof(key.h[0]));
(I added it to the start of test_main in gcm-test.c) and run on the gcc112 machine, I get
sizeof (key): 4096, sizeof(key.h[0]): 16
Which is what I'd expect, with elements of size 16 bytes, not 256 bytes.
I haven't yet had the time to read the code carefully.
You see, the alignment of each element is 0x100 (256). The table has 16 elements, and the table size you got, 4096, is reasonable because 16*256 = 4096.
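Since a total size of 4096 can arise from more than one layout, a small C11 sketch may help separate element size from element alignment. The struct names below are hypothetical and are not the declarations in gcm.h; the sketch also assumes the compiler supports an extended alignment of 256, as gcc and clang do. In C, sizeof of a type is always a multiple of its alignment, so sizeof(key.h[0]) and alignof on the element type tell the two layouts apart.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical element types, for illustration only. */
struct elem16  { unsigned char b[16]; };               /* 16 bytes, no padding */
struct elem256 { alignas(256) unsigned char b[16]; };  /* padded out to 256 bytes */

/* Two hypothetical tables that both occupy 4096 bytes. */
struct table_a { struct elem16  h[256]; };  /* 256 elements * 16 bytes  */
struct table_b { struct elem256 h[16];  };  /* 16 elements  * 256 bytes */

/* Distance in bytes between consecutive elements of each table. */
static size_t stride_a(void) { return sizeof(struct elem16); }
static size_t stride_b(void) { return sizeof(struct elem256); }

/* Compile-time checks of the layout facts described above. */
_Static_assert(sizeof(struct table_a) == 4096, "256 * 16");
_Static_assert(sizeof(struct table_b) == 4096, "16 * 256");
_Static_assert(sizeof(struct elem256) % alignof(struct elem256) == 0,
               "size is a multiple of alignment");
```

With these definitions the element stride is 16 for table_a and 256 for table_b, even though both tables have the same total size.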
regards,
Mamone
_______________________________________________
nettle-bugs mailing list
nettle-bugs@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs