Re: [PATCH] "PowerPC64" GCM support

3 Oct 2020

      ...

Take out the fat support to it's own patch.

Done! I will post the assembly part first so you can review it.
2. You could consider doing the init_key in C, if nothing else as
...
documentation. It could be either under some #ifdef in gcm.c, or a
   separate .c file under powerpc64/p8/, next to gcm-hash.asm. Maybe
   it's still a good idea to have it in assembly, that's a tradeoff that
   depends a bit on the complexity of both the C and assembly, and the
   speedup from doing it in assembly. And I don't have a very strong
   opinion on this point now.
Even with asm, it might be a bit clearer to move it to its own .asm
   file, so each file can use define only for the relevant registers for
   that function.
Writing init_key() in C for PowerPC implementation has a couple of
disadvantages:
1. init_key() and gcm_hash() functions are connected to each other through
a shared table, it makes it easier to modify the implementation if both are
written in the same way.
2. we have to use intrinsics for certain operations like 'vpmsumd',
furthermore '__builtin_crypto_vpmsumd' is buggy on certain versions of GCC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275 and has different name
on CLANG '__builtin_altivec_crypto_vpmsumd' so we will end up using a lot
of conditions to check the variant of compiler plus writing inline assembly
code for 'vpmsumd' in case the variant has intrinsic issue with it.
I still prefer to have both functions in the same file, I separated the
'define' macros for each function so each function has its own define
section above its prologue.
3. What's TableElemAlign? Assuming GCM_TABLE_BITS is 8 (current Nettle
...
ABI), you can treat struct gcm_key as a blob of size 4096 bytes, with
   alignment corresponding to what the C compiler uses for uint64_t. Are
   you using some padding at the start (depending on address) to ensure
   you get stronger alignment? And 256 byte alignment sounds a bit more
   than needed?
The compiler aligns each element of gcm_key array at 0x100 perhaps because
the struct is declared as union so for example if I want to get the 'H'
value that is assigned into the 9th index, I have to add 0x800 to the array
address to get that value.
...

Please document the layout used for the precomputed values stored in
struct gcm_key.

Done!
5. It would help with comments explaining the naming convention used for
...
the named registers, and the instruction sequence used for a single
   Karatsuba multiplication, with any needed comments.
I tried to make a reader-friendly version of the implementation to make it
clearer with appropriate comments. Let me know if the documentation still
looks sketchy.
6. Is 8-way unrolling really necessary to get full utilization of the
...
execution units? And it's also not yet clear to me what 8-way means,
   is that 8 blocks of 16 bytes each (i.e., 128 bytes input), or 8 input
   bytes?
processing 4 blocks is enough to saturate the execution units but
processing 8 blocks (each block is 128-bit) cut the times of reduction
procedure execution by half compared to processing 4 blocks per loop so it
performs better, however in order to facilitate a review of implementation
I downed the loop to process 4 blocks to make it more clear.
...

Do you need any bit reversal? As you have mentioned, the
multiplication operation is symmetric under bit reversal, so ideally
bit reversal should be needed at most when setting the key and
extracting the digest, but not by the main workhorse, gcm_hash.

You got it. If the bit-reflection of the key is handled, there is no need
to handle the bit-reflection of the multiplication product with that key.
So any upcoming multiplication will be bit-reversal-free.
I would like to explain more about 'vpmsumd' instruction, in x86 arch the
'pclmulqdq' instruction is used for carry-less operations. To use
'pclmulqdq' an immediate value should be passed to the third parameter of
the instruction to specify which doublewords will be multiplied. However,
'vpmsumd' do the following operation:
(High-order doubleword of the second parameter * High-order doubleword of
the third parameter) XOR (Low-order doubleword of the second parameter *
Low-order doubleword of the third parameter)

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [PATCH] "PowerPC64" GCM support