Re: [PATCH] "PowerPC64" GCM support

4 Oct 2020

Maamoun TK <maamoun.tk@googlemail.com> writes:
...
Done! I will post the assembly part first so you can review it.
Thanks. I hope to get the time to read it carefully soon.
...

1. The init_key() and gcm_hash() functions are connected to each other
through a shared table, so it is easier to modify the implementation if
both are written in the same way.
2. We have to use intrinsics for certain operations like 'vpmsumd'.
Furthermore, '__builtin_crypto_vpmsumd' is buggy on certain versions of GCC
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91275) and has a different
name on Clang, '__builtin_altivec_crypto_vpmsumd', so we would end up using
a lot of conditions to check the compiler variant, plus writing inline
assembly code for 'vpmsumd' in case the variant's intrinsic has issues.
I still prefer to have both functions in the same file. I separated the
'define' macros for each function, so each function has its own define
section above its prologue.
I see. And I wouldn't want C code with machine-specific or
compiler-specific intrinsics. If there's no reasonable way to do it in
portable C, let's stick to assembly.
...

What's TableElemAlign? Assuming GCM_TABLE_BITS is 8 (current Nettle
ABI), you can treat struct gcm_key as a blob of size 4096 bytes, with
alignment corresponding to what the C compiler uses for uint64_t. Are
you using some padding at the start (depending on address) to ensure
you get stronger alignment? And 256 byte alignment sounds a bit more
than needed?
The compiler aligns each element of gcm_key array at 0x100 perhaps because
the struct is declared as union so for example if I want to get the 'H'
value that is assigned into the 9th index, I have to add 0x800 to the array
address to get that value.
That's highly unexpected! It makes struct gcm_key 16 times larger than
intended, 64 KByte rather than 4KByte, which seems pretty bad. I would
expect more or less any C compiler to use size 16 and 8 byte minimum
alignment for the elements (and I'd wish there were a nice and portable
way to enforce minimum 16 byte alignment). Can you double check, and try
to find an explanation for this?
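
For reference, the layout I'd expect can be checked with a few lines of
portable C. This is only a sketch: union block16, struct key_table and
TABLE_BITS are stand-ins made up for illustration (assumed to resemble a
16-byte union element and a table of 2^8 entries), not the actual Nettle
declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS 8  /* assumption: mirrors GCM_TABLE_BITS == 8 */

/* Hypothetical stand-in for a 16-byte table element declared as a union. */
union block16 {
  uint8_t  b[16];
  uint64_t u64[2];
};

/* Hypothetical stand-in for the key table: 256 elements. */
struct key_table {
  union block16 h[1 << TABLE_BITS];
};

/* Compile-time checks: 16-byte elements, 4096-byte table. */
_Static_assert(sizeof(union block16) == 16, "element should be 16 bytes");
_Static_assert(sizeof(struct key_table) == 4096, "table should be 4096 bytes");

/* Byte offset of table element i from the start of the table. */
size_t element_offset(size_t i)
{
  return offsetof(struct key_table, h) + i * sizeof(union block16);
}

/* Runtime check: the 9th element (index 8) sits at 0x80, not 0x800. */
void check_layout(void)
{
  assert(element_offset(8) == 0x80);
}
```

If a compiler really padded each element to 0x100, the static assertions
above would fail at compile time, which would at least make the
discrepancy easy to reproduce.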
...
I would like to explain more about the 'vpmsumd' instruction. On x86, the
'pclmulqdq' instruction is used for carry-less multiplication. To use
'pclmulqdq', an immediate value is passed as the third parameter of
the instruction to specify which doublewords will be multiplied. However,
'vpmsumd' does the following operation:
(High-order doubleword of the second parameter * High-order doubleword of
the third parameter) XOR (Low-order doubleword of the second parameter *
Low-order doubleword of the third parameter)
Interesting! Do you use inputs where one doubleword is zero, making one
or the other of the xored values zero, or is there some clever way to
take advantage of the builtin wraparound? I guess one can also do some
interesting things with other selected parts of the inputs set to zero,
for example, the middle word of one of the operands, or all odd-numbered
bits, or...
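
To make the question concrete, here is a small software model in portable
C of the operation as described above. It is a sketch under stated
assumptions, not the instruction's definition: struct dw2, clmul64 and
vpmsumd_model are names invented for illustration, and '*' is taken to
mean carry-less (polynomial over GF(2)) multiplication.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit doublewords (hypothetical helper type). */
struct dw2 {
  uint64_t hi;
  uint64_t lo;
};

/* Carry-less multiply of two 64-bit values, giving a 128-bit product. */
struct dw2 clmul64(uint64_t a, uint64_t b)
{
  struct dw2 r = { 0, 0 };
  for (int i = 0; i < 64; i++) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i > 0)
        r.hi ^= a >> (64 - i);  /* bits shifted out of the low half */
    }
  }
  return r;
}

/* Model of the described operation: (a.hi * b.hi) XOR (a.lo * b.lo). */
struct dw2 vpmsumd_model(struct dw2 a, struct dw2 b)
{
  struct dw2 h = clmul64(a.hi, b.hi);
  struct dw2 l = clmul64(a.lo, b.lo);
  struct dw2 r = { h.hi ^ l.hi, h.lo ^ l.lo };
  return r;
}

/* Self-check: zeroing one doubleword leaves a single product. */
void vpmsumd_model_selfcheck(void)
{
  struct dw2 a = { 0x3, 0x5 };
  struct dw2 b = { 0x7, 0x9 };
  struct dw2 b_lo_only = { 0, 0x9 };

  struct dw2 both = vpmsumd_model(a, b);            /* 0x9 ^ 0x2d */
  struct dw2 single = vpmsumd_model(a, b_lo_only);  /* 0x2d only  */

  assert(both.hi == 0 && both.lo == 0x24);
  assert(single.hi == 0 && single.lo == 0x2d);
}
```

With the high doubleword of one operand zeroed, the first product
vanishes and the result is just the low-by-low product, which is the
first case I asked about.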
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
