Maamoun TK <maamoun.tk@googlemail.com> writes:
Thanks for the update. This is quite complex for me, and I have not yet read the code very carefully. I think I'd like to focus on the main function, gcm_hash, first. Some questions and suggestions that would make the review easier:
1. Move the fat support into its own patch.
2. You could consider doing the init_key in C, if nothing else as documentation. It could be either under some #ifdef in gcm.c, or in a separate .c file under powerpc64/p8/, next to gcm-hash.asm. Maybe it's still a good idea to have it in assembly; that's a tradeoff that depends a bit on the complexity of both the C and the assembly, and on the speedup from doing it in assembly. I don't have a very strong opinion on this point right now.
Even with asm, it might be a bit clearer to move it to its own .asm file, so that each file only needs defines for the registers relevant to that function.
3. What's TableElemAlign? Assuming GCM_TABLE_BITS is 8 (current Nettle ABI), you can treat struct gcm_key as a blob of 4096 bytes (256 entries of 16 bytes each), with alignment corresponding to what the C compiler uses for uint64_t. Are you using some padding at the start (depending on the address) to ensure you get stronger alignment? And 256-byte alignment sounds like a bit more than needed?
4. Please document the layout used for the precomputed values stored in struct gcm_key.
5. It would help to have comments explaining the naming convention used for the named registers, and the instruction sequence used for a single Karatsuba multiplication.
6. Is 8-way unrolling really necessary to get full utilization of the execution units? It's also not yet clear to me what 8-way means: is that 8 blocks of 16 bytes each (i.e., 128 bytes of input), or 8 input bytes?
7. Do you need any bit reversal? As you have mentioned, the multiplication operation is symmetric under bit reversal, so ideally bit reversal should be needed at most when setting the key and extracting the digest, but not by the main workhorse, gcm_hash.
I know you have referenced articles for the algorithm used, but it would be helpful to have the main ideas in comments close to the code.
Regards, /Niels