Maamoun TK <maamoun.tk@googlemail.com> writes:
This is a stand-alone patch that applies all the previous patches to the optimized GCM implementation. This patch is based on the master upstream so it can be merged directly.
Some questions on the overall structure:
What's the speedup you get from the assembly gcm_fill? I see the C implementation uses memcpy and WRITE_UINT32, and is likely significantly slower than the ctr_fill16 in ctr.c. But it could be improved using portable means. If done well, it should be a very small fraction of the CPU time spent on GCM encryption.
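To illustrate what "portable means" could look like, here is a minimal sketch, not the code in Nettle and not the patch under review. The name gcm_fill_sketch and its signature are hypothetical. It assumes the usual GCM counter block layout: a fixed 12-byte prefix followed by a 32-bit big-endian counter that wraps modulo 2^32. The prefix is loaded into words once outside the loop, so each block costs a few fixed-size stores, which compilers typically turn into plain word moves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write `blocks` 16-byte counter blocks to `out`,
   starting from the counter block in `ctr`, and leave the next unused
   counter value in `ctr`. */
static void
gcm_fill_sketch(uint8_t *ctr, size_t blocks, uint8_t *out)
{
  uint64_t prefix_hi;
  uint32_t prefix_lo;
  uint32_t c;
  size_t i;

  /* Load the constant 12-byte prefix once, in native byte order, so that
     storing it back with memcpy reproduces the same bytes. */
  memcpy(&prefix_hi, ctr, 8);
  memcpy(&prefix_lo, ctr + 8, 4);

  /* Read the 32-bit big-endian counter once. */
  c = ((uint32_t) ctr[12] << 24) | ((uint32_t) ctr[13] << 16)
    | ((uint32_t) ctr[14] << 8) | (uint32_t) ctr[15];

  for (i = 0; i < blocks; i++, c++, out += 16)
    {
      /* Big-endian encoding of the counter for this block. */
      uint8_t be[4];
      be[0] = (uint8_t) (c >> 24);
      be[1] = (uint8_t) (c >> 16);
      be[2] = (uint8_t) (c >> 8);
      be[3] = (uint8_t) c;

      memcpy(out, &prefix_hi, 8);
      memcpy(out + 8, &prefix_lo, 4);
      memcpy(out + 12, be, 4);
    }

  /* Write back the updated counter; unsigned arithmetic wraps mod 2^32. */
  ctr[12] = (uint8_t) (c >> 24);
  ctr[13] = (uint8_t) (c >> 16);
  ctr[14] = (uint8_t) (c >> 8);
  ctr[15] = (uint8_t) c;
}
```

Whether this beats the current C code would have to be measured; the point is only that the per-block work can be reduced without assembly.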
What table layout is used by the assembly gcm_init_key? I would have expected that not many tables are needed when using the special multiply instructions. And Nettle generally doesn't try very hard to optimize key setup, under the theory that applications that need high performance also do a lot of work with each key. E.g., we use C code for AES key setup even when it could be sped up by assembly using special instructions.
So it would be easier for me to start with a patch for gcm_hash only (possibly with supporting C code for key setup).
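For reference, the per-block operation behind gcm_hash is a multiplication in GF(2^128). A bit-by-bit version following the multiplication algorithm in NIST SP 800-38D needs no precomputed tables at all, which is the baseline behind the table question above. The sketch below is illustrative only: gf128_mul_sketch is a hypothetical name, it is neither Nettle's code nor the patch's, and it is far too slow for real use.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: z = x * y in GF(2^128), using GCM's bit order
   (bit 0 is the most significant bit of byte 0). `z` may alias `x` or `y`. */
static void
gf128_mul_sketch(uint8_t *z, const uint8_t *x, const uint8_t *y)
{
  uint8_t v[16];
  uint8_t r[16] = { 0 };
  unsigned i, j;

  memcpy(v, y, 16);

  for (i = 0; i < 128; i++)
    {
      uint8_t lsb;

      /* If bit i of x is set, add (xor) the current multiple of y. */
      if (x[i / 8] & (0x80 >> (i % 8)))
        for (j = 0; j < 16; j++)
          r[j] ^= v[j];

      /* Multiply v by the field generator: shift right by one bit ... */
      lsb = v[15] & 1;
      for (j = 15; j > 0; j--)
        v[j] = (uint8_t) ((v[j] >> 1) | (v[j - 1] << 7));
      v[0] >>= 1;

      /* ... and reduce by R = 11100001 || 0^120 if a bit fell off. */
      if (lsb)
        v[0] ^= 0xe1;
    }

  memcpy(z, r, 16);
}
```

In this representation the field's multiplicative identity is the block with first byte 0x80 and all other bytes zero, which gives a quick sanity check for any faster implementation.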
Regards, /Niels