Nikos Mavrogiannopoulos <nmav@redhat.com> writes:
> It seems that ctr_crypt16() would not handle the whole input, and that
> was complicating things. I've modified it towards that, and added the
> parameter. I did a gcm_fill(), but I didn't see the need for the
> nettle_block16 update, as the version I did (quite simplistic) didn't
> seem to differ in performance compared to ctr_fill16.
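For reference, a simplistic fill function along those lines could look roughly like the sketch below (this is not the posted patch). It assumes the usual GCM convention that the last 32 bits of the counter block are a big-endian block counter; the be32 helpers are local stand-ins for nettle's READ_UINT32/WRITE_UINT32 macros, and GCM_BLOCK_SIZE is defined locally with the same value as in gcm.h.

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nettle/nettle-types.h>  /* union nettle_block16 (header location may differ in older releases) */

#define GCM_BLOCK_SIZE 16         /* same value as in nettle's gcm.h */

static uint32_t
be32_read(const uint8_t *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
    | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

static void
be32_write(uint8_t *p, uint32_t x)
{
  p[0] = x >> 24; p[1] = x >> 16; p[2] = x >> 8; p[3] = x;
}

/* Fill BUFFER with BLOCKS counter blocks: fixed 12-byte prefix,
   incrementing 32-bit big-endian counter in the last four bytes. */
static void
gcm_fill(uint8_t *ctr, size_t blocks, union nettle_block16 *buffer)
{
  uint32_t c = be32_read(ctr + GCM_BLOCK_SIZE - 4);

  for (; blocks-- > 0; buffer++, c++)
    {
      memcpy(buffer->b, ctr, GCM_BLOCK_SIZE - 4);
      be32_write(buffer->b + GCM_BLOCK_SIZE - 4, c);
    }

  /* Store the advanced counter back for the next call. */
  be32_write(ctr + GCM_BLOCK_SIZE - 4, c);
}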
I've applied the first part, with some reorganization. ctr-internal.h now declares:
/* Fill BUFFER (n blocks) with incrementing CTR values. It would be
   nice if CTR was always 64-bit aligned, but it isn't when called
   from ctr_crypt. */
typedef void nettle_fill16_func(uint8_t *ctr, size_t n,
                                union nettle_block16 *buffer);

void
_ctr_crypt16(const void *ctx, nettle_cipher_func *f,
             nettle_fill16_func *fill, uint8_t *ctr,
             size_t length, uint8_t *dst,
             const uint8_t *src);
And I moved the implementation to a separate file ctr16.c.
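To illustrate how the pieces fit together: the fill callback generates a run of counter blocks, the cipher function encrypts them all in one call, and the keystream is XORed into the data. Below is a much simplified sketch, not the actual ctr16.c, which additionally handles in-place operation and alignment more carefully; BATCH is an arbitrary buffer size chosen for illustration.

#include <stddef.h>
#include <stdint.h>
#include <nettle/nettle-types.h>  /* nettle_cipher_func, union nettle_block16 */

#define BLOCK 16
#define BATCH 32   /* arbitrary number of blocks buffered per round */

/* Same typedef as in ctr-internal.h above. */
typedef void nettle_fill16_func(uint8_t *ctr, size_t n,
                                union nettle_block16 *buffer);

static void
ctr_crypt16_sketch(const void *ctx, nettle_cipher_func *f,
                   nettle_fill16_func *fill, uint8_t *ctr,
                   size_t length, uint8_t *dst, const uint8_t *src)
{
  union nettle_block16 buf[BATCH];
  uint8_t *ks = (uint8_t *) buf;   /* keystream bytes */
  size_t i;

  /* Whole blocks, processed in batches of up to BATCH blocks. */
  while (length >= BLOCK)
    {
      size_t blocks = length / BLOCK;
      size_t bytes;
      if (blocks > BATCH)
        blocks = BATCH;
      bytes = blocks * BLOCK;

      fill(ctr, blocks, buf);      /* generate counter blocks */
      f(ctx, bytes, ks, ks);       /* encrypt them in one call */
      for (i = 0; i < bytes; i++)  /* xor keystream into the data */
        dst[i] = src[i] ^ ks[i];

      dst += bytes; src += bytes; length -= bytes;
    }

  /* Final partial block, if any: one more counter block, use a prefix of it. */
  if (length > 0)
    {
      fill(ctr, 1, buf);
      f(ctx, BLOCK, ks, ks);
      for (i = 0; i < length; i++)
        dst[i] = src[i] ^ ks[i];
    }
}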
Your change to gcm.c is then applied almost unchanged on top of that. Result pushed to a branch named "gcm-ctr-opt". On my machine, it gives a gcm_aes128 speedup of 54% (from 12.2 cycles/byte to 7.9).
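In other words, the CTR pass in gcm.c boils down to a single call of roughly this shape, with gcm_fill as the fill callback (hypothetical call shape, not the actual diff):

/* cipher/f are the underlying block cipher context and function,
   ctx->ctr the GCM counter block. */
_ctr_crypt16(cipher, f, gcm_fill, ctx->ctr.b, length, dst, src);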
Very nice! It needs a little testing on big-endian before merging to master.
Thanks,
/Niels