Hi Niels,
Here is the new patch v4 for AES/GCM stitched implementation and benchmark based on the current repo.
Thanks. -Danny
On Jan 31, 2024, at 4:35 AM, Niels Möller nisse@lysator.liu.se wrote:
Niels Möller nisse@lysator.liu.se writes:
While the powerpc64 vncipher instruction really wants the original subkeys, not transformed. So on power, it would be better to have a _nettle_aes_invert that is essentially a memcpy, and then the aes decrypt assembly code could be reworked without the xors, and run at exactly the same speed as encryption.
I've tried this out, see branch https://git.lysator.liu.se/nettle/nettle/-/tree/ppc64-aes-invert . It appears to give the desired improvement in aes decrypt speed, making it run at the same speed as aes encrypt. Which is a speedup of about 80% when benchmarked on power10 (the cfarm120 machine).
Current _nettle_aes_invert also changes the order of the subkeys, with a FIXME comment suggesting that it would be better to update the order keys are accessed in the aes decryption functions.
I've merged the changes to keep subkey order the same for encrypt and decrypt (so that the decrypt round loop uses subkeys starting at the end of the array), which affects all aes implementations except s390x, which doesn't need any subkey expansion. But I've deleted the sparc32 assembly rather than updating it.
Regards, /Niels
-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se To unsubscribe send an email to nettle-bugs-leave@lists.lysator.liu.se