I've merged a reorganization of the x86_64 aesni code to the master-updates branch for testing. This replaces the x86_64/aesni/aes-*crypt-internal.asm files with separate files for the different key sizes, as has been discussed earlier.
And I've implemented 2-way interleaving, i.e., doing 2 blocks at a time, which gave a nice speedup on the order of 15% in my tests. It may be worthwhile to go to 3-way or 4-way, but I don't plan to try that soon.
Regards, /Niels
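For readers unfamiliar with the term, here is a rough sketch of what 2-way interleaving means, written in C rather than the actual x86_64 assembly. It is not Nettle's code: toy_round is a made-up stand-in for a single cipher round (it is NOT AES), and the names encrypt_1way, encrypt_2way and NROUNDS are hypothetical. The point is only the loop structure: the two blocks in a pair do not depend on each other, so the CPU can overlap their rounds instead of waiting out the latency of each round in turn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NROUNDS 10 /* loop count only; chosen to mirror AES-128's 10 rounds */

/* Toy stand-in for one cipher round (NOT AES). In the real code this
   would be an aesenc instruction operating on a 128-bit register. */
static uint64_t
toy_round (uint64_t x, uint64_t k)
{
  x ^= k;
  x *= 0x9E3779B97F4A7C15ULL;
  return x ^ (x >> 29);
}

/* One block at a time: every round depends on the result of the
   previous round, so the rounds form a single serial chain. */
static void
encrypt_1way (const uint64_t *keys, size_t n,
              uint64_t *dst, const uint64_t *src)
{
  assert (keys && dst && src);
  for (size_t i = 0; i < n; i++)
    {
      uint64_t x = src[i];
      for (int r = 0; r < NROUNDS; r++)
        x = toy_round (x, keys[r]);
      dst[i] = x;
    }
}

/* 2-way interleaving: two independent blocks go through each round
   together, giving two dependency chains that can overlap. */
static void
encrypt_2way (const uint64_t *keys, size_t n,
              uint64_t *dst, const uint64_t *src)
{
  size_t i = 0;
  assert (keys && dst && src);
  for (; i + 2 <= n; i += 2)
    {
      uint64_t x = src[i], y = src[i + 1];
      for (int r = 0; r < NROUNDS; r++)
        {
          uint64_t k = keys[r]; /* round key loaded once, used twice */
          x = toy_round (x, k);
          y = toy_round (y, k);
        }
      dst[i] = x;
      dst[i + 1] = y;
    }
  /* An odd block count leaves one block for the plain 1-way path. */
  if (i < n)
    encrypt_1way (keys, 1, dst + i, src + i);
}
```

Both functions produce the same output for the same input; only the order in which the work is issued differs. A 3-way or 4-way variant would follow the same pattern with more local variables per iteration.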
On Thu, Sep 2, 2021 at 7:48 PM Niels Möller <nisse@lysator.liu.se> wrote:
> I've merged a reorganization of the x86_64 aesni code to the master-updates branch for testing. This replaces the x86_64/aesni/aes-*crypt-internal.asm files with separate files for the different key sizes, as has been discussed earlier.
> And I've implemented 2-way interleaving, i.e., doing 2 blocks at a time, which gave a nice speedup on the order of 15% in my tests. It may be worthwhile to go to 3-way or 4-way, but I don't plan to try that soon.
Great speedup! I tweaked the implementation to do 4-way interleaving, but it seems to have no performance benefit over 2-way interleaving, judging by the benchmark on my machine, which has an Intel Comet Lake CPU.
regards, Mamone