Re: Performance of AESNI impl vs other crypto libraries

9 Jan 2018


      nisse@lysator.liu.se (Niels Möller) writes:
...
I agree CTR seems more important. I'm guessing that the loop
 for (p = dst, left = length;
      left >= block_size;
      left -= block_size, p += block_size)
   {
     memcpy (p, ctr, block_size);
     INCREMENT(block_size, ctr);
   }


in ctr_crypt contributes quite a few cycles per byte. It would be faster
to use an always word-aligned area, and do the copying and incrementing
using word operations (and final byteswap when running on a
little-endian platform), and with no intermediate stores.
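Below is a minimal sketch of that word-based approach for 16-byte blocks. It is not the code on the ctr-opt branch. The names ctr16_fill_blocks and bswap_if_le are hypothetical, and the sketch assumes GCC or Clang for __builtin_bswap64 and the __BYTE_ORDER__ macros. The 128-bit big-endian counter is loaded into two 64-bit words once. Each block is then written as two word stores into a uint64_t buffer, byteswapped only on little-endian hosts. The counter goes back to memory once, at the end.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert between host order and big-endian.
   Byteswaps on little-endian hosts, identity on big-endian hosts.
   Assumes GCC/Clang (__builtin_bswap64, __BYTE_ORDER__). */
static uint64_t
bswap_if_le (uint64_t x)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap64 (x);
#else
  return x;
#endif
}

/* Hypothetical sketch: write 'blocks' consecutive 16-byte counter blocks
   into the word-aligned buffer 'out' (2 words per block), starting from
   the big-endian 128-bit counter in ctr[16].  On return, ctr holds the
   value following the last block written (wrapping mod 2^128). */
static void
ctr16_fill_blocks (uint8_t ctr[16], uint64_t *out, size_t blocks)
{
  uint64_t hi, lo;

  /* Load the counter once; it stays in local words for the whole loop. */
  memcpy (&hi, ctr, 8);
  memcpy (&lo, ctr + 8, 8);
  hi = bswap_if_le (hi);
  lo = bswap_if_le (lo);

  for (size_t i = 0; i < blocks; i++)
    {
      /* Two word stores per block, in big-endian byte order. */
      out[2*i] = bswap_if_le (hi);
      out[2*i + 1] = bswap_if_le (lo);

      /* 128-bit increment: carry into the high word when lo wraps. */
      lo++;
      if (lo == 0)
        hi++;
    }

  /* Store the updated counter back once, after the loop. */
  hi = bswap_if_le (hi);
  lo = bswap_if_le (lo);
  memcpy (ctr, &hi, 8);
  memcpy (ctr + 8, &lo, 8);
}
```

The filled buffer would then be encrypted with a single ECB-style call over all blocks and XORed with the input to produce the CTR output. That step is omitted here.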
I've tried this, with special code for block size 16 (without any
assembly, but using __builtin_bswap64), and pushed it to the ctr-opt branch.
It gives a nice speedup. On my machine:
Nettle-3.4:
         Algorithm         mode Mbyte/s cycles/byte cycles/block
            aes128  ECB encrypt 1589.75        1.26        20.16
            aes128  ECB decrypt 1642.91        1.22        19.50
            aes128  CBC encrypt  354.43        5.65        90.41
            aes128  CBC decrypt 1519.10        1.32        21.09
            aes128   (in-place) 1338.70        1.50        23.94
            aes128          CTR  727.24        2.75        44.06
            aes128   (in-place)  774.78        2.58        41.36

master branch:
         Algorithm         mode Mbyte/s cycles/byte cycles/block
            aes128  ECB encrypt 3143.18        0.64        10.19
            aes128  ECB decrypt 3159.88        0.63        10.14
            aes128  CBC encrypt  351.37        5.70        91.20
            aes128  CBC decrypt 2726.47        0.73        11.75
            aes128   (in-place) 2131.99        0.94        15.03
            aes128          CTR  970.08        2.06        33.03
            aes128   (in-place)  796.31        2.51        40.24

ctr-opt branch:
         Algorithm         mode Mbyte/s cycles/byte cycles/block
            aes128  ECB encrypt 3159.18        0.63        10.14
            aes128  ECB decrypt 3159.82        0.63        10.14
            aes128  CBC encrypt  351.80        5.69        91.08
            aes128  CBC decrypt 2723.80        0.74        11.76
            aes128   (in-place) 2156.27        0.93        14.86
            aes128          CTR 1778.84        1.13        18.01
            aes128   (in-place) 1550.39        1.29        20.67
This means that aes128-ctr is now at least twice as fast as in 3.4
(about 2.4 times with separate buffers, 2.0 times in-place).
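The arithmetic behind the comparison can be checked with a small sketch. The helper names speedup and cycles_per_block are hypothetical. Speedup is the ratio of the Mbyte/s columns. cycles/block is 16 times cycles/byte because AES has a 16-byte block, up to rounding of the printed figures.

```c
/* Hypothetical helpers for reading the benchmark tables above. */

/* Speedup of a new run over an old one, from the Mbyte/s column. */
static double
speedup (double old_mbyte_s, double new_mbyte_s)
{
  return new_mbyte_s / old_mbyte_s;
}

/* AES processes 16-byte blocks, so cycles/block = 16 * cycles/byte
   (the printed columns agree only up to rounding). */
static double
cycles_per_block (double cycles_per_byte)
{
  return 16.0 * cycles_per_byte;
}
```

For the aes128 CTR rows, speedup (727.24, 1778.84) is about 2.45 and the in-place speedup (774.78, 1550.39) is about 2.00. cycles_per_block (1.13) gives 18.08, compared with the 18.01 printed for ctr-opt.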
If anyone has a big-endian machine handy, additional testing for both
correctness and performance would be nice. (I have access to a few
virtual machines with non-x86 architectures, where I can test this
before merging to the master branch, but that's not so useful for
benchmarking.)
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
