On Tue, 2018-01-09 at 08:29 +0100, Niels Möller wrote:
> nisse@lysator.liu.se (Niels Möller) writes:
> 
> > I agree CTR seems more important. I'm guessing that the loop
> > 
> >   for (p = dst, left = length;
> >        left >= block_size;
> >        left -= block_size, p += block_size)
> >     {
> >       memcpy (p, ctr, block_size);
> >       INCREMENT(block_size, ctr);
> >     }
> > 
> > in ctr_crypt contributes quite a few cycles per byte. It would be
> > faster to use an always word-aligned area, and do the copying and
> > incrementing using word operations (and a final byteswap when running
> > on a little-endian platform), and with no intermediate stores.
> I've tried this, with special code for block size 16. (Without any
> assembly, but using __builtin_bswap64). Pushed to the ctr-opt branch.
> Gives a nice speedup. On my machine:
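For anyone curious, here is a rough sketch of the word-wise counter fill
described above (only my illustration of the quoted idea, not the actual
code on the ctr-opt branch; ctr_fill16_sketch and be64 are made-up names,
and it assumes GCC/clang for __builtin_bswap64 and the byte-order macros):

#include <stdint.h>
#include <string.h>

/* Convert between host order and big-endian; a no-op on big-endian
   hosts (relies on GCC/clang's predefined byte-order macros). */
static uint64_t
be64 (uint64_t x)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap64 (x);
#else
  return x;
#endif
}

/* Illustrative only: fill dst with `blocks` consecutive 16-byte counter
   values, keeping the counter in two 64-bit words instead of doing a
   per-block memcpy plus byte-wise INCREMENT. */
static void
ctr_fill16_sketch (size_t blocks, uint8_t *ctr, uint8_t *dst)
{
  uint64_t hi, lo;
  size_t i;

  memcpy (&hi, ctr, 8);      /* most significant half (big-endian) */
  memcpy (&lo, ctr + 8, 8);  /* least significant half */
  hi = be64 (hi);
  lo = be64 (lo);

  for (i = 0; i < blocks; i++, dst += 16)
    {
      uint64_t whi = be64 (hi), wlo = be64 (lo);
      memcpy (dst, &whi, 8);
      memcpy (dst + 8, &wlo, 8);

      /* 128-bit increment, carrying from the low word into the high. */
      if (++lo == 0)
        hi++;
    }

  /* Store the updated counter back for the next call. */
  hi = be64 (hi);
  lo = be64 (lo);
  memcpy (ctr, &hi, 8);
  memcpy (ctr + 8, &lo, 8);
}

The point is that the counter stays in registers across the loop, and the
byteswap only happens around the word loads and stores.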
I see quite a large speedup in CTR on my x86_64 too. Note, however, that GCM performance is not affected.
regards,
Nikos