Re: Performance of AESNI impl vs other crypto libraries

30 Jan 2018


On Tue, 2018-01-09 at 09:17 +0100, Nikos Mavrogiannopoulos wrote:
...
...
...
in ctr_crypt contributes quite a few cycles per byte. It would be
faster to use an always word-aligned area, and do the copying and
incrementing using word operations (and final byteswap when running
on a little-endian platform), and with no intermediate stores.
I've tried this, with special code for block size 16. (Without any
assembly, but using __builtin_bswap64). Pushed to the ctr-opt
branch.
Gives a nice speedup. On my machine:
I see a quite large speedup on my x86_64 too on CTR. Note however
that GCM performance is not affected.
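
For readers following along, the word-wise counter handling described
above could look roughly like the sketch below. It is not the code on
the ctr-opt branch; the names load_be64, store_be64 and ctr_fill16 are
made up for illustration. It assumes a 16-byte block, a GCC- or
Clang-compatible compiler (for __builtin_bswap64 and __BYTE_ORDER__),
and uses memcpy for loads and stores rather than relying on a
word-aligned buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: read/write a 64-bit big-endian word.
   The byteswap is only needed on little-endian hosts. */
static uint64_t load_be64(const uint8_t *p)
{
  uint64_t w;
  memcpy(&w, p, sizeof w);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  w = __builtin_bswap64(w);
#endif
  return w;
}

static void store_be64(uint8_t *p, uint64_t w)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  w = __builtin_bswap64(w);
#endif
  memcpy(p, &w, sizeof w);
}

/* Write `blocks` consecutive 16-byte counter values to dst, starting
   from the big-endian counter in ctr, and leave ctr advanced by
   `blocks`. The counter lives in two native words inside the loop,
   so incrementing is one add plus a carry check per block. */
static void ctr_fill16(uint8_t ctr[16], size_t blocks, uint8_t *dst)
{
  uint64_t hi = load_be64(ctr);
  uint64_t lo = load_be64(ctr + 8);
  size_t i;

  for (i = 0; i < blocks; i++)
    {
      store_be64(dst + 16 * i, hi);
      store_be64(dst + 16 * i + 8, lo);
      if (++lo == 0)
        hi++; /* carry from the low word into the high word */
    }
  store_be64(ctr, hi);
  store_be64(ctr + 8, lo);
}
```

The filled buffer would then be encrypted in bulk and XORed with the
input, which is what lets the block cipher process many blocks per
call.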
To follow up on this, gcm would get an 8% speedup (on my system) by
replacing gcm_crypt() with ctr_crypt(). With that change as is, however,
the 32-bit counter is replaced with an "unlimited" counter. Wouldn't
introducing an assert on decrypt and encrypt length be sufficient to
share that code?
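
To make the question concrete, here is a rough sketch of such a check.
The function name gcm_ctr32_no_wrap is hypothetical and this is not
existing Nettle code. One caveat: for IVs that are not 96 bits, the low
32 bits of the initial GCM counter block can be arbitrary, so the
sketch looks at the current counter value as well as the length. It is
also deliberately conservative, in that it rejects the case where the
counter stored after the call would wrap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 if processing `length` bytes starting
   at counter block ctr keeps the big-endian 32-bit counter in
   ctr[12..15] from wrapping, so that a full-width incrementing counter
   and GCM's 32-bit counter produce the same blocks. */
static int gcm_ctr32_no_wrap(const uint8_t ctr[16], size_t length)
{
  uint32_t c = ((uint32_t) ctr[12] << 24)
    | ((uint32_t) ctr[13] << 16)
    | ((uint32_t) ctr[14] << 8)
    | (uint32_t) ctr[15];

  /* Number of 16-byte blocks, rounding up, without overflowing. */
  uint64_t blocks = (uint64_t) (length / 16) + (length % 16 != 0);

  /* Room left before the 32-bit counter would wrap to zero. */
  return blocks <= (uint64_t) (0xffffffffu - c);
}
```

The encrypt and decrypt paths could then assert that this holds before
handing the data to the shared CTR code.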
regards,
Nikos
