[WIP] aes arm asm from libgcrypt

16 Mar 2019


On Raspberry Pi 3B+ (Cortex-A53 @ 1.4 GHz):
Before:
 aes128         |  nanosecs/byte   mebibytes/sec   cycles/byte
        ECB enc |     39.58 ns/B     24.10 MiB/s         - c/B
        ECB dec |     39.57 ns/B     24.10 MiB/s         - c/B
After:
        ECB enc |     15.24 ns/B     62.57 MiB/s         - c/B
        ECB dec |     15.68 ns/B     60.80 MiB/s         - c/B
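The three columns are related by simple arithmetic: MiB/s is 10^9 / (ns/B) / 2^20, and cycles/byte is ns/B times the clock in GHz. The benchmark left the cycles/byte column empty, so here is a minimal sketch for estimating it, assuming the core stays at a fixed nominal 1.4 GHz for the whole run (no frequency scaling or thermal throttling, which is an assumption for a Pi 3B+). The function names are hypothetical helpers, not part of any benchmark tool.

```c
#include <assert.h>

/* Throughput in MiB/s (2^20 bytes per second) for a cost in ns per byte. */
static double mib_per_sec(double ns_per_byte)
{
  assert(ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Estimated cycles per byte, ASSUMING a constant core clock of `ghz` GHz
   during the measurement (no DVFS, no throttling). */
static double cycles_per_byte(double ns_per_byte, double ghz)
{
  return ns_per_byte * ghz;
}

/* Speedup factor of "after" relative to "before" (both in ns per byte). */
static double speedup(double before_ns, double after_ns)
{
  assert(after_ns > 0.0);
  return before_ns / after_ns;
}
```

With the figures above this gives roughly 55 c/B before and 21 c/B after for ECB encryption at a nominal 1.4 GHz, i.e. about a 2.6x speedup; the small differences from the printed MiB/s values come from the ns/B column being rounded.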
Passes the nettle regression tests (on little-endian only, though).
Like AES_SMALL, it does not use pre-rotated tables, which reduces the d-cache
footprint from 4.25K to 1K (enc) / 1.25K (dec);
it is completely unrolled, which increases the i-cache footprint
from 948 bytes to 4416 bytes (enc) / 4032 bytes (dec).
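Why one table is enough: in the usual T-table formulation of AES, the four tables differ only by a byte rotation, so T_k[x] equals T_0[x] rotated left by 8k bits. The sketch below checks that, and also that S[x] sits in one byte of T_0[x], which would explain (this is an assumption, not stated above) why encryption needs no separate S-box while decryption still needs a 256-byte inverse S-box. The little-endian column layout (2,1,1,3), the footprint formulas (four 1K tables plus a 256-byte S-box for the 4.25K figure) and all function names are assumptions of this sketch, not nettle's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by x (0x02) in GF(2^8), reducing modulo the AES polynomial 0x11b. */
static uint8_t xtime(uint8_t a)
{
  return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiplication (shift-and-add). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
  uint8_t p = 0;
  while (b) {
    if (b & 1)
      p ^= a;
    a = xtime(a);
    b >>= 1;
  }
  return p;
}

/* AES S-box: brute-force multiplicative inverse, then the affine map
   s = b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ 0x63. */
static uint8_t aes_sbox(uint8_t x)
{
  uint8_t inv = 0;
  for (unsigned c = 1; x != 0 && c < 256; c++)
    if (gf_mul(x, (uint8_t)c) == 1) {
      inv = (uint8_t)c;
      break;
    }
  assert(x == 0 || gf_mul(x, inv) == 1);

  uint8_t s = inv, r = inv;
  for (int i = 0; i < 4; i++) {
    r = (uint8_t)((r << 1) | (r >> 7)); /* rotate the byte left by one bit */
    s ^= r;
  }
  return (uint8_t)(s ^ 0x63);
}

static uint32_t rotl32(uint32_t v, unsigned n)
{
  return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Entry x of T-table k, built directly from the MixColumns column
   (2,1,1,3) shifted by k positions; bytes packed little-endian. */
static uint32_t t_entry(unsigned k, uint8_t x)
{
  static const uint8_t coeff[4] = { 2, 1, 1, 3 };
  uint8_t s = aes_sbox(x);
  uint32_t w = 0;
  for (unsigned j = 0; j < 4; j++)
    w |= (uint32_t)gf_mul(s, coeff[(j + 4 - k) % 4]) << (8 * j);
  return w;
}

/* 1 if every pre-rotated entry T_k[x] equals rotl32(T_0[x], 8k), i.e. one
   table plus a rotate per lookup can replace all four tables. */
static int single_table_suffices(void)
{
  for (unsigned k = 0; k < 4; k++)
    for (unsigned x = 0; x < 256; x++)
      if (t_entry(k, (uint8_t)x) != rotl32(t_entry(0, (uint8_t)x), 8 * k))
        return 0;
  return 1;
}

/* 1 if S[x] can be read back from byte 1 of T_0[x] (coefficient 1). */
static int sbox_in_t0(void)
{
  for (unsigned x = 0; x < 256; x++)
    if (((t_entry(0, (uint8_t)x) >> 8) & 0xff) != aes_sbox((uint8_t)x))
      return 0;
  return 1;
}

/* Assumed table footprints in bytes for the layouts compared above. */
static size_t four_tables_bytes(void) { return 4 * 256 * sizeof(uint32_t) + 256; }
static size_t one_table_enc_bytes(void) { return 256 * sizeof(uint32_t); }
static size_t one_table_dec_bytes(void) { return 256 * sizeof(uint32_t) + 256; }
```

Under these assumptions the sizes come out as 4352 bytes (4.25K), 1024 bytes (1K) and 1280 bytes (1.25K). The extra rotate per lookup is cheap on ARM because a data-processing instruction can rotate its second operand for free (e.g. an EOR with a ROR #8 operand), which is presumably why dropping the pre-rotated tables costs little.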
As it completely replaces the current implementation, I have just attached the new
files (I will post the final version as a patch).
P.S. Yes, I tried to convert the macros to m4: complete failure (no named
parameters, problems with more than 9 arguments, weird expansion rules),
so I fell back to good ol' gas. Sorry.
P.P.S. With this change plus gcm/neon and the (to-be-published) chacha_blocks/neon,
gnutls-cli --benchmark-ciphers gives:
Before:
Checking cipher-MAC combinations, payload size: 16384
         AES-128-GCM  13.56 MB/sec
   CHACHA20-POLY1305  68.26 MB/sec
    AES-128-CBC-SHA1  16.72 MB/sec
  AES-128-CBC-SHA256  15.07 MB/sec
After:
         AES-128-GCM  35.32 MB/sec
   CHACHA20-POLY1305  94.94 MB/sec
    AES-128-CBC-SHA1  27.53 MB/sec
  AES-128-CBC-SHA256  23.30 MB/sec
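Dividing each After figure by its Before figure gives the per-suite speedups. Since these numbers combine three changes (this AES code, gcm/neon and chacha_blocks/neon), the CHACHA20-POLY1305 gain presumably comes from chacha_blocks/neon rather than from the AES code, and the GCM gain presumably mixes the AES and gcm/neon improvements.

```latex
\begin{aligned}
\text{AES-128-GCM:}        &\quad 35.32 / 13.56 \approx 2.60\\
\text{CHACHA20-POLY1305:}  &\quad 94.94 / 68.26 \approx 1.39\\
\text{AES-128-CBC-SHA1:}   &\quad 27.53 / 16.72 \approx 1.65\\
\text{AES-128-CBC-SHA256:} &\quad 23.30 / 15.07 \approx 1.55
\end{aligned}
```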
