Re: [WIP] aes arm asm from libgcrypt

24 Mar 2019


"Yuriy M. Kaminskiy" <yumkam@gmail.com> writes:
I've had another look, trying to understand how it differs.
> ...
> Does not use pre-rotated tables (as in AES_SMALL), so reduces d-cache
> footprint from 4.25K to 1K (enc)/1.25K (dec);
> completely unrolled, so increases i-cache footprint
> from 948b to 4416b (enc)/4032b (dec)
Not sure unrolling is that beneficial; Nettle's implementation does two
rounds at a time (since, just like in your patch, source and destination
registers alternate when doing a round), and that's so many instructions
that loop overhead should be pretty small.
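
To illustrate why doing two rounds per loop iteration fits naturally when source and destination alternate, here is a minimal C sketch. It is not taken from Nettle or from the patch: toy_round is a made-up stand-in for a real AES round, the function names are hypothetical, and an even number of rounds is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for one cipher round (NOT a real AES round): reads the
   state from src and writes the new state to dst; they must not alias. */
static void toy_round(uint32_t dst[4], const uint32_t src[4],
                      const uint32_t rk[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t x = src[(i + 1) & 3];
        dst[i] = ((x << 5) | (x >> 27)) ^ src[i] ^ rk[i];
    }
}

/* Reference: one round per iteration, copying the result back each time. */
static void rounds_with_copy(uint32_t state[4], const uint32_t *rk,
                             unsigned rounds)
{
    uint32_t tmp[4];
    for (unsigned r = 0; r < rounds; r++, rk += 4) {
        toy_round(tmp, state, rk);
        memcpy(state, tmp, sizeof tmp);
    }
}

/* Two rounds per iteration: the second round reads what the first one
   wrote and writes back to the original location, so no copy is needed
   and the loop overhead is paid once per two rounds.
   Assumes an even number of rounds. */
static void rounds_two_at_a_time(uint32_t state[4], const uint32_t *rk,
                                 unsigned rounds)
{
    uint32_t other[4];
    for (unsigned r = 0; r < rounds; r += 2, rk += 8) {
        toy_round(other, state, rk);
        toy_round(state, other, rk + 4);
    }
}

/* Both variants must produce the same state. */
static void check_same_result(void)
{
    uint32_t rk[40], a[4] = {1, 2, 3, 4}, b[4] = {1, 2, 3, 4};
    for (unsigned i = 0; i < 40; i++)
        rk[i] = 0x9e3779b9u * (i + 1);   /* arbitrary round keys */
    rounds_with_copy(a, rk, 10);
    rounds_two_at_a_time(b, rk, 10);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

In assembly the two state arrays would be two sets of four registers, so the copy in the reference version corresponds to register moves that the two-round loop body avoids.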
> ...
> As it completely replaces current implementation, I just attached new
> files (will post final version as a patch).
As you say, it doesn't use prerotated tables, but instead adds a
", ror #x" shift to the relevant eor instructions.
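
The equivalence behind this can be sketched in C. This is only an illustration under stated assumptions: the table contents are arbitrary stand-in values (not the real AES table), all names are hypothetical, and the rotation direction shown is one possible convention that depends on the byte order the real code uses.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits, n in {0, 8, 16, 24}. On ARM this can be
   folded into the second operand of eor ("eor rd, rd, rt, ror #n"). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return n ? (x >> n) | (x << (32 - n)) : x;
}

static uint32_t table0[256];       /* the single 1 KiB table */
static uint32_t prerot[4][256];    /* four pre-rotated tables, 4 KiB */

/* Fill table0 with arbitrary stand-in values (NOT the real AES table)
   and derive the rotated copies from it. */
static void init_tables(void)
{
    for (unsigned i = 0; i < 256; i++) {
        table0[i] = 0x9e3779b9u * (i + 1);
        for (unsigned k = 0; k < 4; k++)
            prerot[k][i] = ror32(table0[i], 8 * k);
    }
}

/* One output column using four pre-rotated tables. */
static uint32_t column_prerotated(const uint8_t b[4], uint32_t key)
{
    return key ^ prerot[0][b[0]] ^ prerot[1][b[1]]
               ^ prerot[2][b[2]] ^ prerot[3][b[3]];
}

/* Same column using one table and a rotation at each use. */
static uint32_t column_single(const uint8_t b[4], uint32_t key)
{
    uint32_t w = key ^ table0[b[0]];
    w ^= ror32(table0[b[1]], 8);
    w ^= ror32(table0[b[2]], 16);
    w ^= ror32(table0[b[3]], 24);
    return w;
}

/* Both variants must agree for every input; check a spread of them. */
static void check_equivalence(void)
{
    init_tables();
    for (unsigned i = 0; i < 256; i++) {
        uint8_t b[4] = { (uint8_t)i, (uint8_t)(i * 7), (uint8_t)(i * 13),
                         (uint8_t)(255 - i) };
        assert(column_prerotated(b, i) == column_single(b, i));
    }
}
```

Since the rotation rides along with the eor, the single-table variant costs no extra instructions while keeping only a quarter of the table data in the d-cache.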
Load and store of the cleartext and ciphertext bytes is different (and I
have some difficulty following it).
Masking to get table indices is the same as in nettle's
arm/aes-encrypt-internal.asm, while nettle's v6 code uses the uxtb
instruction, which saves one register (which the code doesn't take much
advantage of, though).
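
For comparison, the two ways of extracting a table index can be modelled in C roughly as follows. This is a sketch, not a transcription of either code base: the function names are hypothetical, and the real assembly may also fold the scaling by 4 into the mask or into the load's addressing mode.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits, n in {0, 8, 16, 24}. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Mask-based extraction: the constant 0xff has to live in a register
   (the mask argument here), which occupies one register throughout. */
static uint32_t index_masked(uint32_t w, unsigned byte, uint32_t mask)
{
    return (w >> (8 * byte)) & mask;
}

/* Model of ARMv6 "uxtb rd, rm, ror #n": rotate, then zero-extend the
   low byte. No mask register is needed. */
static uint32_t index_uxtb(uint32_t w, unsigned byte)
{
    return ror32(w, 8 * byte) & 0xffu;
}

/* Both forms select the same byte of the word. */
static void check_indices(void)
{
    const uint32_t w = 0xaabbccddu;
    for (unsigned k = 0; k < 4; k++)
        assert(index_masked(w, k, 0xffu) == index_uxtb(w, k));
}
```

The freed mask register is what could, in principle, serve as an additional temporary.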
The code in your patch has more careful instruction scheduling, e.g.,
interleaving addition of roundkeys with the sbox table lookups. Nettle's
code is written with only a single temporary register used for
everything, which makes it impossible to interleave independent parts of
the mangling, whereas your patch seems to alternate between three
different temporaries.
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
