S390x other modes and memxor (was: Re: [S390x] Optimize AES modes)

9 May 2021


Maamoun TK maamoun.tk@googlemail.com writes:
...
On Sat, May 1, 2021 at 6:11 PM Niels Möller nisse@lysator.liu.se wrote:
...
Is https://git.lysator.liu.se/nettle/nettle/-/merge_requests/23 still
the current code?
I've added the basic AES-192 and AES-256 too since there is no problem to
test them all together.
Merged to the s390x branch now. Thanks for your patience.
For further improvement, it would be nice to have aesN_set_encrypt_key
and aesN_set_decrypt_key be two entrypoints to the same function. But
that will make the file replacement logic a bit more complex.
And maybe the public aes*_invert_key functions should be marked
as deprecated (and deleted, next time we have an abi break)? No other
ciphers in Nettle have this feature, and it's not that useful for
applications. From codesearch.debian.net, it looks like they are exposed
by the haskell and rust bindings, though.
...
For the other modes,
Before doing the other modes, do you think you could investigate if
memxor and memxor3 can be sped up? That should benefit many ciphers
and modes, and give more relevant speedup numbers for specialized
functions like aes cbc and aes ctr.
The best strategy depends on whether or not unaligned memory access is
possible and efficient. All current implementations do aligned writes to
the destination area (and smaller writes if needed at the edges). For the
The C implementation and several of the asm implementations also do
aligned reads, and use shifting to get inputs xored together at the right
places.
The x86_64 implementation, in contrast, uses unaligned reads, since that
seems just as efficient and reduces complexity quite a lot.
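To make the "aligned writes, unaligned reads" strategy concrete, here is a minimal C sketch. It is not Nettle's memxor; the name sketch_memxor and the 64-bit word size are assumptions for illustration only. The edges are handled bytewise, and memcpy is used to express the possibly unaligned word load of the source portably.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not Nettle's memxor: dst[i] ^= src[i] for n bytes. */
static void *
sketch_memxor(void *dst_in, const void *src_in, size_t n)
{
  unsigned char *dst = dst_in;
  const unsigned char *src = src_in;

  /* Leading edge: byte operations until dst is word-aligned. */
  while (n > 0 && ((uintptr_t) dst % sizeof(uint64_t)) != 0)
    {
      *dst++ ^= *src++;
      n--;
    }

  /* Main loop: dst is word-aligned here, src may be at any offset.
     memcpy expresses the unaligned load without undefined behaviour;
     compilers typically emit a plain load where the hardware allows it. */
  for (; n >= sizeof(uint64_t); n -= sizeof(uint64_t))
    {
      uint64_t d, s;
      memcpy(&d, dst, sizeof(d));
      memcpy(&s, src, sizeof(s));
      d ^= s;
      memcpy(dst, &d, sizeof(d));
      dst += sizeof(uint64_t);
      src += sizeof(uint64_t);
    }

  /* Trailing edge: remaining bytes. */
  for (; n > 0; n--)
    *dst++ ^= *src++;

  return dst_in;
}
```

The alternative with aligned reads would load two adjacent aligned source words per iteration and combine them with shifts, which is the extra complexity that unaligned reads avoid.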
On all platforms I'm familiar with, assembly implementations can assume
that it is safe to read a few bytes outside the edge of the input
buffer, as long as those reads don't cross a word boundary
(corresponding to valgrind option --partial-loads-ok=yes).
Ideally, memxor performance should be limited by memory/cache bandwidth,
with data in L1 cache probably being the most important case. It looks
like nettle-benchmark calls it with a size of 10 KB.
Note that memxor3 must process data in descending address order, to
support the call from cbc_decrypt, with overlapping operands.
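The ordering requirement can be illustrated with a bytewise C sketch. It is not Nettle's memxor3; the name sketch_memxor3 is hypothetical, and the overlap pattern described below is an assumption modelled on in-place CBC decryption, where the destination starts one block after the ciphertext operand.

```c
#include <stddef.h>

/* Hypothetical bytewise sketch, not Nettle's memxor3:
   dst[i] = a[i] ^ b[i], processed from the highest address down. */
static void *
sketch_memxor3(void *dst_in, const void *a_in, const void *b_in, size_t n)
{
  unsigned char *dst = dst_in;
  const unsigned char *a = a_in;
  const unsigned char *b = b_in;

  /* Descending order: when dst == b + k for some k > 0, the store to
     dst[i] lands on b[i + k], which has already been read (or lies
     beyond the range), so no unread input byte is clobbered. */
  while (n > 0)
    {
      n--;
      dst[n] = a[n] ^ b[n];
    }
  return dst_in;
}
```

With buf holding the ciphertext and tmp the raw block decryptions, a call of the form sketch_memxor3(buf + 16, tmp + 16, buf, len - 16) xors each decrypted block with the preceding ciphertext block in place. An ascending loop would overwrite ciphertext bytes before they are used as the xor input for the next block.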
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
