Re: Micro optimizations of the umac context structs

16 Apr 2013


      Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
...
> Would it make sense to force allocation of the context (i.e., no context on
> the stack) via a ctx_alloc() function that will use posix_memalign or
> memalign?
I don't think so. That would be a departure from how Nettle's interfaces
currently work, "no memory allocation".
As far as I understand, if we could tell the compiler that the structure
must be 16-byte aligned, then it should arrange that also for stack
allocated objects.
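
A minimal sketch of what telling the compiler could look like, assuming a C11 compiler with alignas. The struct example_ctx and both helper functions are hypothetical names made up for illustration, not Nettle's actual umac context:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical context struct, NOT Nettle's real umac layout.
   Aligning the first member to 16 bytes raises the alignment
   requirement of the whole struct to 16. */
struct example_ctx
{
  alignas(16) uint32_t state[8];
  uint32_t count;
};

/* Returns 1 if p is 16-byte aligned, 0 otherwise. */
static int
is_aligned_16 (const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}

/* Declares a context on the stack and reports whether the compiler
   honoured the requested alignment for it. */
static int
stack_ctx_is_aligned (void)
{
  struct example_ctx ctx;
  ctx.count = 0;
  return is_aligned_16 (&ctx);
}
```

This only covers objects the compiler lays out itself. Memory obtained in other ways is not affected by the declaration.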
But maybe it won't be reliable. For example,
  struct some_ctx *ctx = alloca (sizeof(*ctx));
is a valid use, which depends on what alignment alloca provides. Not
sure exactly how that would work, but the ABI typically specifies
required alignment of the stack pointer, and I suspect that (i) alloca
won't round to a larger alignment than that, and (ii) the ABIs of
relevant platforms are unlikely to specify larger alignment than 8 bytes.
Another ugly alternative would be to allocate one or a few extra
elements and align manually, something like
  uint32_t a[SIZE + 3];
  #define ALIGNED_A ((uint32_t *)(((ptrdiff_t) a + 15) & -16))
But that's *too* ugly, I think.
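
For concreteness, the manual variant could be sketched as below. It assumes that uint32_t arrays are at least 4-byte aligned, so rounding up never skips more than 12 bytes (three elements). SIZE, align_16 and manual_alignment_works are hypothetical names used only for illustration, and the sketch uses uintptr_t rather than ptrdiff_t for the pointer arithmetic:

```c
#include <assert.h>
#include <stdint.h>

#define SIZE 8   /* hypothetical number of elements actually needed */

/* Rounds p up to the next 16-byte boundary. */
static uint32_t *
align_16 (uint32_t *p)
{
  return (uint32_t *) (((uintptr_t) p + 15) & ~(uintptr_t) 15);
}

/* Over-allocates by three elements, aligns manually, and checks that
   the aligned pointer is 16-byte aligned and still leaves room for
   SIZE elements inside the original array. */
static int
manual_alignment_works (void)
{
  uint32_t a[SIZE + 3];
  uint32_t *aligned = align_16 (a);

  return ((uintptr_t) aligned & 15) == 0
    && aligned >= a
    && aligned + SIZE <= a + SIZE + 3;
}
```

Every access would then have to go through the aligned pointer rather than the array itself, which is where the ugliness comes from.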
And I'm not sure how much difference to performance it would really
make. I guess it's not worth doing unless there's a large demonstrated
gain in performance.
(And umac is not the only case where the x86 assembly files use
movups/movupd where I'd prefer to use movaps/movapd).
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
