Nikos Mavrogiannopoulos <n.mavrogiannopoulos@gmail.com> writes:
Would it make sense to force allocation of the context (i.e., no context on the stack) via ctx_alloc() function that will use posix_memalign or memalign?
I don't think so. That would be a departure from how Nettle's interfaces currently work: "no memory allocation".
As far as I understand, if we could tell the compiler that the structure must be 16-byte aligned, then it should arrange that for stack-allocated objects as well.
But maybe it won't be reliable. For example,
struct some_ctx *ctx = alloca (sizeof(*ctx));
is a valid use, and it depends on what alignment alloca provides. Not sure exactly how that would work, but the ABI typically specifies the required alignment of the stack pointer, and I suspect that (i) alloca won't round to a larger alignment than that, and (ii) the ABIs of the relevant platforms are unlikely to specify a larger alignment than 8 bytes.
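To illustrate the "tell the compiler" idea, here is a minimal sketch assuming a C11 compiler. The struct example_ctx and both helper functions are hypothetical and not part of Nettle. It only shows that an alignment request on a member carries over to automatic (stack) objects of that type; it says nothing about memory obtained from alloca, which is the unreliable case described above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical context struct, for illustration only;
   not an actual Nettle type. */
struct example_ctx
{
  /* Requesting 16-byte alignment for one member raises the
     alignment requirement of the whole struct. */
  alignas(16) uint32_t state[8];
  unsigned index;
};

/* Compile-time check that the request took effect. */
static_assert (alignof (struct example_ctx) == 16,
               "example_ctx should require 16-byte alignment");

/* Returns 1 if p is a multiple of 16, 0 otherwise. */
static int
is_aligned_16 (const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}

/* Declares an automatic instance and reports whether the compiler
   placed it on a 16-byte boundary. */
int
stack_ctx_is_aligned (void)
{
  struct example_ctx ctx;
  ctx.index = 0;
  return is_aligned_16 (&ctx);
}
```

With this declaration the compiler is responsible for realigning the stack frame if the ABI's default stack alignment is smaller than 16 bytes.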
Another ugly alternative would be to allocate one or a few extra elements and align manually, something like
uint32_t a[SIZE + 3];
#define ALIGNED_A ((uint32_t *) (((ptrdiff_t) a + 15) & -16))
But that's *too* ugly, I think.
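For reference, the manual approach can be spelled out as a small self-contained sketch. SIZE, pad_to_16 and aligned_window_ok are hypothetical names chosen for illustration. The sketch assumes that uint32_t has 4-byte alignment, which is why three spare elements (12 bytes) are enough slack, and it computes an element offset rather than casting an integer back to a pointer.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE 8 /* hypothetical element count */

/* Three spare elements only suffice if the array itself starts on
   a 4-byte boundary. */
static_assert (alignof (uint32_t) == 4,
               "sketch assumes 4-byte alignment of uint32_t");

/* Number of bytes to skip from p to reach the next 16-byte
   boundary (0 if p is already aligned). */
static size_t
pad_to_16 (const void *p)
{
  return (size_t) (-(uintptr_t) p & 15);
}

/* Over-allocates by three elements, picks the aligned window of
   SIZE elements inside the buffer, writes to it, and returns 1 if
   the window is 16-byte aligned and stays within bounds. */
int
aligned_window_ok (void)
{
  uint32_t a[SIZE + 3];
  uint32_t *aligned = a + pad_to_16 (a) / sizeof (uint32_t);
  unsigned i;

  for (i = 0; i < SIZE; i++)
    aligned[i] = i;

  return ((uintptr_t) aligned & 15) == 0
    && aligned >= a
    && aligned + SIZE <= a + SIZE + 3;
}
```

Every access then has to go through the adjusted pointer instead of the array name, which is a large part of what makes this approach unattractive.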
And I'm not sure how much difference it would really make to performance. I guess it's not worth doing unless there's a large demonstrated gain.
(And umac is not the only case where the x86 assembly files use movups/movupd where I'd prefer to use movaps/movapd).
Regards, /Niels