On Tue, Apr 16, 2013 at 1:08 PM, Niels Möller nisse@lysator.liu.se wrote:
Another ugly alternative would be to allocate one or a few extra elements and align manually, something like

  uint32_t a[SIZE + 3];
  #define ALIGNED_A ((uint32_t *) (((ptrdiff_t) a + 15) & -16))

But that's *too* ugly, I think.
Indeed, from what I can see there is no non-ugly solution to that problem :) If you want to ignore alignment, you could provide aligned and unaligned versions of the functions and leave it to the caller to deal with alignment.
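To make the two options concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions: the names align16, is_aligned16, sum_aligned, sum_unaligned and sum_words are hypothetical and not part of any existing API, the 16-byte boundary is taken from the macro quoted above, and the cast uses uintptr_t rather than ptrdiff_t because it is the unsigned integer type intended for holding pointer values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIZE 16

/* Round p up to the next 16-byte boundary. The caller must have
   allocated 3 spare uint32_t elements (12 bytes) to absorb the shift. */
static uint32_t *
align16 (uint32_t *p)
{
  return (uint32_t *) (((uintptr_t) p + 15) & ~(uintptr_t) 15);
}

/* Nonzero if p lies on a 16-byte boundary. */
static int
is_aligned16 (const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}

/* Hypothetical aligned variant: reads whole words directly. */
static uint32_t
sum_aligned (const uint32_t *p, size_t n)
{
  uint32_t s = 0;
  size_t i;
  for (i = 0; i < n; i++)
    s += p[i];
  return s;
}

/* Hypothetical unaligned variant: copies each word out with memcpy,
   so it makes no assumption about the address of p. */
static uint32_t
sum_unaligned (const uint8_t *p, size_t n)
{
  uint32_t s = 0;
  size_t i;
  for (i = 0; i < n; i++)
    {
      uint32_t w;
      memcpy (&w, p + 4 * i, sizeof (w));
      s += w;
    }
  return s;
}

/* What a caller might do when both variants are exposed:
   check the pointer once and pick the matching version. */
static uint32_t
sum_words (const void *p, size_t n)
{
  return is_aligned16 (p) ? sum_aligned (p, n) : sum_unaligned (p, n);
}
```

A caller would declare uint32_t a[SIZE + 3], take uint32_t *p = align16 (a), and then use p[0] through p[SIZE - 1]. Whether the aligned path is actually faster is exactly the open question, so this says nothing about the performance gain.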
And I'm not sure how much difference in performance it would really make. I guess it's not worth doing unless there's a large demonstrated gain in performance.
The results will be very CPU-specific. If you have any benchmark or test code, I could test it on i7 and AMD 64-bit CPUs.
regards, Nikos