On Wed, 2019-03-20 at 14:46 +0100, Niels Möller wrote:
Simo Sorce <simo@redhat.com> writes:
On Wed, 2019-03-20 at 06:14 +0100, Niels Möller wrote:
And another possible trick for big-endian is to do an "opposite-endian" left shift as
((x & 0x7f7f7f7f7f7f7f7f) << 1) | ((x & 0x8080808080808080) >> 15), where the top bit of the least significant byte (the last 0x80 in the second mask) is the carry out.
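As a minimal sketch of the expression above (the function names are made up for illustration and are not taken from any patch), the shift can be compared against the straightforward "byte swap, shift, swap back" sequence. Both are written as plain arithmetic on a uint64_t value, so the comparison holds regardless of the host's byte order:

```c
#include <assert.h>
#include <stdint.h>

/* Portable byte swap, used here only as a reference. Real code might use
   a compiler builtin or an endian.h macro instead. */
static uint64_t
ref_bswap64(uint64_t x)
{
  uint64_t r = 0;
  int i;
  for (i = 0; i < 8; i++)
    {
      r = (r << 8) | (x & 0xff);
      x >>= 8;
    }
  return r;
}

/* "Opposite-endian" left shift by one bit: every byte is shifted left
   within itself, and the top bit of each byte moves to the bottom bit of
   the next more significant byte. The top bit of the least significant
   byte is shifted out; that bit is the carry. */
static uint64_t
opposite_endian_shift(uint64_t x)
{
  return ((x & UINT64_C(0x7f7f7f7f7f7f7f7f)) << 1)
    | ((x & UINT64_C(0x8080808080808080)) >> 15);
}

/* Reference: swap to the other byte order, shift normally, swap back. */
static uint64_t
swap_shift_swap(uint64_t x)
{
  return ref_bswap64(ref_bswap64(x) << 1);
}

/* Aborts if the two variants disagree for the given word. */
static void
check_equivalence(uint64_t x)
{
  assert(opposite_endian_shift(x) == swap_shift_swap(x));
}
```

On a big-endian machine, x would be a natively loaded 64-bit word holding little-endian data, so no byte swap is needed before or after the shift.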
This would allow us to avoid copies at the cost of more complicated code.
Which do you prefer? Using endian.h where available? Or having two separate code paths depending on the endianness of the machine?
If it matters for performance, use the fastest variant. Using separate implementations of xts_shift, with #if:s depending on endianness and compiler support, is fine.
I'd expect the opposite-endian shift to be more efficient when bswap is particularly slow, and implemented in terms of shifting and masking.
A bit difficult to determine, though. Neither the existence of endian.h macros nor of __builtin_bswap64 implies that the byte swapping is cheap. Are there any interesting platforms these days that lack an efficient bswap instruction? And are big-endian? Does mips have a bswap instruction?
In the end I went with the opposite-endian shifting solution and two separate implementations for LE and BE. My reasoning is that the compiler can definitely optimize the LE version better, and the BE version done this way is probably not slower than using byte swapping, even when an optimized bswap is available. The secondary reason is that I feel this version is more readable.
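For illustration only, the two code paths could look roughly like the sketch below. This is not the attached patch: the names xts_shift_le, xts_shift_be, xts_shift_bytes and variants_agree are hypothetical, and the sketch assumes the usual XTS tweak update (multiply by x in GF(2^128), with the block read as a little-endian number and reduction constant 0x87). In a real build one of the two word variants would be selected with #if on the machine's endianness; here both are plain functions so they can be checked against a bytewise reference on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LE_SHIFT(x) ((((x) & UINT64_C(0x7f7f7f7f7f7f7f7f)) << 1) | \
                     (((x) & UINT64_C(0x8080808080808080)) >> 15))

/* Bytewise reference: shift the 16-byte block left by one bit as a
   little-endian number, folding the carry back in as 0x87. */
static void
xts_shift_bytes(uint8_t *dst, const uint8_t *src)
{
  unsigned carry = src[15] >> 7;
  int i;
  for (i = 15; i > 0; i--)
    dst[i] = (uint8_t) ((src[i] << 1) | (src[i-1] >> 7));
  dst[0] = (uint8_t) ((src[0] << 1) ^ (0x87 & -carry));
}

/* Words as loaded natively on a little-endian machine. */
static void
xts_shift_le(uint64_t *dst, const uint64_t *src)
{
  uint64_t carry = src[1] >> 63;
  dst[1] = (src[1] << 1) | (src[0] >> 63);
  dst[0] = (src[0] << 1) ^ (0x87 & -carry);
}

/* Words as loaded natively on a big-endian machine: byte 0 of the block
   is the most significant byte of src[0], byte 15 the least significant
   byte of src[1]. */
static void
xts_shift_be(uint64_t *dst, const uint64_t *src)
{
  uint64_t carry = (src[1] & 0x80) >> 7;
  dst[1] = LE_SHIFT(src[1]) | ((src[0] & 0x80) << 49);
  dst[0] = LE_SHIFT(src[0]) ^ (UINT64_C(0x8700000000000000) & -carry);
}

/* Portable loads and stores that emulate a native word access on each
   kind of machine. */
static uint64_t
load_le64(const uint8_t *p)
{
  uint64_t x = 0;
  int i;
  for (i = 7; i >= 0; i--)
    x = (x << 8) | p[i];
  return x;
}

static uint64_t
load_be64(const uint8_t *p)
{
  uint64_t x = 0;
  int i;
  for (i = 0; i < 8; i++)
    x = (x << 8) | p[i];
  return x;
}

static void
store_le64(uint8_t *p, uint64_t x)
{
  int i;
  for (i = 0; i < 8; i++, x >>= 8)
    p[i] = (uint8_t) (x & 0xff);
}

static void
store_be64(uint8_t *p, uint64_t x)
{
  int i;
  for (i = 7; i >= 0; i--, x >>= 8)
    p[i] = (uint8_t) (x & 0xff);
}

/* Returns 1 if both word variants match the bytewise reference. */
static int
variants_agree(const uint8_t *block)
{
  uint8_t ref[16], le[16], be[16];
  uint64_t w[2];

  xts_shift_bytes(ref, block);

  w[0] = load_le64(block); w[1] = load_le64(block + 8);
  xts_shift_le(w, w);
  store_le64(le, w[0]); store_le64(le + 8, w[1]);

  w[0] = load_be64(block); w[1] = load_be64(block + 8);
  xts_shift_be(w, w);
  store_be64(be, w[0]); store_be64(be + 8, w[1]);

  return memcmp(ref, le, 16) == 0 && memcmp(ref, be, 16) == 0;
}
```

The BE variant needs a few more masks than the LE one, but no byte swap on either side of the shift, which matches the trade-off described above.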
I am attaching all 3 patches anew, as I also fixed the other issues you mentioned in a previous email. Namely, I improved the non-stealing case for encryption/decryption by removing the duplicate last-block handling, and replaced the memory-clearing memxor with a memset.
HTH, Simo.