On Fri, 2019-03-15 at 22:33 +0100, Niels Möller wrote:
Simo Sorce simo@redhat.com writes:
Turns out the algorithm is not equivalent: the shift is applied to the array as if it were one big 128-bit little-endian value, so the endianness of the two is different.
Ah, I see.
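For reference, a shift over a 128-bit little-endian value can be written byte by byte, which should behave the same on any host. This is only a minimal sketch; the name xts_mulx_bytes is made up for illustration and is not nettle code.

```c
#include <stdint.h>

/* Hypothetical helper, not part of nettle: multiply a 16-byte block by x
   in GF(2^128), treating b[0] as the least significant byte (the
   little-endian convention discussed in this thread).
   dst and src may point to the same buffer. */
static void
xts_mulx_bytes(uint8_t dst[16], const uint8_t src[16])
{
  uint8_t carry = src[15] >> 7;   /* bit shifted out of the top */
  unsigned i;

  /* Walk from the most significant byte down, so that src[i - 1] is
     still unmodified when dst and src are the same buffer. */
  for (i = 15; i > 0; i--)
    dst[i] = (uint8_t) ((src[i] << 1) | (src[i - 1] >> 7));

  /* Fold the carry back in with the reduction constant 0x87. */
  dst[0] = (uint8_t) ((src[0] << 1) ^ (0x87 & -carry));
}
```

Since it only touches single bytes, it is slow but makes a convenient cross-check for the 64-bit variants.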
/* shift one and XOR with 0x87. */
/* src and dest can point to the same buffer for in-place operations */
static void
xts_shift(union nettle_block16 *dst,
          const union nettle_block16 *src)
{
  uint8_t carry = src->b[15] >> 7;
  dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
  dst->u64[0] = src->u64[0] << 1;
  dst->b[0] ^= 0x87 & -carry;
}
This will then work only on little-endian systems?
I think it would be nice with a structure like
  b0 = src->u64[0]; b1 = src->u64[1];  /* Load inputs */
  ... swap if big-endian ...
  uint64_t carry = (b1 >> 63);
  b1 = (b1 << 1) | (b0 >> 63);
  b0 = (b0 << 1) ^ (0x87 & -carry);
  ... swap if big-endian ...
  dst->u64[0] = b0; dst->u64[1] = b1;  /* Store output */
I.e., no memory accesses smaller than 64 bits.
Possibly with load + swap and swap + store done with some system-dependent macros.
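To make the load + swap idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: union block16 stands in for union nettle_block16, LE_SWAP64_SKETCH and bswap64_sketch are hypothetical names, and the byte-order test relies on the __BYTE_ORDER__ macro predefined by GCC and Clang (other compilers would need their own check).

```c
#include <stdint.h>

/* Stand-in for union nettle_block16 (layout assumed for this sketch). */
union block16
{
  uint8_t b[16];
  uint64_t u64[2];
};

/* Portable 64-bit byte swap: swap bytes, then 16-bit pairs, then halves. */
static inline uint64_t
bswap64_sketch(uint64_t x)
{
  x = ((x & 0x00ff00ff00ff00ffULL) << 8) | ((x >> 8) & 0x00ff00ff00ff00ffULL);
  x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
  return (x << 32) | (x >> 32);
}

/* Hypothetical macro: convert between host order and little-endian.
   Assumption: if __BYTE_ORDER__ is not defined, the host is treated
   as little-endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define LE_SWAP64_SKETCH(x) bswap64_sketch(x)
#else
# define LE_SWAP64_SKETCH(x) (x)
#endif

/* Shift the block left by one bit as a 128-bit little-endian value and
   XOR in 0x87 on carry. Only 64-bit loads and stores are used.
   dst and src may point to the same block. */
static void
xts_shift_sketch(union block16 *dst, const union block16 *src)
{
  uint64_t b0 = LE_SWAP64_SKETCH(src->u64[0]);   /* low half */
  uint64_t b1 = LE_SWAP64_SKETCH(src->u64[1]);   /* high half */
  uint64_t carry = b1 >> 63;

  b1 = (b1 << 1) | (b0 >> 63);
  b0 = (b0 << 1) ^ (0x87 & -carry);

  dst->u64[0] = LE_SWAP64_SKETCH(b0);
  dst->u64[1] = LE_SWAP64_SKETCH(b1);
}
```

On a little-endian host the macro expands to nothing, so the swap costs nothing there.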
But it's not essential for a first version of XTS; copying block_mulx and just replacing READ_UINT64 with LE_READ_UINT64, and similarly for WRITE, would be ok for now. There are more places with potential for micro-optimizations related to endianness, though I think the READ/WRITE_UINT macros are adequate in most places where unaligned application data is read and written by C code.
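The simpler route, reading and writing the two halves as little-endian through byte accesses, might look roughly like the sketch below. The helpers le_read_u64 and le_write_u64 are hypothetical stand-ins for LE_READ_UINT64 and LE_WRITE_UINT64, and xts_mulx_le is an illustrative name, not a copy of block_mulx.

```c
#include <stdint.h>

/* Hypothetical stand-in for LE_READ_UINT64: assemble a 64-bit value
   from 8 bytes, least significant byte first. Works for unaligned p. */
static uint64_t
le_read_u64(const uint8_t *p)
{
  uint64_t x = 0;
  int i;
  for (i = 7; i >= 0; i--)
    x = (x << 8) | p[i];
  return x;
}

/* Hypothetical stand-in for LE_WRITE_UINT64: store a 64-bit value as
   8 bytes, least significant byte first. */
static void
le_write_u64(uint8_t *p, uint64_t x)
{
  int i;
  for (i = 0; i < 8; i++, x >>= 8)
    p[i] = (uint8_t) (x & 0xff);
}

/* Multiply a 16-byte block by x, treating it as a 128-bit little-endian
   value; independent of host byte order and alignment.
   dst and src may point to the same buffer, since both halves are
   read before anything is written. */
static void
xts_mulx_le(uint8_t dst[16], const uint8_t src[16])
{
  uint64_t b0 = le_read_u64(src);       /* low half */
  uint64_t b1 = le_read_u64(src + 8);   /* high half */
  uint64_t carry = b1 >> 63;

  b1 = (b1 << 1) | (b0 >> 63);
  b0 = (b0 << 1) ^ (0x87 & -carry);

  le_write_u64(dst, b0);
  le_write_u64(dst + 8, b1);
}
```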
I will add the macros to swap endianness, and resend a new version.
Simo.