On 03/15/19 16:33, Simo Sorce wrote:
On Fri, 2019-03-15 at 09:27 -0400, Simo Sorce wrote:
I think this is the same as block_mulx, in cmac.c. (Also same byte order, right?)
Looks the same indeed, should I share it? Just copy it from cmac? Something else?
Turns out the algorithm is not equivalent: here the shift is applied to the array as if it were one 128-bit little-endian value, so the endianness of the two is different.
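To illustrate the byte-order difference described above, here is a minimal byte-wise sketch of the two doubling operations. The names mulx_be and mulx_le are hypothetical and chosen only for this example; this is not the code from cmac.c or from the patch, just an endianness-neutral way to see that the same input block gives different results depending on which end is treated as most significant.

```c
#include <stdint.h>

/* Hypothetical helper (not nettle code): double a 16-byte block read as
   a big-endian 128-bit value, i.e. b[0] is the most significant byte.
   The reduction constant 0x87 is folded into the last byte. */
static void
mulx_be(uint8_t dst[16], const uint8_t src[16])
{
  uint8_t carry = src[0] >> 7;   /* top bit of the 128-bit value */
  int i;
  for (i = 0; i < 15; i++)
    dst[i] = (uint8_t) ((src[i] << 1) | (src[i + 1] >> 7));
  dst[15] = (uint8_t) ((src[15] << 1) ^ (0x87 & -carry));
}

/* Hypothetical helper (not nettle code): the same operation with the
   block read as a little-endian 128-bit value, i.e. b[15] is the most
   significant byte. The constant is folded into the first byte. */
static void
mulx_le(uint8_t dst[16], const uint8_t src[16])
{
  uint8_t carry = src[15] >> 7;  /* top bit of the 128-bit value */
  int i;
  for (i = 15; i > 0; i--)
    dst[i] = (uint8_t) ((src[i] << 1) | (src[i - 1] >> 7));
  dst[0] = (uint8_t) ((src[0] << 1) ^ (0x87 & -carry));
}
```

For a block whose only nonzero byte is b[0] = 0x80, mulx_be sees the top bit set and folds 0x87 into b[15], while mulx_le just carries one bit from b[0] into b[1]. Both loops read the carry first and only read bytes they have not yet written, so dst and src may be the same buffer.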
I changed the implementation to a much simpler form that shows the difference:
/* shift one and XOR with 0x87. */
/* src and dest can point to the same buffer for in-place operations */
static void
xts_shift(union nettle_block16 *dst,
          const union nettle_block16 *src)
{
  uint8_t carry = src->b[15] >> 7;
  dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
  dst->u64[0] = src->u64[0] << 1;
  dst->b[0] ^= 0x87 & -carry;

Nitpick: mixing different-sized accesses (esp. writes) to the same memory is problematic for modern CPUs (it confuses the speculative engine):

  uint64_t carry = src->u64[1] >> 63;
  dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
  dst->u64[0] = src->u64[0] << 1;
  dst->u64[0] ^= 0x87 & -carry;

}
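
For reference, the word-only variant suggested above could be written out as a complete function roughly as follows. This is only a sketch under stated assumptions: union block16 is a local stand-in for union nettle_block16 reduced to the two members used in this thread, the name xts_shift_u64 is made up for the example, and the code assumes a little-endian host so that u64[0] covers bytes b[0..7]. A big-endian host would need byte swaps around the shifts.

```c
#include <stdint.h>

/* Stand-in for union nettle_block16 (assumption: only the members
   discussed in this thread are reproduced here). */
union block16
{
  uint8_t b[16];
  uint64_t u64[2];
};

/* Shift the 128-bit value left by one bit and fold in 0x87 on carry,
   using only 64-bit accesses. Assumes a little-endian host.
   src and dst can point to the same buffer: the carry and both source
   words are read before the word they depend on is overwritten. */
static void
xts_shift_u64(union block16 *dst, const union block16 *src)
{
  uint64_t carry = src->u64[1] >> 63;  /* top bit of the 128-bit value */
  dst->u64[1] = (src->u64[1] << 1) | (src->u64[0] >> 63);
  dst->u64[0] = (src->u64[0] << 1) ^ (0x87 & -carry);
}
```

Since carry is 0 or 1, -carry is either all zeros or all ones, so the mask selects 0x87 without a branch. Folding the XOR into the single store to u64[0] avoids the separate read-modify-write, but is otherwise the same computation as the four lines above.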