Hi Niels, please find attached two patches; comments inline.
On Tue, 2019-03-19 at 07:31 +0100, Niels Möller wrote:
Simo Sorce <simo@redhat.com> writes:
New patch attached; the diff has also been applied as an additional commit to my xts_exploded tree in gitlab.
Thanks. Looks pretty good, a few more comments below.
I didn't look so closely at the tests, but it would be good to test in-place operation also with the aes-specific functions.
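Will do. E.g., a known-answer check where src == dst, roughly like this (a sketch only; key/tweak/ptext/ctext/MSG_SIZE stand in for a test vector, ASSERT/MEMEQ as in the testsuite):

  struct xts_aes128_ctx ctx;
  uint8_t buf[MSG_SIZE];

  xts_aes128_set_encrypt_key(&ctx, key);
  memcpy(buf, ptext, MSG_SIZE);
  /* src == dst exercises in-place operation of the aes-specific function */
  xts_aes128_encrypt_message(&ctx, tweak, MSG_SIZE, buf, buf);
  ASSERT(MEMEQ(MSG_SIZE, buf, ctext));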
For performance (which we don't need to focus that much on for the initial version), one thing to do would be to fill a buffer with consecutive T_k values, and do the encrypt/decrypt and memxor operations on several blocks at a time.
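Right, something along these lines might work later on (a sketch of the idea only; XTS_STRIDE, buf and the enclosing loop are made up, T/xts_shift/memxor/memxor3 as in the patch):

  #define XTS_STRIDE 8   /* illustrative batch size */

  union nettle_block16 Ts[XTS_STRIDE];
  uint8_t buf[XTS_STRIDE * XTS_BLOCK_SIZE];
  size_t i;

  /* fill a buffer with consecutive T_k values */
  Ts[0] = T;
  for (i = 1; i < XTS_STRIDE; i++)
    xts_shift(&Ts[i], &Ts[i-1]);

  /* one memxor3/encf/memxor call per strip instead of per block */
  memxor3(buf, src, (const uint8_t *) Ts, sizeof(buf));
  encf(enc_ctx, sizeof(buf), dst, buf);
  memxor(dst, (const uint8_t *) Ts, sizeof(buf));

  /* first tweak of the next strip; tail blocks handled as before */
  xts_shift(&T, &Ts[XTS_STRIDE-1]);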
+For a plaintext length that is a perfect multiple of the XTS block size:
+@example
+T_1 = E_k2(IV) MUL a^0
+C_1 = E_k1(P_1 XOR T_1) XOR T_1
+@dots{}
+T_n = E_k2(IV) MUL a^(n-1)
+C_n = E_k1(P_n XOR T_n) XOR T_n
+@end example
Nit: I'd write the T updates as
  T_1 = E_k2(IV)
  T_k = T_(k-1) MUL a
since this is how to implement it, and no powering operation is needed.
Ok, changed it.
+The functions @var{encf} @var{decf} are of type
+@code{void f (void *@var{ctx}, size_t @var{length}, uint8_t *@var{dst},
+const uint8_t *@var{src})},
It seems the manual doesn't really have an entry for the nettle_cipher_func type. Maybe it should. Anyway, the type for the first argument is _const_ void *.
Fixed.
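For reference, the declaration in nettle-types.h is:

  typedef void nettle_cipher_func(const void *ctx,
                                  size_t length, uint8_t *dst,
                                  const uint8_t *src);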
+/* shift one and XOR with 0x87. */
+/* src and dest can point to the same buffer for in-place operations */
+static void
+xts_shift(union nettle_block16 *dst,
+          const union nettle_block16 *src)
+{
+  uint8_t carry = src->b[15] >> 7;
+  uint64_t b0 = LE_READ_UINT64(src->b);
+  uint64_t b1 = LE_READ_UINT64(src->b+8);
+  b1 = (b1 << 1) | (b0 >> 63);
+  b0 = b0 << 1;
+  LE_WRITE_UINT64(dst->b, b0);
+  LE_WRITE_UINT64(dst->b+8, b1);
+  dst->b[0] ^= 0x87 & -carry;
+}
Looks right. It can be optimized later on.
Just out of curiosity, would it be ok to change LE_READ_UINT64/LE_WRITE_UINT64 to use uint64_t and have the macro use __bswap64 if available, falling back to the original unoptimized code otherwise?
(__bswap64 already knows how to use __builtin_bswap64 or falls back to similar code as in LE_READ_UINT64)
Or do we need a new macro that expects aligned input/output and leave the existing macro alone, to be used for unaligned input/output ?
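I.e., something like this for the aligned case (a sketch; the function name is made up, WORDS_BIGENDIAN is the usual autoconf macro):

  static inline uint64_t
  le_read_uint64_aligned(const uint64_t *p)
  {
  #ifdef WORDS_BIGENDIAN
    return __builtin_bswap64(*p);  /* or __bswap64 where available */
  #else
    return *p;  /* memory is already little-endian */
  #endif
  }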
+/*
+ prev is the block to steal from
+ curr is the input block to the last step
+ length is the partial block length
+ dst is the destination partial block
+ src is the source partial block
+
+ In the encryption case:
+   prev -> the output of the N-1 encryption step
+   curr -> the input to the Nth step (will be encrypted as Cn-1)
+   dst  -> the final Cn partial block
+   src  -> the final Pn partial block
+
+ In the decryption case:
+   prev -> the output of the N-1 decryption step
+   curr -> the input to the Nth step (will be decrypted as Pn-1)
+   dst  -> the final Pn partial block
+   src  -> the final Cn partial block
+ */
+static void
+xts_steal(uint8_t *prev, uint8_t *curr,
+          size_t length, uint8_t *dst, const uint8_t *src)
+{
+  /* copy the remaining bytes into the current input block */
+  memcpy(curr, src, length);
+  /* fill the rest of the current block with the last
+     XTS_BLOCK_SIZE - length bytes of the previous block */
+  memcpy(&curr[length], &prev[length], XTS_BLOCK_SIZE - length);
+
+  /* This must be last or in-place operations will break:
+     copy 'length' bytes of the previous block into the
+     destination block, which is the final partial block
+     returned to the caller */
+  memcpy(dst, prev, length);
+}
This is a bit confusing; first we write to the curr block, using data from src and prev. And next we copy from prev to dst; is that the swapping of the last two blocks? Maybe this could be made simpler, and a copy eliminated, if the main loops were

  for (; length >= 2*XTS_BLOCK_SIZE; ...)

so that the last full block and any trailing partial block are handled together after the loop.
For the arguments, it looks like prev could be const, and curr could be nettle_block16 (but changing the latter is perhaps a bit pointless, since we only treat it as a byte array with no obvious benefits from the alignment).
Ok, I took a stab at removing xts_steal completely in the second patch; let me know what you think. I think I may like it better than my original code, and it uses nettle_block16 for temporary storage to avoid a copy.
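The shape of such a rework is roughly this (a sketch, not the literal second patch: the loop stops while at least two blocks' worth of input remain, so the final full block and any trailing partial block are handled after it without an xts_steal swap; assumes length >= XTS_BLOCK_SIZE, plus xts_shift and memxor/memxor3 as above):

  static void
  xts_encrypt_sketch(const void *enc_ctx, const void *twk_ctx,
                     nettle_cipher_func *encf,
                     const uint8_t *tweak, size_t length,
                     uint8_t *dst, const uint8_t *src)
  {
    union nettle_block16 T, P, C;

    encf(twk_ctx, XTS_BLOCK_SIZE, T.b, tweak);

    for (; length >= 2*XTS_BLOCK_SIZE;
         length -= XTS_BLOCK_SIZE, src += XTS_BLOCK_SIZE, dst += XTS_BLOCK_SIZE)
      {
        memxor3(P.b, src, T.b, XTS_BLOCK_SIZE);   /* P -> PP */
        encf(enc_ctx, XTS_BLOCK_SIZE, dst, P.b);
        memxor(dst, T.b, XTS_BLOCK_SIZE);         /* CC -> C */
        xts_shift(&T, &T);
      }

    /* last full block goes into a temporary, since we may steal from it */
    memxor3(P.b, src, T.b, XTS_BLOCK_SIZE);
    encf(enc_ctx, XTS_BLOCK_SIZE, C.b, P.b);
    memxor(C.b, T.b, XTS_BLOCK_SIZE);
    length -= XTS_BLOCK_SIZE;

    if (length > 0)
      {
        xts_shift(&T, &T);
        /* Pn padded with the tail of Cn-1; src is read before dst is
           written, so in-place operation still works */
        memcpy(P.b, C.b, XTS_BLOCK_SIZE);
        memcpy(P.b, src + XTS_BLOCK_SIZE, length);
        /* Cn is the head of Cn-1 */
        memcpy(dst + XTS_BLOCK_SIZE, C.b, length);
        memxor(P.b, T.b, XTS_BLOCK_SIZE);
        encf(enc_ctx, XTS_BLOCK_SIZE, dst, P.b);
        memxor(dst, T.b, XTS_BLOCK_SIZE);
      }
    else
      memcpy(dst, C.b, XTS_BLOCK_SIZE);
  }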
+static void
+check_length(size_t length, uint8_t *dst)
+{
+  assert(length >= XTS_BLOCK_SIZE);
+  /* asserts may be compiled out; try to save the user by zeroing dst in
+     case the buffer contains sensitive data (like the clear text for
+     in-place encryption) */
+  if (length < XTS_BLOCK_SIZE)
+    memxor(dst, dst, length);
+}
+/* works also for in-place encryption/decryption */
+void
+xts_encrypt_message(const void *enc_ctx, const void *twk_ctx,
+                    nettle_cipher_func *encf,
+                    const uint8_t *tweak, size_t length,
+                    uint8_t *dst, const uint8_t *src)
+{
+  union nettle_block16 T;
+  union nettle_block16 P;
+
+  check_length(length, dst);
+
+  encf(twk_ctx, XTS_BLOCK_SIZE, T.b, tweak);
+
+  /* the zeroth power of alpha is the initial ciphertext value itself, so we
+     skip shifting and do it at the end of each block operation instead */
+  for (; length >= XTS_BLOCK_SIZE;
+       length -= XTS_BLOCK_SIZE, src += XTS_BLOCK_SIZE, dst += XTS_BLOCK_SIZE)
+    {
+      memcpy(P.b, src, XTS_BLOCK_SIZE);
+      memxor(P.b, T.b, XTS_BLOCK_SIZE);  /* P -> PP */
This is what memxor3 is for. (Whether it's more efficient in this case is not entirely obvious, though.)
I did not know about memxor3; it makes the code better, especially in the second patch, I think.
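I.e., the memcpy+memxor pair above becomes a single call:

  memxor3(P.b, src, T.b, XTS_BLOCK_SIZE);  /* P = src XOR T, no separate copy */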
+/* XTS Mode with AES-128 */
+struct xts_aes128_ctx {
+    struct aes128_ctx cipher;
+    struct aes128_ctx tweak_cipher;
+};
Could consider renaming it to xts_aes128_key, somewhat analogous to struct eax_key and struct gcm_key. This represents message-independent data, and then the name xts_aes128_ctx could be used in case we add an interface with a buffer and incremental encryption and decryption. Not clear to me whether that's ever likely to be useful, though.
I did this; the reasoning makes sense to me.
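I.e., the struct quoted above is now (members presumably unchanged):

  /* XTS Mode with AES-128 */
  struct xts_aes128_key {
      struct aes128_ctx cipher;
      struct aes128_ctx tweak_cipher;
  };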
HTH,
Simo.