On Fri, 2019-03-15 at 13:14 +0100, Niels Möller wrote:
Simo Sorce <simo@redhat.com> writes:
The attached patch implements the XTS block cipher mode, as specified in IEEE P1619. The interface is split into a generic pair of functions for encryption and decryption, plus additional AES-128/AES-256 variants.
Thanks. Sorry for the late response.
The function signatures follow the same pattern used by other block-cipher modes like ctr, cfb, ccm, etc.
But it looks like one has to pass the complete message to one call?
Yes, due to ciphertext stealing, XTS needs to know which blocks are the last two, or at the very least it needs to withhold the last processed block so that it can be changed if a final partial block follows. This means inputs and outputs would not be symmetrical, which I felt would make the API somewhat hard to deal with. In general XTS is used for block storage, where the input is always fully available (and relatively small, either around 512 bytes or 4k).
Other modes support incremental encryption (with the requirement that all calls but the last must pass an integral number of blocks). I.e., a calling sequence like
xts_aes128_set_key
xts_aes128_set_iv
xts_aes128_encrypt ...   // 1 or more times
xts_aes128_set_iv        // Start new message
xts_aes128_encrypt ...   // 1 or more times
+The @code{n} plaintext blocks are transformed into @code{n} ciphertext blocks
+@code{C_1},@dots{} @code{C_n} as follows.
+For a plaintext length that is a perfect multiple of the XTS block size:
+@example
+T_1 = E_k2(IV) MUL a^0
+C_1 = E_k1(P_1 XOR T_1) XOR T_1
+@dots{}
+T_n = E_k2(IV) MUL a^(n-1)
+C_n = E_k1(P_n XOR T_n) XOR T_n
+@end example
+For any other plaintext lengths:
+@example
+T_1 = E_k2(IV) MUL a^0
+C_1 = E_k1(P_1 XOR T_1) XOR T_1
+@dots{}
+T_(n-2) = E_k2(IV) MUL a^(n-3)
+C_(n-2) = E_k1(P_(n-2) XOR T_(n-2)) XOR T_(n-2)
+T_(n-1) = E_k2(IV) MUL a^(n-2)
+CC_(n-1) = E_k1(P_(n-1) XOR T_(n-1)) XOR T_(n-1)
+T_n = E_k2(IV) MUL a^(n-1)
+PP = [1..m]P_n | [m+1..128]CC_(n-1)
+C_(n-1) = E_k1(PP XOR T_n) XOR T_n
+C_n = [1..m]CC_(n-1)
+@end example
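For concreteness, the stealing step in those last formulas could be sketched in C roughly as below, assuming byte granularity (a final partial block of m bytes, 0 < m < 16); the function and variable names here are hypothetical and not part of the patch.

#include <string.h>
#include <stdint.h>
#include "nettle-types.h"
#include "memxor.h"

/* Sketch of the ciphertext-stealing step, following the formulas above.
   cc holds CC_(n-1), tweak holds T_n, p_last holds the m-byte P_n. */
static void
xts_steal_sketch(const void *enc_ctx, nettle_cipher_func *encf,
                 const uint8_t *tweak, const uint8_t *cc,
                 const uint8_t *p_last, size_t m,
                 uint8_t *c_prev, uint8_t *c_last)
{
  uint8_t pp[16];

  memcpy(pp, p_last, m);           /* PP = [1..m]P_n ...           */
  memcpy(pp + m, cc + m, 16 - m);  /*      | [m+1..128]CC_(n-1)    */
  memxor(pp, tweak, 16);           /* PP XOR T_n                   */
  encf(enc_ctx, 16, c_prev, pp);   /* E_k1(PP XOR T_n)             */
  memxor(c_prev, tweak, 16);       /* ... XOR T_n gives C_(n-1)    */
  memcpy(c_last, cc, m);           /* C_n = [1..m]CC_(n-1)         */
}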
So the second key, with E_k2, is only ever used to encrypt the IV? If you add a set_iv function, that could do this encryption and only store E_k2(IV).
What would be the advantage? I guess it may make sense if we were to allow calling the encryption function multiple times, but as explained above I am not sure that is desirable. It also risks misuse where people set the same IV for all encryption operations, which would be catastrophic; that could probably be handled by clearing the stored IV when the encryption is finalized.
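As a rough illustration of the idea being discussed, a set_iv could encrypt the IV with the tweak key once and store only the result in the context; the names below are hypothetical and not the patch's interface.

#include <stdint.h>
#include "aes.h"
#include "nettle-types.h"

/* Hypothetical context layout: k1 keys the data cipher, k2 the tweak
   cipher, and T holds the running tweak, starting at E_k2(IV). */
struct xts_aes128_sketch_ctx
{
  struct aes128_ctx cipher;        /* keyed with k1 */
  struct aes128_ctx tweak_cipher;  /* keyed with k2 */
  union nettle_block16 T;
};

/* Encrypt the IV with k2 once, up front, so only E_k2(IV) is stored. */
static void
xts_aes128_sketch_set_iv(struct xts_aes128_sketch_ctx *ctx,
                         const uint8_t *iv)
{
  aes128_encrypt(&ctx->tweak_cipher, AES_BLOCK_SIZE, ctx->T.b, iv);
}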
--- /dev/null
+++ b/xts.c
@@ -0,0 +1,219 @@
[...]
+static void
+xts_shift(uint8_t *T)
+{
+  uint8_t carry;
+  uint8_t i;
+  for (i = 0, carry = 0; i < XTS_BLOCK_SIZE; i++)
+    {
+      uint8_t msb = T[i] & 0x80;
+      T[i] = T[i] << 1;
+      T[i] |= carry;
+      carry = msb >> 7;
+    }
+  if (carry)
+    T[0] ^= 0x87;
+}
I think this is the same as block_mulx, in cmac.c. (Also same byte order, right?)
It looks the same indeed. Should I share it? Just copy it from cmac? Something else?
Since the block size is fixed to 128 bits, I think it makes sense to use the nettle_block16 type for all blocks but the application's src and destination areas. Then we get proper alignment, and can easily use operations on larger units.
Ok.
BTW, for side-channel silence, we should change
  if (carry)
    T[0] ^= 0x87;
to something like
T[0] ^= 0x87 & - carry;
(and similarly for the cmac version).
I can do it for xts.c, and provide a separate patch for cmac.c too, or use a common function for both and handle it there.
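A minimal sketch of that branch-free shift, keeping the byte order of the xts_shift above and using the nettle_block16 type suggested earlier; whether it stays private to xts.c or is shared with cmac.c is left open.

#include <stdint.h>
#include "nettle-types.h"

/* Branch-free multiply by alpha in GF(2^128), same byte order as the
   xts_shift above: the reduction is folded in with a mask instead of
   a conditional branch on secret data. */
static void
xts_shift_sketch(union nettle_block16 *T)
{
  uint8_t carry = 0;
  unsigned i;

  for (i = 0; i < 16; i++)
    {
      uint8_t msb = T->b[i] >> 7;
      T->b[i] = (T->b[i] << 1) | carry;
      carry = msb;
    }
  /* 0x87 & -carry is 0x87 when carry == 1, and 0 when carry == 0. */
  T->b[0] ^= 0x87 & -carry;
}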
+  fblen = length - (length % XTS_BLOCK_SIZE);
+  XTSENC(twk_ctx, T, tweak);
+  /* the zeroth power of alpha is the initial ciphertext value itself, so we
+     skip shifting and do it at the end of each block operation instead */
+  for (i = 0; i < fblen; i += XTS_BLOCK_SIZE)
+    {
In other places, loops like this are often written as
  for (; length >= BLOCK_SIZE;
       length -= BLOCK_SIZE, src += BLOCK_SIZE, dst += BLOCK_SIZE)
Then there's no need for the up-front length % BLOCK_SIZE. It doesn't matter much in this case, since the block size is a constant power of two, but in general, division is quite expensive.
Ok, I can change that.
+      C = &dst[i];
+      XTSCPY(P, &src[i]);
+      XTSXOR(P, T);          /* P -> PP */
+      XTSENC(enc_ctx, C, P); /* CC */
+      XTSXOR(C, T);          /* CC -> C */
I think it would be clearer with encf being an explicit argument to the macros that need it (or maybe do it without the macros, if they expand to only a single call each).
Ok, will drop the macros. They seemed clearer, but now that I am rereading the code I find myself looking at their implementation more often than I thought necessary.
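Putting the two suggestions together, the full-block loop could end up looking roughly like the sketch below, with the macros gone, encf explicit, and the loop counting length down instead of computing an up-front fblen. xts_shift_sketch is the helper sketched earlier, the other names are hypothetical, and the ciphertext-stealing tail is omitted.

#include <stdint.h>
#include "nettle-types.h"
#include "memxor.h"

#define XTS_BLOCK_SIZE 16

static void xts_shift_sketch(union nettle_block16 *T);  /* sketched above */

/* Full-block loop only; handling of a trailing partial block via
   ciphertext stealing is not shown here. */
static void
xts_encrypt_blocks_sketch(const void *enc_ctx, nettle_cipher_func *encf,
                          union nettle_block16 *T,
                          size_t length, uint8_t *dst, const uint8_t *src)
{
  union nettle_block16 P;

  for (; length >= XTS_BLOCK_SIZE;
       length -= XTS_BLOCK_SIZE, src += XTS_BLOCK_SIZE, dst += XTS_BLOCK_SIZE)
    {
      memxor3(P.b, src, T->b, XTS_BLOCK_SIZE);  /* P -> PP              */
      encf(enc_ctx, XTS_BLOCK_SIZE, dst, P.b);  /* PP -> CC             */
      memxor(dst, T->b, XTS_BLOCK_SIZE);        /* CC -> C              */
      xts_shift_sketch(T);                      /* next power of alpha  */
    }
}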
Simo.