Hi,

The attached patch brings support for AES-128-CMAC. The code is based on the samba code. The rshift and lshift functions come from the AES implementation bundled with samba.
regards, Nikos
Nikos Mavrogiannopoulos nmav@redhat.com writes:
The attached patch brings support for AES-128-CMAC. The code is based on the samba code.
Nice. I think I'll have to read up a bit to understand what it's doing.
The rshift and lshift functions come from the AES implementation bundled with samba.
These do just a single-bit right or left shift of a 16-byte block? To me it seems odd to do that with a table lookup for each byte.
These may be natural as functions operating on nettle_block16, and if platform endianness is right, they could do the shifts in units of unsigned long or uint64_t. Shift and mask on 64-bit values may be preferable also if the endianness is wrong.
Regards, /Niels
On Wed, 2018-01-10 at 11:24 +0100, Niels Möller wrote:
Nikos Mavrogiannopoulos nmav@redhat.com writes:
The attached patch brings support for AES-128-CMAC. The code is based on the samba code.
Nice. I think I'll have to read up a bit to understand what it's doing.
The rshift and lshift functions come from the AES implementation bundled with samba.
These do just a single-bit right or left shift of a 16-byte block? To me it seems odd to do that with a table lookup for each byte.
Indeed, and I don't see any obvious benefit of that code. I've replaced it with a simpler version.
May be natural as functions operating on nettle_block16, and if platform endianness is right, could do the shifts in units of unsigned long or uint64_t. Shift and mask on 64-bit values may be preferable also if the endiannness is wrong.
I've now used nettle_block16 to ensure values are aligned. I didn't try to optimize the shift, as it is only used in set_key, which doesn't really affect performance.
regards, Nikos
On Fri, 2018-01-12 at 11:51 +0100, Nikos Mavrogiannopoulos wrote:
On Wed, 2018-01-10 at 11:24 +0100, Niels Möller wrote:
Nikos Mavrogiannopoulos nmav@redhat.com writes:
The attached patch brings support for AES-128-CMAC. The code is based on the samba code.
Nice. I think I'll have to read up a bit to understand what it's doing.
The rshift and lshift functions come from the AES implementation bundled with samba.
These do just a single-bit right or left shift of a 16-byte block? To me it seems odd to do that with a table lookup for each byte.
Indeed, and I don't see any obvious benefit of that code. I've replaced it with a simpler version.
Re-sending, as it seems I forgot to remove cmac-internal from the Makefile.
regards, Nikos
Nikos Mavrogiannopoulos nmav@redhat.com writes:
Re-sending, as it seems I forgot to remove cmac-internal from the Makefile.
I've had a first reading, and a few comments.
diff --git a/cmac.c b/cmac.c
new file mode 100644
index 00000000..b4886808
--- /dev/null
+++ b/cmac.c
+static const uint8_t const_Rb[] = {
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x87
+};
+static inline void block_lshift(const union nettle_block16 *in,
+                                union nettle_block16 *out)
+{
+  uint64_t b1 = READ_UINT64(in->b);
+  uint64_t b2 = READ_UINT64(in->b+8);
+  unsigned overflow = b2 & 0x8000000000000000;
+
+  b1 <<= 1;
+  b2 <<= 1;
+
+  if (overflow)
+    b1 |= 0x01;
I'd write the shift as
  b1 = (b1 << 1) | (b2 >> 63);
  b2 <<= 1;
I think it could return the bit shifted out (since the next thing done is examining that bit). Or maybe even better, move the
if (one-bit shifted out) { memxor const_Rb }
into this function, which would then correspond to multiplication by x in Z_2[x] / (Rb polynomial). Nettle conventions would also put the destination argument first. I'd suggest rewriting the function as
  static void
  block_mulx(union nettle_block16 *dst, const union nettle_block16 *src)
  {
    uint64_t s1 = READ_UINT64(src->b);
    uint64_t s0 = READ_UINT64(src->b+8);
    uint64_t d1 = (s1 << 1) | (s0 >> 63);
    uint64_t d0 = s0 << 1;

    if (s1 >> 63)
      d0 ^= 0x87; /* If that's what the const_Rb memxor boils down to? */

    WRITE_UINT64(dst->b, d1);
    WRITE_UINT64(dst->b+8, d0);
  }
Further cleverness is possible, but further optimization might not be needed, since, as you say, it's for key setup only.
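With block_mulx in that shape, the whole RFC 4493 subkey derivation would reduce to two calls, along these lines (a sketch; L is a local holding the encryption of the zero block, and "cipher" stands for the cipher context argument):

  union nettle_block16 L;

  encrypt(cipher, 16, L.b, const_Zero);  /* L = E_K(zero block) */
  block_mulx(&ctx->K1, &L);              /* K1 = L * x   */
  block_mulx(&ctx->K2, &ctx->K1);        /* K2 = L * x^2, both in Z_2[x]/(Rb) */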
+void cmac128_set_key(struct cmac128 *ctx, void *key,
+                     nettle_cipher_func *encrypt)
+{
+  memset(ctx, 0, sizeof(*ctx));
+
+  /* step 1 - generate subkeys k1 and k2 */
+  encrypt(key, 16, ctx->L.b, const_Zero);
+
+  block_lshift(ctx->L.b, ctx->K1.b);
+  if (ctx->L.b[0] & 0x80) {
+    memxor(ctx->K1.b, const_Rb, 16);
+  }
Is ctx->L used for anything after key setup? If not, it should be moved to a local variable, out of the context struct.
+  block_lshift(ctx->K1.b, ctx->K2.b);
+  if (ctx->K1.b[0] & 0x80) {
+    memxor(ctx->K2.b, const_Rb, 16);
+  }
+}
+#define MIN(x,y) ((x)<(y)?(x):(y))
+void cmac128_update(struct cmac128 *ctx, void *key,
+                    nettle_cipher_func *encrypt,
+                    size_t msg_len, const uint8_t *msg)
+{
+  /*
+   * check if we expand the block
+   */
+  if (ctx->last_len < 16) {
+    size_t len = MIN(16 - ctx->last_len, msg_len);
+
+    memcpy(&ctx->last.b[ctx->last_len], msg, len);
+    msg += len;
+    msg_len -= len;
+    ctx->last_len += len;
+  }
Other similar code uses different naming conventions, in particular ctx->block rather than ctx->last, and ctx->index rather than ctx->last_len. Not terribly important, but it is nice to have some consistency in style.
+  if (msg_len == 0) {
+    /* if it is still the last block, we are done */
+    return;
+  }
+  /*
+   * now checksum everything but the last block
+   */
+  memxor3(ctx->Y.b, ctx->X.b, ctx->last.b, 16);
+  encrypt(key, 16, ctx->X.b, ctx->Y.b);
+
+  while (msg_len > 16) {
+    memxor3(ctx->Y.b, ctx->X.b, msg, 16);
+    encrypt(key, 16, ctx->X.b, ctx->Y.b);
+    msg += 16;
+    msg_len -= 16;
+  }
+  /*
+   * copy the last block, it will be processed in
+   * cmac128_digest().
+   */
+  memset(ctx->last.b, 0, sizeof(ctx->last.b));
I'd prefer to postpone the memset/zero-padding until cmac128_digest.
+  memcpy(ctx->last.b, msg, msg_len);
+  ctx->last_len = msg_len;
+}
+void cmac128_digest(struct cmac128 *ctx, void *key,
+                    nettle_cipher_func *encrypt,
+                    unsigned length,
+                    uint8_t *out)
+{
+  uint8_t tmp[16];
Are both last_len == 0 and last_len == 16 possible here? If practical, it would be nice to arrange so that the update function above processes all full blocks.
+  if (ctx->last_len < 16) {
+    ctx->last.b[ctx->last_len] = 0x80;
+    memxor3(tmp, ctx->last.b, ctx->K2.b, 16);
+  } else {
+    memxor3(tmp, ctx->last.b, ctx->K1.b, 16);
+  }
If that works out, this conditional would be replaced by
  if (ctx->last_len > 0) {
    /* Pad last block, xor with K2 rather than K1 */
    assert (ctx->last_len < 16);
    ctx->last.b[ctx->last_len] = 0x80;
    memset(ctx->last.b + ctx->last_len + 1, 0, 15 - ctx->last_len);
    memxor3(tmp, ctx->last.b, ctx->K2.b, 16);
  }
  /* Note: No else clause */
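On the update side, one way to get that property is to keep the last block, full or not, buffered, and only absorb it once more input arrives, so that last_len stays in 1..16 for any nonempty message. An unoptimized sketch (it assumes the cipher function may encrypt in place; full blocks would normally bypass the buffer):

  while (msg_len > 0) {
    size_t len;

    if (ctx->last_len == 16) {
      /* More input follows, so the buffered block cannot be the
         final one; absorb it into the state. */
      memxor(ctx->X.b, ctx->last.b, 16);
      encrypt(key, 16, ctx->X.b, ctx->X.b);
      ctx->last_len = 0;
    }
    len = MIN(16 - ctx->last_len, msg_len);
    memcpy(&ctx->last.b[ctx->last_len], msg, len);
    ctx->last_len += len;
    msg += len;
    msg_len -= len;
  }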
Other code reuses the block buffer for temporary values in the digest function. Maybe that would work here too, with memxor rather than memxor3. Since memxor is a byte-by-byte operation, one could also consider eliminating the memset, and instead do memxor + memcpy, but I'm not sure if that's an improvement in performance or readability.
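That alternative might look like this sketch (reusing the patch's names, and assuming last_len < 16 on entry):

  /* The padded block xored with K2: bytes 0..last_len of the buffer
     (message data plus the 0x80 padding byte) are xored in place, and
     the remaining bytes, which would be zero after padding, are just
     K2 itself. */
  ctx->last.b[ctx->last_len] = 0x80;
  memxor(ctx->last.b, ctx->K2.b, ctx->last_len + 1);
  memcpy(ctx->last.b + ctx->last_len + 1, ctx->K2.b + ctx->last_len + 1,
         15 - ctx->last_len);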
+  memxor3(ctx->Y.b, tmp, ctx->X.b, 16);
+
+  assert(length <= 16);
+  if (length == 16) {
+    encrypt(key, length, out, ctx->Y.b);
+  } else {
+    encrypt(key, length, tmp, ctx->Y.b);
+    memcpy(out, tmp, length);
+  }
Maybe I'm missing something, but to me both clauses look equivalent?
diff --git a/cmac.h b/cmac.h
new file mode 100644
index 00000000..5f0553b7
--- /dev/null
+++ b/cmac.h
@@ -0,0 +1,117 @@
+struct cmac128 {
+  union nettle_block16 K1;
+  union nettle_block16 K2;
+  union nettle_block16 L;
+  union nettle_block16 X;
+  union nettle_block16 Y;
+  union nettle_block16 last;
+  size_t last_len;
+};
To me, it looks like L and Y don't belong in the context struct. As I understand it, K1 and K2 are the MAC key, X is the state, and last is the buffer needed to handle one block at a time.
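A sketch of the trimmed-down struct that reading suggests (the comments are my interpretation of the roles):

  struct cmac128 {
    union nettle_block16 K1;    /* subkey for a complete final block */
    union nettle_block16 K2;    /* subkey for a padded final block */
    union nettle_block16 X;     /* CBC-MAC state */
    union nettle_block16 last;  /* buffered (possibly partial) block */
    size_t last_len;            /* bytes used in last */
  };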
Regards, /Niels
On Mon, Jan 15, 2018 at 9:37 PM Niels Möller nisse@lysator.liu.se wrote:
Nikos Mavrogiannopoulos nmav@redhat.com writes:
Re-sending as it seems I forgot to remove cmac-internal from makefile.
I've had a first reading, and a few comments.
Thank you for the review. I reply inline.
diff --git a/cmac.c b/cmac.c
new file mode 100644
index 00000000..b4886808
--- /dev/null
+++ b/cmac.c
+static const uint8_t const_Rb[] = {
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x87
+};
+static inline void block_lshift(const union nettle_block16 *in,
+                                union nettle_block16 *out)
+{
uint64_t b1 = READ_UINT64(in->b);
uint64_t b2 = READ_UINT64(in->b+8);
unsigned overflow = b2 & 0x8000000000000000;
b1 <<= 1;
b2 <<= 1;
if (overflow)
b1 |= 0x01;
I'd write the shift as
b1 = (b1 << 1) | (b2 >> 63);
b2 <<= 1;
I have bad experience with shifts over 31. I could switch to that, but my understanding is that they don't work universally, especially on 32-bit systems offering 64-bit quantities. Given that this code is used unconditionally and is not performance critical, I'd stay with the safe version, unless you have a strong opinion about it.
I think it could return the bit shifted out (since the next thing done is examining that bit). Or maybe even better, move the
if (one-bit shifted out) { memxor const_Rb }
into this function, which would then correspond to multiplication by x in Z_2[x] / (Rb polynomial). Nettle conventions would also put the destination argument first. I'd suggest rewriting the function as
Nice suggestion. Done.
+void cmac128_set_key(struct cmac128 *ctx, void *key,
nettle_cipher_func *encrypt)
+{
memset(ctx, 0, sizeof(*ctx));
/* step 1 - generate subkeys k1 and k2 */
encrypt(key, 16, ctx->L.b, const_Zero);
block_lshift(ctx->L.b, ctx->K1.b);
if (ctx->L.b[0] & 0x80) {
memxor(ctx->K1.b, const_Rb, 16);
}
Is ctx->L used for anything after key setup? If not, it should be moved to a local variable, out of the context struct.
No. The reason it was used like that in samba was so that, when overwriting the state, any intermediate data of the operation were erased. That's not a bad idea; still, we don't need an additional variable, as 'block' is available for temporary use at this stage. I've modified it to use that.
block_lshift(ctx->K1.b, ctx->K2.b);
if (ctx->K1.b[0] & 0x80) {
memxor(ctx->K2.b, const_Rb, 16);
}
+}
+#define MIN(x,y) ((x)<(y)?(x):(y))
+void cmac128_update(struct cmac128 *ctx, void *key,
nettle_cipher_func *encrypt,
size_t msg_len, const uint8_t *msg)
+{
/*
* check if we expand the block
*/
if (ctx->last_len < 16) {
size_t len = MIN(16 - ctx->last_len, msg_len);
memcpy(&ctx->last.b[ctx->last_len], msg, len);
msg += len;
msg_len -= len;
ctx->last_len += len;
}
Other similar code uses different naming conventions, in particular ctx->block rather than ctx->last, and ctx->index rather than ctx->last_len. Not terribly important, but it is nice to have some consistency in style.
Done.
if (msg_len == 0) {
/* if it is still the last block, we are done */
return;
}
/*
* now checksum everything but the last block
*/
memxor3(ctx->Y.b, ctx->X.b, ctx->last.b, 16);
encrypt(key, 16, ctx->X.b, ctx->Y.b);
while (msg_len > 16) {
memxor3(ctx->Y.b, ctx->X.b, msg, 16);
encrypt(key, 16, ctx->X.b, ctx->Y.b);
msg += 16;
msg_len -= 16;
}
/*
* copy the last block, it will be processed in
* cmac128_digest().
*/
memset(ctx->last.b, 0, sizeof(ctx->last.b));
I'd prefer to postpone the memset/zero-padding until cmac128_digest.
Done.
memcpy(ctx->last.b, msg, msg_len);
ctx->last_len = msg_len;
+}
+void cmac128_digest(struct cmac128 *ctx, void *key,
nettle_cipher_func *encrypt,
unsigned length,
uint8_t *out)
+{
uint8_t tmp[16];
Are both last_len == 0 and last_len == 16 possible here? If practical, it would be nice to arrange so that the update function above processes all full blocks.
if (ctx->last_len < 16) {
ctx->last.b[ctx->last_len] = 0x80;
memxor3(tmp, ctx->last.b, ctx->K2.b, 16);
} else {
memxor3(tmp, ctx->last.b, ctx->K1.b, 16);
}
If that works out, this conditional would be replaced by
  if (ctx->last_len > 0) {
    /* Pad last block, xor with K2 rather than K1 */
    assert (ctx->last_len < 16);
    ctx->last.b[ctx->last_len] = 0x80;
    memset(ctx->last.b + ctx->last_len + 1, 0, 15 - ctx->last_len);
    memxor3(tmp, ctx->last.b, ctx->K2.b, 16);
  }
  /* Note: No else clause */
Other code reuses the block buffer for temporary values in the digest function. Maybe that would work here too, with memxor rather than memxor3. Since memxor is a byte-by-byte operation, one could also consider eliminating the memset, and instead do memxor + memcpy, but I'm not sure if that's an improvement in performance or readability.
memxor3(ctx->Y.b, tmp, ctx->X.b, 16);
assert(length <= 16);
if (length == 16) {
encrypt(key, length, out, ctx->Y.b);
} else {
encrypt(key, length, tmp, ctx->Y.b);
memcpy(out, tmp, length);
}
Maybe I'm missing something, but to me both clauses look equivalent?
They are updated to:

  if (length == 16) {
    encrypt(key, 16, out, ctx->Y.b);
  } else {
    encrypt(key, 16, tmp, ctx->Y.b);
    memcpy(out, tmp, length);
  }
The difference is that tmp is used to store the output of encrypt() in the second clause.
diff --git a/cmac.h b/cmac.h
new file mode 100644
index 00000000..5f0553b7
--- /dev/null
+++ b/cmac.h
@@ -0,0 +1,117 @@
+struct cmac128 {
union nettle_block16 K1;
union nettle_block16 K2;
union nettle_block16 L;
union nettle_block16 X;
union nettle_block16 Y;
union nettle_block16 last;
size_t last_len;
+};
To me, it looks like L and Y don't belong in the context struct. As I understand it, K1 and K2 are the MAC key, X is the state, and last is the buffer needed to handle one block at a time.
Updated.
I've also added some documentation, and enhanced the cmac test to check operation when the structure is re-used, as well as byte-by-byte input.
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
On Mon, Jan 15, 2018 at 9:37 PM Niels Möller nisse@lysator.liu.se wrote:
unsigned overflow = b2 & 0x8000000000000000;
b1 <<= 1;
b2 <<= 1;
if (overflow)
b1 |= 0x01;
I'd write the shift as
b1 = (b1 << 1) | (b2 >> 63);
b2 <<= 1;
I have bad experience with shifts over 31. I could switch to that, but my understanding is that they don't work universally, especially on 32-bit systems offering 64-bit quantities. Given that this code is used unconditionally and is not performance critical, I'd stay with the safe version, unless you have a strong opinion about it.
What problems have you seen? I've not heard of any C compilers which have uint64_t but don't support shifts properly (and this is the simplest case, with unsigned values and a constant shift count). The idiom I proposed is used all over the place in gmp, but there I guess we don't use 64-bit types on 32-bit platforms.
On the other hand, I think
unsigned overflow = b2 & 0x8000000000000000;
is incorrect if int is 32 bits; won't the result always be zero?
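A minimal illustration of the pitfall, with a hypothetical value:

  uint64_t b2 = 0x8000000000000000ULL;
  unsigned overflow = b2 & 0x8000000000000000ULL;
  /* With 32-bit unsigned, the conversion keeps only the low 32 bits
     of the mask result, so overflow is 0 even though the top bit of
     b2 is set. */
  unsigned top = b2 >> 63;  /* 1 iff the top bit is set, portably */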
In your updated patch you write
+  if (overflow)
+    b1 |= 0x01;
+
+  if (in->b[0] & 0x80)
+    b2 ^= 0x87;
It would be nice to at least handle the two top bits in the same way. If you're worried 64-bit shifts are not supported by 32-bit compilers, maybe replace "overflow" by "in->b[8] & 0x80", for consistency?
And it would be good with proper test coverage for all four values of the two bits in question.
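One way to get that coverage would be to compare block_mulx against a byte-at-a-time reference, for inputs exercising all four combinations of those two bits. A sketch, with ref_mulx as a hypothetical test helper:

  /* Reference: shift the 16-byte block left one bit (big-endian bit
     order), reducing with 0x87 when a bit falls off the top. */
  static void
  ref_mulx(uint8_t *dst, const uint8_t *src)
  {
    unsigned i;
    unsigned carry = src[0] >> 7;

    for (i = 0; i < 15; i++)
      dst[i] = (src[i] << 1) | (src[i+1] >> 7);
    dst[15] = src[15] << 1;

    if (carry)
      dst[15] ^= 0x87;
  }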
They are updated to:

  if (length == 16) {
    encrypt(key, 16, out, ctx->Y.b);
  } else {
    encrypt(key, 16, tmp, ctx->Y.b);
    memcpy(out, tmp, length);
  }
The difference is that tmp is used to store the output of encrypt() in the second clause.
Sorry for the confusion. I read "length" as the length of the final partial block, but here it's the desired digest size, and the memcpy is needed for truncation.
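So a caller wanting, say, a 64-bit tag would do something like this sketch (aes is a hypothetical keyed struct aes128_ctx, and the cast matches how cipher functions are passed elsewhere in nettle):

  uint8_t mac[8];

  cmac128_digest(&ctx, &aes, (nettle_cipher_func *) aes128_encrypt,
                 8, mac);
  /* mac holds the first 8 octets of the 16-octet CMAC tag. */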
Regards, /Niels
On Tue, 2018-01-16 at 14:25 +0100, Niels Möller wrote:
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
On Mon, Jan 15, 2018 at 9:37 PM Niels Möller nisse@lysator.liu.se wrote:
unsigned overflow = b2 & 0x8000000000000000;
b1 <<= 1;
b2 <<= 1;
if (overflow)
b1 |= 0x01;
I'd write the shift as
b1 = (b1 << 1) | (b2 >> 63); b2 <<= 1;
I have bad experience with shifts over 31. I could switch to that, but my understanding is that they don't work universally, especially on 32-bit systems offering 64-bit quantities. Given that this code is used unconditionally and is not performance critical, I'd stay with the safe version, unless you have a strong opinion about it.
What problems have you seen? I've not heard of any C compilers which have uint64_t but don't support shifts properly (and this is the simplest case, with unsigned values and a constant shift count). The idiom I proposed is used all over the place in gmp, but there I guess we don't use 64-bit types on 32-bit platforms.
I cannot seem to find any references to similar issues. From memory, it was a report about the failure of a longer-than-32-bit shift on a mips64 system. It was a long time ago, and it may have been a compiler issue.
I attach a patch, on top of the previous one, switching to the 63-bit shift, in case you prefer that version (it seems simpler).
On the other hand, I think
unsigned overflow = b2 & 0x8000000000000000;

is incorrect if int is 32 bits; won't the result always be zero?
Thank you for the catch. Hopefully the x86 run on our CI would have caught it, but I never ran it there. I've now started a build with the 0001 patch at: https://gitlab.com/nmav/nettle/pipelines/16256301
regards, Nikos
On Wed, 2018-01-17 at 10:59 +0100, Nikos Mavrogiannopoulos wrote:
Thank you for the catch. Hopefully the x86 run on our CI would have caught it, but I never ran it there. I've now started a build with the 0001 patch at: https://gitlab.com/nmav/nettle/pipelines/16256301
Following up on my patchset, this (hopefully final) version introduces CMAC with AES-256 as well. It also removes the CMAC128_KEY_SIZE definition as the key size only depends on the block algorithm used.
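Presumably each cipher then gets a concrete context pairing the generic CMAC state with a keyed cipher context, roughly like this sketch (the member names here are assumptions; the later cleanup confirms a CMAC128_CTX-style macro exists):

  struct cmac_aes256_ctx {
    struct cmac128 ctx;        /* generic CMAC128 state */
    struct aes256_ctx cipher;  /* keyed AES-256 context */
  };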
regards, Nikos
Nikos Mavrogiannopoulos nmav@redhat.com writes:
On Wed, 2018-01-17 at 10:59 +0100, Nikos Mavrogiannopoulos wrote:
Following up on my patchset, this (hopefully final) version introduces CMAC with AES-256 as well. It also removes the CMAC128_KEY_SIZE definition as the key size only depends on the block algorithm used.
Thanks. I'm about to merge this, but it will probably take a day or two more.
+void cmac128_set_key(struct cmac128 *ctx, void *key,
nettle_cipher_func *encrypt);
I was a bit confused by "key" here, but it's the context struct of the underlying cipher? It should be const void *. In the ccm and eax code, it seems we use the name "cipher" for this, e.g.,
void eax_set_key (struct eax_key *key, const void *cipher, nettle_cipher_func *f);
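Following that convention, the cmac prototype might read (a sketch):

  void
  cmac128_set_key (struct cmac128 *ctx, const void *cipher,
                   nettle_cipher_func *encrypt);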
Regards, /Niels
Nikos Mavrogiannopoulos nmav@redhat.com writes:
+@acronym{CMAC} is a message authentication code based on CBC encryption
+mode. It is suitable for systems where block ciphers are preferable
+and perform better than hash functions. @acronym{CMAC} is specified in
+@cite{RFC4493}. The secret key is always 128 bits (16 octets).
Should be "block size", not "secret key", right?
Regards, /Niels
On Thu, 2018-02-15 at 07:53 +0100, Niels Möller wrote:
Nikos Mavrogiannopoulos nmav@redhat.com writes:
+@acronym{CMAC} is a message authentication code based on CBC encryption
+mode. It is suitable for systems where block ciphers are preferable
+and perform better than hash functions. @acronym{CMAC} is specified in
+@cite{RFC4493}. The secret key is always 128 bits (16 octets).
Should be "block size", not "secret key", right?
Right. I updated the patch (and merged all of them) to include this fix and the variable naming changes (including renaming 'out' to 'dst').
regards, Nikos
On Thu, 2018-02-15 at 09:45 +0100, Nikos Mavrogiannopoulos wrote:
On Thu, 2018-02-15 at 07:53 +0100, Niels Möller wrote:
Nikos Mavrogiannopoulos nmav@redhat.com writes:
+@acronym{CMAC} is a message authentication code based on CBC encryption
+mode. It is suitable for systems where block ciphers are preferable
+and perform better than hash functions. @acronym{CMAC} is specified in
+@cite{RFC4493}. The secret key is always 128 bits (16 octets).
Should be "block size", not "secret key", right?
Right. I updated the patch (and merged all of them) to include this fix and the variable naming changes (including renaming 'out' to 'dst').
While using that code, I realized that CMAC128_UPDATE was misusing the 'data' field. In the attached patch I've renamed it to 'src' to avoid ambiguities.
regards, Nikos
Nikos Mavrogiannopoulos nmav@redhat.com writes:
While using that code, I realized that CMAC128_UPDATE was misusing the 'data' field. In the attached patch I've renamed it to 'src' to avoid ambiguities.
Pushed to a branch "cmac-support", together with ChangeLog and some cleanups:
* cmac.h (struct cmac128): Rename, to cmac128_ctx.
(CMAC128_CTX): Rename first member from data to ctx.
* cmac.c: Use const void * as the type for cipher arguments.
(block_mulx): Un-inline.
(cmac128_set_key): Make a constant function local.
* testsuite/cmac-test.c: Delete local typedefs.
Thanks! /Niels
nisse@lysator.liu.se (Niels Möller) writes:
Pushed to a branch "cmac-support"
Merged to master now, with the additional fix for deallocating memory in the test.
Regards, /Niels