I have now read the UMAC spec (RFC 4418) a bit more carefully. I haven't yet read Niko's code (or any other code, for that matter). Some thoughts:
o I don't like the way endian conversion is done in the spec. I'd prefer to think about the various functions as operating on arrays of 32-bit words, and implementation should use integer types of the right size to get correct alignment etc.
o The "NH" function looks like a candidate for for assembly implementation. I don't know if there's anything else in the algorithm which really is performance critical? (And here we get a contradiction to point (1), it may be best for performance to have the NH function get the unaligned byte array as input, do be able to use assembly tricks when reading it into integers. Anyway, we should really avoid byte arrays in the internal interfaces between L1/L2/L3).
o *Maybe* optimization of the L2 and L3 hashes will be important. Profiling is needed, I guess, and they should be optimized *after* L1/NH.
o Since I have been work with side-channel silence recently, it seems natural to try to make the POLY function silent, On the other hand, I'm not sure what the threats are. If the MAC is applied to a secret message, we may leak some information about the message, I guess?
o I think we ought to handle large messages correctly, which means we need the POLY function also over the 128-bit prime. Performance is not terribly important, at least not initially.
o I'm not sure exactly how the building blocks fit together, but we should strive for pipelining. When we have the first message block M_1, apply L1 to that block, then apply L2 and L3 to the output as soon as possible. And for the larger tag lengths, also try to make that looping inside the loop processing the sequence of message blocks, so we can discard M_1 before starting to work on the next block M_2.
Regards, /Niels
On Tue, Mar 26, 2013 at 3:47 PM, Niels Möller nisse@lysator.liu.se wrote:
I have now read the UMAC spec (RFC 4418) a bit more carefully. I haven't yet read Niko's code (or any other code, for that matter). Some
To put the discussion on some context for the others, I have sent a patch that adds support for the UMAC algorithm. The patch did not made it to the list due to size and was forwarded directly to Niels.
btw. It would be nice if the list would notify the poster when messages are being held. In that case I only figured out it was held some days later by checking the archives. Also please use Nikos for my name. The Niko sounds very Italian (and I am not one :).
o Since I have been work with side-channel silence recently, it seems natural to try to make the POLY function silent, On the other hand, I'm not sure what the threats are. If the MAC is applied to a secret message, we may leak some information about the message, I guess?
Do you think the current function could provide information on the plaintext? From a quick look it is not obvious to me, but I haven't checked thoroughly.
o I think we ought to handle large messages correctly, which means we need the POLY function also over the 128-bit prime. Performance is not terribly important, at least not initially.
Note that the current limit on the code is 16MB messages per tag, not per key, so it is oversufficient for all practical uses of a MAC (which is not the same as a hash). It would be nice not to have the limit, but it would be nicer if anyone actually cared about it :) I haven't checked but if adding support for larger messages per tag would reduce performance, but if this is the case I think it would be pretty counter-productive.
regards, Nikos
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
btw. It would be nice if the list would notify the poster when messages are being held.
It's intended to do that, but I'm not very good at mailman configuration, so I guess I have messed that up somehow.
Also please use Nikos for my name.
Noted, sorry about that.
Do you think the current function could provide information on the plaintext? From a quick look it is not obvious to me, but I haven't checked thoroughly.
The poly_hash function clearly has data-dependent timing. If it is useful for the attacker, I don't know.
Note that the current limit on the code is 16MB messages per tag, not per key, so it is oversufficient for all practical uses of a MAC (which is not the same as a hash).
I consider handling of large files to be an important application of any MAC. When encrypting a large file (typical cases: session key derived from a passphrase, or random session key encrypted with RSA), the session key should always include a MAC key for authenticating the data. And streaming operation is important, and then you don't even know in advance if the file is going to be small or large.
I haven't checked but if adding support for larger messages per tag would reduce performance, but if this is the case I think it would be pretty counter-productive.
It should make no difference for short messages. The "layer-2" hashing for large messages will be some four times slower than for short messages. Not sure what the overall slowdown would be; the layer-2 work may still be a small fraction of the time spent in the layer-1 hashing.
Regards, /Niels
On Wed, Mar 27, 2013 at 1:29 PM, Niels Möller nisse@lysator.liu.se wrote:
Do you think the current function could provide information on the plaintext? From a quick look it is not obvious to me, but I haven't checked thoroughly.
The poly_hash function clearly has data-dependent timing. If it is useful for the attacker, I don't know.
I see what you mean; i was checking only the poly64 part. Indeed there is an additional evaluation of the poly64 when m >= some size. In the description it is like that: if (m >= maxwordrange) then y = (k * y + marker) mod p y = (k * y + (m - offset)) mod p else y = (k * y + m) mod p end if
which doesn't look to provide much, but the time spent it seems it could be used to distinguish text consisting of 0xfffffffff, to text that is other than that. I don't know how practical is that because the overall instructions needed for an evaluation of poly64() are very few, but still looks like an issue. It could be fixed by adding an additional evaluation of poly64 (i.e., do an additional multiplication on the else case), but it looks it is going to affect negatively the performance of all cases except 0xffffffffff. I don't know if the expression y = (k * y + marker) mod p y = (k * y + (m - offset)) mod p
can be optimized to a single evaluation by using a separate poly64 function for it (that would be fine if the difference is just one or two multiplications more than the original poly64).
Note that the current limit on the code is 16MB messages per tag, not per key, so it is oversufficient for all practical uses of a MAC (which is not the same as a hash).
I consider handling of large files to be an important application of any MAC. When encrypting a large file (typical cases: session key derived from a passphrase, or random session key encrypted with RSA), the session key should always include a MAC key for authenticating the data.
Encryption of big data is seldom happening on a single run. If there is a bit flip on the hard disk all the data are gone (decryption and mac will fail for all) and so is any possibility of recovery. I'd expect data to be encrypted in chunks (i think gpg and the other hard disk encryption tools work like that). In any case if you consider that limitation, a stopper, I could check it further.
regards, Nikos
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
In any case if you consider that limitation, a stopper, I could check it further.
To get to understand the algorithm better, I'm writing a toy implementation. I'll let you know when I have a poly128-routine which passes the tests.
Regards, /Niels
On 03/27/2013 06:56 PM, Niels Möller wrote:
In any case if you consider that limitation, a stopper, I could check it further.
To get to understand the algorithm better, I'm writing a toy implementation. I'll let you know when I have a poly128-routine which passes the tests.
Note also that I realized that in the implementation I submitted the nonce is restricted to 64-bits, while up to 128-bits may also be used.
regards, Nikos
nisse@lysator.liu.se (Niels Möller) writes:
To get to understand the algorithm better, I'm writing a toy implementation. I'll let you know when I have a poly128-routine which passes the tests.
Done now. Available at http://www.lysator.liu.se/~nisse/misc/umac-0.1.tar.gz. The code is not seriously optimized, and it supports only 32-bit tags. The functions in poly128.c might be useful.
I'm a bit worried about testing; the test vectors in RFC4418 seem inadequate to get confidence in an implementation. I include the README file (an extended version of the notes I mailed earlier) below.
Regards, /Niels
This is a "toy implementation" of UMAC, which I wrote to understand the algorithm, and try to figure out the proper interfaces for implementing it in Nettle. The current code only does the smallest tag size, 32 bits.
Some lessons learned:
* I was initially confused about endianness in the spec. It turns out the data being hashed is read into 32-bit integers using little-endian byteorder. *Everything* else uses big-endian conventions.
* Testing is poor. There are a lot of corner cases in the algorithm, primarily in the layer two hashing. First, there are a bunch of conditions depending on the message length (no POLY, only POLY64, combination of POLY64 and POLY128), and then there's the "marker" words in the POLY algorithm which are very unlikely for random input and hence a bit tricky to get properly tested.
So to get confidence in any UMAC implementation, it would be highly desirable with a set of known test vectors which provide for complete code coverage, but I haven't found any.
* The nh function is expected to be the main work-horse (but I haven't profiled anything). So it's a candidate for assembly implementation. To avoid having to read (and deal with unaligned data and endianness) more than once, it would make sense with an nh function which gets the tag length as an argument, and does all needed processing of a single block. An alternative is to copy the input block into an aligned uint32_t array before calling nh, but I think it's better not to make any extra copy of the input, and just process it on the fly as it is read.
* Optimization of the layer two hash, in particular the poly64 and poly128 operations, may be important. Profiling will tell.
* Since I have been working with side-channel silence recently, it seems natural to try to make the POLY function silent. On the other hand, I'm not sure what the threats are. If the MAC is applied to a secret message, we may leak some information about the message, I guess? Is it a problem worth bothering about?
* One can expect the POLY128 operations, used for large messages, to be roughly four times slower than POLY64 (quadratic multiplication). Benchmarking is needed to see if it makes a noticeable difference, layer two hashing may still be a small fraction of the work done in layer one.
About the proper interface (primarily for Nettle).
* It's natural to provide the nonce at the time the digest (or "tag") is extracted, since (i) that's when it is needed, and (ii) for other macs, that's the only function which is called precisely once per message. (Current code doesn't do that).
* It's desirable to keep the underlying AES cipher separate from the rest of the UMAC code, and provide it only for key setup. But that doesn't work, since nonce handling also needs the block cipher. I think it would be possible to still use some abstract KDF to provide the key, but that seems a bit pointless if we hard-code use of AES anyway.
* I'm leaning towards having separate types and functions for UMAC-AES-32, UMAC-AES-64, UMAC-AES-96 and UMAC-AES-128, which would all use common primitives for layer 1, 2 and 3 hashing. The primitives should be independent of AES. It would then be easy to define UMAC with other ciphers and KDFs if ever needed (and as far as I understand, a *block* cipher shouldn't be strictly necessary, since it's essentially used in CTR mode anyway, e.g., UMAC-SALSA20 should be possible).
Or we can have a common UMAC-AES type, with the digest size as an argument to key setup. For smaller tag sizes, we then allocate space for more subkeys than needed. This is a drawback shared with, e.g., the current AES code in Nettle. There's also some additional non-key state for larger tag sizes (running variables for the polynomials), but that's a small issue.
* Speaking of storage micro-optimizations, it might make sense to have poly64 and poly128 share storage, or to combine them into an "layer two hash" abstraction.
* To improve performance for small messages, it would make some sense to compute the subkeys lazily, and not compute keys which aren't going to be used.
* In some ways it would be nice and proper to separate the subkeys, which are constant while messages are being processed, from the per-message state (applies also to the HMAC code). One potential use would be to have a single key and several threads using it to process messages. Also not sure if that is worth bothering about. If we go that way, it more or less rules out lazy computation of subkeys.
LocalWords: UMAC endianness endian byteorder nh uint AES KDF KDFs LocalWords: subkeys HMAC
On Thu, Mar 28, 2013 at 2:06 PM, Niels Möller nisse@lysator.liu.se wrote:
I'm a bit worried about testing; the test vectors in RFC4418 seem inadequate to get confidence in an implementation. I include the README file (an extended version of the notes I mailed earlier) below.
Unfortunately yes. Moreover vectors of the UMAC-128 tags are not even mentioned. It may be better to generate any desired vectors using Krovetz' implementation.
- Since I have been working with side-channel silence recently, it seems natural to try to make the POLY function silent. On the other hand, I'm not sure what the threats are. If the MAC is applied to a secret message, we may leak some information about the message, I guess? Is it a problem worth bothering about?
Well, it seems that hiding information about the data being protected isn't one of the main properties of a MAC algorithm. MAC algorithms struggle to protect against key recovery, insertion attacks and modification attacks. Your remark about the possibility of extracting information about the plaintext is indeed interesting and one more argument for applying the MAC after encryption.
- It's natural to provide the nonce at the time the digest (or "tag") is extracted, since (i) that's when it is needed, and (ii) for other macs, that's the only function which is called precisely once per message. (Current code doesn't do that).
I think a generic interface for macs should be followed. The fact that this algorithm requires the nonce during digest extraction seem pretty UMAC-specific. Having a set_nonce function to be called after init seem reasonable.
- It's desirable to keep the underlying AES cipher separate from the rest of the UMAC code, and provide it only for key setup.
UMAC is only specified with AES, and RFC4418 when it mentions UMAC it means UMAC-AES. I'd keep the UMAC-AES/UMAC a single algorithm and not separate the AES part. The fact that other ciphers could be used isn't very interesting since these options are neither standardized nor studied. Once such an other option is available sometime in the future it can be another algorithm e.g. umac-des.
- I'm leaning towards having separate types and functions for UMAC-AES-32, UMAC-AES-64, UMAC-AES-96 and UMAC-AES-128, which would all use common primitives for layer 1, 2 and 3 hashing. The primitives should be independent of AES. It would then be easy to define UMAC with other ciphers and KDFs if ever needed (and as far as I understand, a *block* cipher shouldn't be strictly necessary, since it's essentially used in CTR mode anyway, e.g., UMAC-SALSA20 should be possible).
Making things so general would actually complicate the interface without any benefit for the time being. Yes it would be future-proof, but there is no guarantee that another variant of UMAC will be defined in the future - it may be VMAC - nor something like that is planned.
- In some ways it would be nice and proper to separate the subkeys, which are constant while messages are being processed, from the per-message state (applies also to the HMAC code). One potential use would be to have a single key and several threads using it to process messages.
Does this property exist in any other nettle algorithms? If not I'd say don't bother. My impression with the contexts in nettle is that a single context is used in a single thread.
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
On Thu, Mar 28, 2013 at 2:06 PM, Niels Möller nisse@lysator.liu.se wrote:
I'm a bit worried about testing; the test vectors in RFC4418 seem inadequate to get confidence in an implementation. I include the README file (an extended version of the notes I mailed earlier) below.
Unfortunately yes. Moreover vectors of the UMAC-128 tags are not even mentioned. It may be better to generate any desired vectors using Krovetz' implementation.
I mailed him and asked, and he also suggested using rthe python (or ruby) implementation. Will be some work though.
Well, it seems that hiding information about the data being protected isn't one of the main properties of a MAC algorithm.
Right, maybe we don't need to care about that.
- It's natural to provide the nonce at the time the digest (or "tag") is extracted, since (i) that's when it is needed, and (ii) for other macs, that's the only function which is called precisely once per message. (Current code doesn't do that).
I think a generic interface for macs should be followed. The fact that this algorithm requires the nonce during digest extraction seem pretty UMAC-specific. Having a set_nonce function to be called after init seem reasonable.
It's something like that in the gcm code (but called set_iv). So we should aim for some consistency.
Maybe one could also have a default autoincrementing nonce?
- It's desirable to keep the underlying AES cipher separate from the rest of the UMAC code, and provide it only for key setup.
UMAC is only specified with AES, and RFC4418 when it mentions UMAC it means UMAC-AES.
[...]
Making things so general would actually complicate the interface without any benefit for the time being.
I agree that it's not worth to make the user-level interface, nor the implementation, more complex to achieve separation, but some level of separation may actually make implementation clearer.
- In some ways it would be nice and proper to separate the subkeys, which are constant while messages are being processed, from the per-message state (applies also to the HMAC code). One potential use would be to have a single key and several threads using it to process messages.
Does this property exist in any other nettle algorithms?
It's (trivially) true for the block ciphers, and the gcm code separates key state (struct gcm_key) from message state (struct gcm_ctx). The hmac code does not, and I'd like to change that *if* I can find some reasonable way to do it.
Regards, /Niels
On 03/28/2013 06:24 PM, Niels Möller wrote:
Maybe one could also have a default autoincrementing nonce?
If you do that please don't make it the default. There are several cases in DTLS where the nonce isn't simply incrementing (e.g. when receiving packets out-of-order).
Does this property exist in any other nettle algorithms?
It's (trivially) true for the block ciphers, and the gcm code separates key state (struct gcm_key) from message state (struct gcm_ctx). The hmac code does not, and I'd like to change that *if* I can find some reasonable way to do it.
It is pretty trivial to overcome though, i.e., just use different contexts on each thread (you don't even need to re-initialize - just copy the context), so I wouldn't spend resources on it unless there is some obvious advantage which I don't see.
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
On 03/28/2013 06:24 PM, Niels Möller wrote:
Maybe one could also have a default autoincrementing nonce?
If you do that please don't make it the default. There are several cases in DTLS where the nonce isn't simply incrementing (e.g. when receiving packets out-of-order).
I was thinking of _init setting it to zero, and have _digest do post increment. So then you could chose between
_init
per message: _set_nonce _update (0 or more times) _digest
to get a nonce of your choice for each message, or
_init
per message: _update (0 or more times) _digest
to get an incrementing nonce, starting from zero, or
_init _set_nonce
per message: _update (0 or more times) _digest
to get an incrementing nonce, with a starting value of your choice.
Do you think this makes sense? There's a slight disadvantage that in the case that you call set_nonce for each message, the automatic inititializing and updating of the nonce is some useless work. I'm not sure what the typical usecases are.
Regards, /Niels
On 04/02/2013 08:35 AM, Niels Möller wrote:
Maybe one could also have a default autoincrementing nonce?
If you do that please don't make it the default. There are several cases in DTLS where the nonce isn't simply incrementing (e.g. when receiving packets out-of-order).
I was thinking of _init setting it to zero, and have _digest do post increment. So then you could chose between
Looks ok but I don't like that cycles are wasted in the case one doesn't use it. They are not much but I wouldn't expect that from a low-level library.
I think having an other interface to increment any kind of nonce/iv may be more interesting.
regards, Nikos
On Tue, Apr 2, 2013 at 10:16 PM, Nikos Mavrogiannopoulos < n.mavrogiannopoulos@gmail.com> wrote:
If you do that please don't make it the default. There are several cases in DTLS where the nonce isn't simply incrementing (e.g. when receiving packets out-of-order).
I was thinking of _init setting it to zero, and have _digest do post increment. So then you could chose between
Looks ok but I don't like that cycles are wasted in the case one doesn't use it. They are not much but I wouldn't expect that from a low-level library.
I may be wrong here. I'd expect the library to do similarly to what it does in CBC mode encryption. If it updates the IV there, then it would be natural to expect that it will update the nonce in that interface as well.
regards, Nikos
Here's a tentative Nettle interface for umac. Comments appreciated.
I'll be traveling over the weekend, more or less offline, so I may not respond until Tuesday or so.
Regards, /Niels
/* umac.h * * UMAC message authentication code (RFC-4418). */
/* nettle, low-level cryptographics library * * Copyright (C) 2013 Niels Möller * * The nettle library is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published by * the Free Software Foundation; either version 2.1 of the License, or (at your * option) any later version. * * The nettle library is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public * License for more details. * * You should have received a copy of the GNU Lesser General Public License * along with the nettle library; see the file COPYING.LIB. If not, write to * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, * MA 02111-1301, USA. */
#ifndef NETTLE_UMAC_H_INCLUDED #define NETTLE_UMAC_H_INCLUDED
#ifdef __cplusplus extern "C" { #endif
#define UMAC_BLOCK_SIZE 1024 #define UMAC_KEY_SIZE 16
/* Subkeys and state for UMAC with tag size 32*n bits. */ #define _UMAC_STATE(n) \ uint32_t l1_key[UMAC_BLOCK_SIZE/4 + 4*((n)-1)]; \ /* Keys in 32-bit pieces, low first */ \ uint32_t p64_key[2*(n)]; \ uint32_t p128_key[4*(n)]; \ uint64_t l3_key1[8*(n)]; \ uint32_t l3_key2[(n)]; \ /* AES cipher for encrypting the nonce */ \ struct aes_ctx pdf_key; \ /* Buffer l1 output for one block */ \ uint64_t l1_out[(n)]; \ /* For both poly64-hashing and poly128 hashing */ \ uint64_t poly_state[2*(n)];
struct umac_buffer { unsigned blocks; unsigned pos; uint8_t buffer[UMAC_BLOCK_SIZE]; }
struct umac32_ctx { _UMAC_STATE(1); /* Input to the pdf_key, zero-padded and low bits cleared. */ uint8_t nonce[AES_BLOCK_SIZE]; int nonce_low; /* Low bits, plus some flag for the pad cache. */ /* Previous padding block */ uint8_t pad_cache[AES_BLOCK_SIZE];
struct umac_buffer buffer; };
struct umac64_ctx { _UMAC_STATE(2); /* Input to the pdf_key, zero-padded and low bits cleared. */ uint8_t nonce[AES_BLOCK_SIZE]; int nonce_low; /* Low bits, plus some flag for the pad cache. */ /* Previous padding block */ uint8_t pad_cache[AES_BLOCK_SIZE];
struct umac_buffer buffer; };
struct umac96_ctx { _UMAC_STATE(3); /* Input to the pdf_key, zero-padded if needed. */ uint8_t nonce[AES_BLOCK_SIZE];
struct umac_buffer buffer; };
struct umac128_ctx { _UMAC_STATE(4); /* Input to the pdf_key, zero-padded if needed. */ uint8_t nonce[AES_BLOCK_SIZE];
struct umac_buffer buffer; };
/* The _set_key function initialize the nonce to zero. */ void umac32_set_key (struct umac32_ctx *ctx, const uint8_t *key); void umac64_set_key (struct umac64_ctx *ctx, const uint8_t *key); void umac96_set_key (struct umac96_ctx *ctx, const uint8_t *key); void umac128_set_key (struct umac128_ctx *ctx, const uint8_t *key);
/* Optional, if not used, messages get incrementing nonces starting from zero. */ void umac32_set_nonce (struct umac32_ctx *ctx, unsigned nonce_length, const uint8_t *nonce); void umac64_set_nonce (struct umac64_ctx *ctx, unsigned nonce_length, const uint8_t *nonce); void umac96_set_nonce (struct umac96_ctx *ctx, unsigned nonce_length, const uint8_t *nonce); void umac128_set_nonce (struct umac128_ctx *ctx, unsigned nonce_length, const uint8_t *nonce);
void umac32_update (struct umac32_ctx *ctx, unsigned length, const uint8_t *data); void umac64_update (struct umac64_ctx *ctx, unsigned length, const uint8_t *data); void umac96_update (struct umac96_ctx *ctx, unsigned length, const uint8_t *data); void umac128_update (struct umac128_ctx *ctx, unsigned length, const uint8_t *data);
/* The _digest functions increment the nonce */ void umac32_digest (struct umac32_ctx *ctx, unsigned length, uint8_t *digest); void umac64_digest (struct umac64_ctx *ctx, unsigned length, uint8_t *digest); void umac96_digest (struct umac96_ctx *ctx, unsigned length, uint8_t *digest); void umac128_digest (struct umac128_ctx *ctx, unsigned length, uint8_t *digest);
/* Internal functions */ #define _UMAC_POLY64_BLOCKS 16384
uint64_t _umac_nh (const uint32_t *key, unsigned length, const uint8_t *msg);
/* Equivalent to
for (i = 0; i < n; i++) out[i] = _umac_nh (key + 4*i, length, msg);
but processing input only once. */ void _umac_nh_n (uint64_t *out, unsigned n, const uint32_t *key, unsigned length, const uint8_t *msg);
/* Returns y*k + m (mod p), including "marker" processing. Return value is *not* in canonical representation, and must be normalized before the output is used. */ uint64_t _umac_poly64 (uint32_t kh, uint32_t kl, uint64_t y, uint64_t m);
void _umac_poly128 (const uint32_t *k, uint32_t *y, uint64_t mh, uint64_t ml);
uint32_t _umac_l3 (const uint64_t *key_1, uint32_t key_2, const uint64_t *m);
#ifdef __cplusplus } #endif
#endif /* NETTLE_UMAC_H_INCLUDED */
On Fri, Apr 5, 2013 at 10:13 AM, Niels Möller nisse@lysator.liu.se wrote:
Here's a tentative Nettle interface for umac. Comments appreciated.
Looks fine to me.
regards, Nikos
nisse@lysator.liu.se (Niels Möller) writes:
Here's a tentative Nettle interface for umac. Comments appreciated.
Implemented now. I added some more testvectors, using the python code as reference (I should check that in some day soon too). Still no coverage of all unlikely cases, I'm afraid.
I also added some benchmark code, but I'm not sure if it measures short (< 16 Mbyte) or larger messages, since the benchmarking function keeps calling the update function with a stop condition based on elapsed time. It's clearly a bit faster than the other hash functions, even with 128-bit output.
Regards, /Niels
On 04/11/2013 09:38 PM, Niels Möller wrote:
Here's a tentative Nettle interface for umac. Comments appreciated.
Implemented now.
Nice.
I also added some benchmark code, but I'm not sure if it measures short (< 16 Mbyte) or larger messages, since the benchmarking function keeps calling the update function with a stop condition based on elapsed time. It's clearly a bit faster than the other hash functions, even with 128-bit output.
It is a bit unfair to compare umac with plain hashes. In the last patch there was an addition to the benchmarks to have HMAC performance as well.
regards, Nikos
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
It is a bit unfair to compare umac with plain hashes.
For large messages I think it is fair; the O(n) work is the same for hash and hmac-hash; hmac only adds O(1) work of hashing some short constant-size strings. And the benchmark program only tries to measure the O(n) term.
In the last patch there was an addition to the benchmarks to have HMAC performance as well.
Sorry I missed that. Was there really any measureable difference between hash and hmac-hash? The "update" processing is the same (except perhaps an additional function call).
BTW, I'm currently debugging some ARM neon code for umac_nh. Looks like umac32 will be about 10 times faster than sha1. And neon is *so* much nicer to work with than the sse instructions on x86.
Regards, /Niels
On Fri, Apr 12, 2013 at 10:43 AM, Niels Möller nisse@lysator.liu.se wrote:
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
It is a bit unfair to compare umac with plain hashes.
For large messages I think it is fair; the O(n) work is the same for hash and hmac-hash; hmac only adds O(1) work of hashing some short constant-size strings. And the benchmark program only tries to measure the O(n) term.
That's correct if you are thinking about long messages. In practice, however, MACs are used with short messages. In TLS a MAC is used with a message that has a maximum of a 15kb, and in DTLS most messages are around the ethernet packet size (1500 bytes). At these sizes any overhead O(1) may be critical to performance.
In the last patch there was an addition to the benchmarks to have HMAC performance as well.
Sorry I missed that. Was there really any measureable difference between hash and hmac-hash? The "update" processing is the same (except perhaps an additional function call).
I don't remember. It was a different test. I only have a copy of the mail I sent you and the numbers were the following: umac-aes-4 full 737.86 umac-aes-8 full 801.11 umac-aes-12 full 771.41 umac-aes-16 full 724.91 hmac-md5 full 481.41 hmac-sha1 full 437.43 hmac-sha256 full 171.49 hmac-sha3_256 full 200.44
On the same system now I get: md5 update 506.93 sha1 update 446.76 sha256 update 197.33 sha3_256 update 231.99
and for UMAC: umac32 update 3596.71 umac64 update 1967.24 umac96 update 1564.06 umac128 update 1304.87
regards, Nikos
On 04/11/2013 09:38 PM, Niels Möller wrote:
nisse@lysator.liu.se (Niels Möller) writes:
Here's a tentative Nettle interface for umac. Comments appreciated.
I modified gnutls to use that code. I noticed that the context size is quite excessive (may be an issue when is used as stack variable), but as I understand there is not much that can be done about it.
Something I noticed is that for the nonce increment you could also use the INCREMENT macro used in ctr.
Also because I work using an abstraction layer, and the umac_set_key() is different from hmac_xhash_set_key() which all have a length parameter, requires me to use a wrapper over it. It may be nicer (for me at least) if umac_set_key() accepted the length as well.
regards, Nikos
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
I modified gnutls to use that code. I noticed that the context size is quite excessive (may be an issue when is used as stack variable), but as I understand there is not much that can be done about it.
The block buffer and the l1 subkeys need about 1 Kbyte each.
Something I noticed is that for the nonce increment you could also use the INCREMENT macro used in ctr.
Right. But I first need to modify the INCREMENT macro to support length == 1, otherwise it wont work for a single-byte nonce.
Also because I work using an abstraction layer, and the umac_set_key() is different from hmac_xhash_set_key() which all have a length parameter, requires me to use a wrapper over it. It may be nicer (for me at least) if umac_set_key() accepted the length as well.
I don't think I want to add a key_length parameter, if all I can do with it is an assert (key_length == 16). Is there any other reasonable use? I guess one could pass a longer key (192 or 256 bits) to aes_set_encrypt_key, but that's beyond the umac spec.
In nettle in general, algorithms with a fix key size never gets a key size argument; that's for the next abstraction layer to unify, if desired.
Regards, /Niels
nettle-bugs@lists.lysator.liu.se