nisse@lysator.liu.se (Niels Möller) writes:
To get to understand the algorithm better, I'm writing a toy implementation. I'll let you know when I have a poly128-routine which passes the tests.
Done now. Available at http://www.lysator.liu.se/~nisse/misc/umac-0.1.tar.gz. The code is not seriously optimized, and it supports only 32-bit tags. The functions in poly128.c might be useful.
I'm a bit worried about testing; the test vectors in RFC 4418 seem inadequate for getting confidence in an implementation. I include the README file (an extended version of the notes I mailed earlier) below.
Regards, /Niels
This is a "toy implementation" of UMAC, which I wrote to understand the algorithm, and try to figure out the proper interfaces for implementing it in Nettle. The current code only does the smallest tag size, 32 bits.
Some lessons learned:
* I was initially confused about endianness in the spec. It turns out the data being hashed is read into 32-bit integers using little-endian byte order. *Everything* else uses big-endian conventions.
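The split can be illustrated with a quick check (Python here just for illustration):

```python
import struct

data = bytes([0x01, 0x02, 0x03, 0x04])

# Message data is split into 32-bit words using LITTLE-endian byte
# order...
le_word = struct.unpack('<I', data)[0]   # 0x04030201

# ...while everything else in the spec (lengths, counters, the tag
# itself) uses BIG-endian conventions.
be_word = struct.unpack('>I', data)[0]   # 0x01020304

print(hex(le_word), hex(be_word))
```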
* Testing is poor. There are a lot of corner cases in the algorithm, primarily in the layer-two hashing. First, there are several conditions depending on the message length (no POLY, only POLY64, or a combination of POLY64 and POLY128), and then there are the "marker" words in the POLY algorithm, which are very unlikely to occur for random input and hence a bit tricky to get properly tested.
So to get confidence in any UMAC implementation, a set of known test vectors providing complete code coverage would be highly desirable, but I haven't found any.
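To make the marker corner case concrete, here is a sketch of the POLY routine following the pseudocode in RFC 4418 (section 5.3), operating on a list of already-parsed integer words (function and variable names are my own):

```python
# Largest primes below 2^36, 2^64 and 2^128, per RFC 4418.
PRIME = {36: 2**36 - 5, 64: 2**64 - 59, 128: 2**128 - 159}

def poly(wordbits, maxwordrange, k, words):
    p = PRIME[wordbits]
    offset = 2**wordbits - p
    marker = p - 1
    y = 1
    for m in words:
        if m >= maxwordrange:
            # The rare corner case: an out-of-range word is replaced
            # by the in-range pair (marker, m - offset).  Random test
            # inputs almost never exercise this branch.
            y = (k * y + marker) % p
            y = (k * y + (m - offset)) % p
        else:
            y = (k * y + m) % p
    return y
```

For POLY64 as used in the layer-two hash, maxwordrange is 2^64 - 2^32, so a random 64-bit word triggers the marker branch with probability about 2^-32.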
* The nh function is expected to be the main workhorse (but I haven't profiled anything), so it's a candidate for assembly implementation. To avoid reading the data (and dealing with unaligned access and endianness) more than once, it would make sense to have an nh function which takes the tag length as an argument and does all the needed processing of a single block. An alternative is to copy the input block into an aligned uint32_t array before calling nh, but I think it's better not to make any extra copy of the input, and just process it on the fly as it is read.
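For reference, the NH inner loop (RFC 4418, section 5.1) looks roughly like this, sketched over already-parsed 32-bit words; a real implementation would parse the bytes little-endian on the fly, and the layer-one hash additionally folds in the message bit length, which I omit here:

```python
M32 = 0xffffffff
M64 = 0xffffffffffffffff

def nh(key_words, msg_words):
    # NH over 32-bit words: additions are mod 2^32, each product is a
    # full 64-bit multiply, and the sum is mod 2^64.  len(msg_words)
    # must be a multiple of 8.
    y = 0
    for j in range(0, len(msg_words), 8):
        for i in range(4):
            a = (msg_words[j + i] + key_words[j + i]) & M32
            b = (msg_words[j + i + 4] + key_words[j + i + 4]) & M32
            y = (y + a * b) & M64
    return y
```

The pairing of word i with word i+4 is what makes a SIMD or assembly version attractive: four independent multiply-accumulate chains per 8-word group.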
* Optimization of the layer two hash, in particular the poly64 and poly128 operations, may be important. Profiling will tell.
* Since I have been working with side-channel silence recently, it seems natural to try to make the POLY function silent. On the other hand, I'm not sure what the threats are. If the MAC is applied to a secret message, we may leak some information about the message, I guess. Is that a problem worth bothering about?
* One can expect the POLY128 operations, used for large messages, to be roughly four times slower than POLY64 (quadratic multiplication). Benchmarking is needed to see if it makes a noticeable difference; layer-two hashing may still be a small fraction of the work done in layer one.
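The factor of four is just the schoolbook decomposition: one 128x128-bit multiply splits into four 64x64-bit partial products (plus shifts and adds), as this small check demonstrates:

```python
B = 2**64

def mul128(a, b):
    # Split each 128-bit operand into high and low 64-bit halves.
    a1, a0 = divmod(a, B)
    b1, b0 = divmod(b, B)
    # Four 64x64 -> 128-bit partial products, recombined.
    return (a1 * b1) * B * B + (a1 * b0 + a0 * b1) * B + a0 * b0

x = 0x0123456789abcdef_fedcba9876543210
y = 0x0f1e2d3c4b5a6978_8796a5b4c3d2e1f0
assert mul128(x, y) == x * y
```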
About the proper interface (primarily for Nettle).
* It's natural to provide the nonce at the time the digest (or "tag") is extracted, since (i) that's when it is needed, and (ii) for other MACs, that's the only function which is called precisely once per message. (The current code doesn't do that.)
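A hypothetical interface sketch of what I mean (the class and method names are made up for illustration, the internals are stubbed out, and this is not the eventual Nettle API):

```python
class UMAC32:
    # Interface sketch only: no real UMAC computation happens here.
    def __init__(self, key):
        self.key = key          # subkey setup would happen here
        self.buffer = b''

    def update(self, data):
        # May be called any number of times per message.
        self.buffer += data

    def digest(self, nonce):
        # The nonce is supplied here, at tag-extraction time: the one
        # call made exactly once per message, matching how other MAC
        # interfaces work.
        tag = (self.key, nonce, self.buffer)  # stub, not a real tag
        self.buffer = b''                     # reset for next message
        return tag
```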
* It's desirable to keep the underlying AES cipher separate from the rest of the UMAC code, and provide it only for key setup. But that doesn't work, since nonce handling also needs the block cipher. I think it would be possible to still use some abstract KDF to provide the key, but that seems a bit pointless if we hard-code use of AES anyway.
* I'm leaning towards having separate types and functions for UMAC-AES-32, UMAC-AES-64, UMAC-AES-96 and UMAC-AES-128, which would all use common primitives for layer 1, 2 and 3 hashing. The primitives should be independent of AES. It would then be easy to define UMAC with other ciphers and KDFs if ever needed (and as far as I understand, a *block* cipher shouldn't be strictly necessary, since it's essentially used in CTR mode anyway, e.g., UMAC-SALSA20 should be possible).
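The reason a block cipher isn't strictly necessary can be seen from the shape of the subkey derivation: it's just counter-mode keystream generation. A sketch, parameterized over any 16-byte PRF (the index/counter block layout follows my reading of the KDF pseudocode in RFC 4418 section 3.2.1, so treat it as an assumption; the toy PRF stand-in is hashlib-based, NOT AES):

```python
import hashlib
from math import ceil

BLOCKLEN = 16  # AES block size in the real algorithm

def kdf(prf, index, numbytes):
    # Counter-mode subkey generation: each block enciphers
    # index || counter, both big-endian.  Any keyed 16-byte PRF
    # would do here, which is why a *block* cipher is not strictly
    # required.
    out = b''
    for i in range(1, ceil(numbytes / BLOCKLEN) + 1):
        block = index.to_bytes(BLOCKLEN - 8, 'big') + i.to_bytes(8, 'big')
        out += prf(block)
    return out[:numbytes]

def toy_prf(block):
    # Stand-in PRF for demonstration only (NOT AES).
    return hashlib.sha256(b'toy key' + block).digest()[:BLOCKLEN]
```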
Or we can have a common UMAC-AES type, with the digest size as an argument to key setup. For smaller tag sizes, we then allocate space for more subkeys than needed. This is a drawback shared with, e.g., the current AES code in Nettle. There's also some additional non-key state for larger tag sizes (running variables for the polynomials), but that's a small issue.
* Speaking of storage micro-optimizations, it might make sense to have poly64 and poly128 share storage, or to combine them into a "layer-two hash" abstraction.
* To improve performance for small messages, it would make some sense to compute the subkeys lazily, and not compute keys which aren't going to be used.
* In some ways it would be nice and proper to separate the subkeys, which are constant while messages are being processed, from the per-message state (this applies also to the HMAC code). One potential use would be to have a single key and several threads using it to process messages. I'm also not sure if that is worth bothering about. If we go that way, it more or less rules out lazy computation of subkeys.