Ron Frederick <ronf@timeheart.net> writes:
> Understood, and I’ve already implemented a copy() method on the wrapper which allocates a new context structure and initializes it with the contents of the existing context. However, it seems very expensive to do that malloc & memcpy on _every_ call to digest(), since as you’ve said this is the uncommon case. Given the documented behavior of the digest() function in PEP 452 though, that’s what I would have to do, since there’s no way in the wrapper to know if the caller might want to continue feeding data after digest() is called.
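For reference, the PEP 452 behavior in question is that digest() must not consume the internal state, so a caller may keep feeding data afterwards. A minimal illustration using hashlib, which follows the same convention:

```python
import hashlib

h = hashlib.sha256(b"foo")
first = h.digest()          # must leave the object usable

h.update(b"bar")            # caller may continue feeding data
combined = h.digest()

# digest() was non-destructive: continuing gives the same result as
# hashing the concatenated message in one go.
assert first == hashlib.sha256(b"foo").digest()
assert combined == hashlib.sha256(b"foobar").digest()
```

A wrapper over a C context whose digest routine *does* clobber its state has to copy the context before every digest() call to provide these semantics, which is the cost being discussed.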
One problem is that many hash functions apply an end-of-data padding which, implemented in the simple way, clobbers the block buffer. So if digest() weren't allowed to modify the context, that would introduce extra copying or complexity for the common case where digest is the final operation on the data.
And most hashes have a pretty small context, so that an extra copy in the uncommon case isn't a big deal. Now, umac differs from plain hash functions in that it has a much larger context, making copying more expensive.
The nonce auto-increment is less of a problem.
I would also like to say that for a MAC which depends on a nonce (in contrast to plain hash functions and HMAC), the Python "PEP 452" API allowing multiple calls to digest() seems dangerous. I'd expect that the key could be attacked if you expose both UMAC(key, nonce, "foo") and UMAC(key, nonce, "foobar"), since the nonce is supposed to be unique for each message.
Maybe one reasonable way to implement the Python API would be to require an explicit set_nonce, and raise an exception if digest is called without a corresponding set_nonce? I.e., if set_nonce was never called, or if there are two digest calls without an intervening set_nonce. And then provide whatever helper methods you want for managing the nonce value in the Python code?
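A minimal sketch of that guard, as a toy construction only: the class and method names are invented for illustration, and truncated HMAC-SHA256 over nonce-plus-message stands in for a real nonce-based MAC like UMAC.

```python
import hashlib
import hmac


class NonceGuardedMAC:
    """Toy nonce-based MAC: digest() demands a fresh set_nonce()."""

    def __init__(self, key):
        self._key = key
        self._nonce = None
        self._data = b""

    def set_nonce(self, nonce):
        self._nonce = nonce

    def update(self, data):
        self._data += data

    def digest(self):
        if self._nonce is None:
            raise ValueError("set_nonce() required before digest()")
        tag = hmac.new(self._key, self._nonce + self._data,
                       hashlib.sha256).digest()[:8]
        # Invalidate the nonce so reusing it for another message
        # (the dangerous case above) raises instead of silently working.
        self._nonce = None
        self._data = b""
        return tag


m = NonceGuardedMAC(b"secret key")
m.set_nonce(b"\x00" * 8)
m.update(b"message")
tag = m.digest()

try:
    m.digest()          # second digest without an intervening set_nonce
except ValueError:
    pass                # raises, as proposed
```

Helper methods for auto-incrementing the nonce could then be layered on top in Python, without the C level ever reusing a nonce by accident.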
> It makes sense to me to have Nettle default to doing the reset & auto-increment of the nonce, as I agree that would be the common case for someone using Nettle directly. However, if this could be made configurable when the context is created, it would make it possible to avoid the cost of the malloc & memcpy in the Python wrapper unless the caller actually wanted to “fork” a context and hash multiple independent streams of data which had a common prefix.
If you'd like to experiment, you could try writing a
  umac32_digest_pure (const struct umac32_ctx *ctx, ...)
which doesn't modify the context, to see what it takes. Probably can't use the "padding cache" optimization, though.
I'd prefer a separate function (naming is a bit difficult, as usual) over a flag in the context.
What would probably be better (but a larger reorg) is to separate the state which depends on the key from the state which depends on message and nonce. The only MAC-like algorithm currently done like that is GCM, which also has a large key-dependent table. That allows several contexts to share the same key-dependent tables.
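As a sketch of that split — a toy construction, not Nettle's actual GCM or UMAC code; the class names and the SHA256-based "MAC" are invented for illustration:

```python
import hashlib


class MacKey:
    """Key-dependent state: computed once, shareable (like GCM's table)."""

    def __init__(self, key):
        # stand-in for an expensive key-dependent precomputation
        self.subkey = hashlib.sha256(b"derive" + key).digest()


class MacContext:
    """Per-message state: cheap to create, references the shared key state."""

    def __init__(self, mackey, nonce):
        self._h = hashlib.sha256(mackey.subkey + nonce)

    def update(self, data):
        self._h.update(data)

    def digest(self):
        return self._h.digest()[:8]


k = MacKey(b"secret")            # done once per key
c1 = MacContext(k, b"nonce-1")   # per-message contexts share the key state
c2 = MacContext(k, b"nonce-2")
c1.update(b"foo")
c2.update(b"bar")
assert c1.digest() != c2.digest()
```

With this shape, "forking" a stream means creating another small per-message context, not copying the large key-dependent tables.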
It would make sense to do something similar for HMAC too. Currently, an hmac_sha256_ctx consists of three sha256_ctx structures, i.e., three SHA256 state vectors of 32 bytes each plus three block buffers of 64 bytes each, for a total of 300 bytes or so. But it really needs only one block buffer, so if state vectors and block buffers were better separated, it could be trimmed down to about 170 bytes.
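Ignoring the per-context length counters and index fields, the arithmetic behind those estimates is roughly:

```python
STATE = 32   # SHA256 state vector, bytes
BLOCK = 64   # SHA256 block buffer, bytes

current = 3 * (STATE + BLOCK)   # three full sha256_ctx
trimmed = 3 * STATE + BLOCK     # three state vectors, one shared buffer

print(current, trimmed)         # -> 288 160
```

which lands near the "300 bytes or so" and "about 170 bytes" figures once counters and bookkeeping fields are counted in.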
Regards, /Niels