On Dec 27, 2016, at 1:11 AM, Niels Möller <nisse@lysator.liu.se> wrote:
Ron Frederick <ronf@timeheart.net> writes:
Understood, and I’ve already implemented a copy() method on the wrapper which allocates a new context structure and initializes it with the contents of the existing context. However, it seems very expensive to do that malloc & memcpy on _every_ call to digest(), since as you’ve said continuing after a digest is the uncommon case. Given the documented behavior of the digest() function in PEP 452, though, that’s what I would have to do, since there’s no way in the wrapper to know whether the caller will want to continue feeding data after digest() is called.
One problem is that many hash functions have an end-of-data padding which, if implemented in the simple way, clobbers the block buffer. So if the digest() wasn't allowed to modify the context, that would introduce some extra copying or complexity for the common case where digest is the final operation on the data.
And most hashes have a pretty small context, so that an extra copy in the uncommon case isn't a big deal. Now, umac differs from plain hash functions in that it has a much larger context, making copying more expensive.
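For concreteness, a non-destructive digest for a small-context hash needs only a stack copy before finalizing. A sketch against the sha256 interface (sha256_digest_copy is just an illustrative name, not an existing function):

#include <nettle/sha2.h>

/* Illustration only: finalize a stack copy, so the caller's context
   survives. sha256_digest pads in place and then resets the context
   it's given, and the temporary copy absorbs that side effect. */
static void
sha256_digest_copy (const struct sha256_ctx *ctx,
                    size_t length, uint8_t *digest)
{
  struct sha256_ctx tmp = *ctx;   /* cheap: the context is ~100 bytes */
  sha256_digest (&tmp, length, digest);
}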
I looked a bit closer at the underlying C code in the Python library for this, and was surprised to find that all of the standard hash modules (md5, sha256, etc.), as well as the hmac code, do indeed copy their state into a temporary object before finalizing the digest, to implement the continued-update behavior described in the PEP; there is no option to avoid that for one-time-use hash objects. Since this is done in C, though, the copy can at least be allocated on the stack rather than costing a heap allocation & free, so you’re right that this isn’t too expensive for hashes with a small context.
Unfortunately, if I were to do the copy in my Python UMAC wrapper, I’d have to pay for both a copy and a heap allocation, as well as any other overhead involved in creating a new ctypes string buffer. To avoid this, I like your suggestion below of possibly having two different digest functions available in C, and letting the caller decide which one to call depending on their use case.
I would also like to say that for a MAC which depends on a nonce (in contrast to plain hash functions and HMAC), the python "PEP 452" API allowing multiple calls to digest seems dangerous. I'd expect that the key could be attacked if you expose both UMAC(key, nonce, "foo") and UMAC(key, nonce, "foobar"), since the nonce is supposed to be unique for each message.
I’m not a cryptographer so I can’t really speak to this point, but how would this be different from supporting a copy() function for something like HMAC, where you generate a digest for both a message and its prefix? I thought the main attack against UMAC when you reuse both a key and a nonce for two different messages was that the key stream for both messages is identical, and is only later mixed with data from the messages. That would not be the case when dealing with a prefix of a message, but perhaps there’s a more subtle attack I’m missing there, related to padding, for instance.
That said, the behavior defined in the PEP of allowing digests to be generated on a prefix of a message, as well as the copy() function, goes all the way back to PEP 247 for keyed hashes released in 2001, and was re-affirmed in PEP 452 in 2013. If there were an attack here, I would have expected someone to raise this concern. Admittedly, PEP 247/452 doesn’t explicitly contemplate hash functions with nonces, but it does cover cases where multiple digests are generated using the same key, with no other randomization to make the two digests independent.
Maybe one reasonable way to implement the python API could be to require an explicit set_nonce, and raise an exception if digest is called without a corresponding set_nonce? I.e., if set_nonce was never called, or if there are two digest calls without an intervening set_nonce. And then add whatever helper methods you want for managing the nonce value in the python code?
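In C, against the current umac32 interface, the check might look something like this (the wrapper struct and function names are made up for illustration):

#include <nettle/umac.h>

/* Hypothetical wrapper enforcing one digest per nonce. */
struct umac32_checked_ctx
{
  struct umac32_ctx ctx;
  int nonce_fresh;    /* set by set_nonce, cleared by digest */
};

void
umac32_checked_set_nonce (struct umac32_checked_ctx *c,
                          size_t length, const uint8_t *nonce)
{
  umac32_set_nonce (&c->ctx, length, nonce);
  c->nonce_fresh = 1;
}

/* Fails (returns 0, no digest) on a second digest call without an
   intervening set_nonce. */
int
umac32_checked_digest (struct umac32_checked_ctx *c,
                       size_t length, uint8_t *digest)
{
  if (!c->nonce_fresh)
    return 0;
  c->nonce_fresh = 0;
  umac32_digest (&c->ctx, length, digest);
  return 1;
}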
I like your suggestion below of two different digest functions, one of which did not modify the context (allowing for additional data to be appended) and the other of which did a finalization and probably always did a nonce auto-increment to help guard against nonce re-use. With this approach, there’d be no way to ever hash two different messages with the same key & nonce unless you did so explicitly with set_nonce().
In this case, I don’t think any restrictions would need to be placed on set_nonce() being called. When set_key() is called, it would default to an all-zeroes nonce as it does today, and when the current digest() function was called, it would always reset the hash state and auto-increment the nonce so any new message would be hashed with a different nonce than the previous one. Calling digest_pure() would not increment the nonce, but it would also not reset the hash state. So, any new data provided would be appended to the current message and not be treated as a new message.
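To make the intended semantics concrete, here’s how I’d expect a caller to use the two functions (a sketch; umac32_digest_pure is the proposed function, not something Nettle has today):

#include <nettle/umac.h>

uint8_t key[UMAC_KEY_SIZE];     /* 16-byte key, set up elsewhere */
uint8_t tag1[UMAC32_DIGEST_SIZE], tag2[UMAC32_DIGEST_SIZE];
struct umac32_ctx ctx;

umac32_set_key (&ctx, key);     /* nonce defaults to all zeroes */
umac32_update (&ctx, 3, (const uint8_t *) "foo");
umac32_digest_pure (&ctx, sizeof tag1, tag1);  /* tag over "foo"; state untouched */
umac32_update (&ctx, 3, (const uint8_t *) "bar");
umac32_digest (&ctx, sizeof tag2, tag2);       /* tag over "foobar"; resets the
                                                  message state and auto-increments
                                                  the nonce for the next message */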
If you'd like to experiment, you could try writing a
umac32_digest_pure (const struct umac32_ctx *ctx, ...)
which doesn't modify the context, to see what it takes. Probably can't use the "padding cache" optimization, though.
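A naive version that just finalizes a copy of the whole context would already give the right semantics, and shows the baseline cost a real implementation should try to beat (untested sketch):

#include <nettle/umac.h>

/* Correct but expensive: copies the entire (large) umac32_ctx. A real
   implementation would finalize in place and save/restore only the
   message-dependent pieces. The nonce auto-increment happens on the
   copy, so the caller's nonce is left alone. */
void
umac32_digest_pure (const struct umac32_ctx *ctx,
                    size_t length, uint8_t *digest)
{
  struct umac32_ctx tmp = *ctx;
  umac32_digest (&tmp, length, digest);
}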
I'd prefer a separate function (naming is a bit difficult, as usual) over a flag in the context.
I agree that a separate function is probably better than a flag. What do you think of the name partial_digest(), or perhaps running_digest()? I’ll take a look at the current code and see how difficult this would be.
What would probably be better (but a larger reorg) is to separate the state which depends on the key from the state which depends on the message and nonce. The only MAC-like algorithm currently done like that is GCM, which also has a large key-dependent table. That allows several contexts to share the same key-dependent tables.
So, would you basically have two different context objects in this case, one associated with just the key and a separate one associated with the nonce and the message data provided so far? It seems like you’d still need to make a copy of the second object before calculating a digest if you wanted to support partial digests and didn’t have a “pure” digest function that could do that without modifying the state, but at least the amount of data you’d have to copy might be smaller.
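If I understand the idea, the API might end up shaped something like this (purely hypothetical names, patterned after the gcm_key/gcm_ctx split):

#include <stddef.h>
#include <stdint.h>

struct umac32_key;    /* key-dependent tables only: large, set up once,
                         shared read-only between any number of states */
struct umac32_state;  /* nonce + per-message state: small, so copying it
                         before a partial digest stays cheap */

void umac32_key_setup (struct umac32_key *key, const uint8_t *k);
void umac32_state_set_nonce (struct umac32_state *state,
                             size_t length, const uint8_t *nonce);
void umac32_state_update (struct umac32_state *state,
                          const struct umac32_key *key,
                          size_t length, const uint8_t *data);
void umac32_state_digest (struct umac32_state *state,
                          const struct umac32_key *key,
                          size_t length, uint8_t *digest);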
It would make sense to do something similar for HMAC too. Currently, an hmac_sha256_ctx consists of three sha256_ctx structs: three SHA256 state vectors of 32 bytes each, plus three block buffers of 64 bytes each, for a total of 300 bytes or so. But it really needs only one block buffer, so if state vectors and block buffers were better separated, it could be trimmed down to about 170 bytes.
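Roughly, a trimmed layout might look like this (a sketch, not the current declaration):

#include <stdint.h>

/* Three SHA256 state vectors, but only one block buffer and one
   length counter, for the message in progress. */
struct hmac_sha256_ctx_small
{
  uint32_t outer[8];   /* state after absorbing key XOR opad */
  uint32_t inner[8];   /* state after absorbing key XOR ipad */
  uint32_t state[8];   /* running state of the current message */
  uint64_t count;      /* blocks processed for the current message */
  uint8_t block[64];   /* the single block buffer */
  unsigned index;      /* bytes currently buffered in block */
};
/* 3*32 + 8 + 64 + 4 = 172 bytes, plus padding: about 170 bytes. */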