Hello,
I recently discovered the Nettle crypto library and noticed that it has an implementation of UMAC in it, which I haven’t been able to find in most of the other popular crypto libraries out there. I was hoping to call into the UMAC functions from Python, and I see Python mentioned on the Nettle home page at http://www.lysator.liu.se/~nisse/nettle/, but there’s no link for it down in the “Language bindings” section, and in fact many of the other links in that section seem to point at dead pages.
I tried searching for Python bindings elsewhere, but haven’t had any luck so far. Does anyone know of any?
Assuming there aren’t Python bindings yet, I would be tempted to try and craft my own (at least for UMAC for now). However, I’d like to be able to do it using the Python ‘ctypes’ library, and one difficulty I can see with doing that is knowing the amount of memory I should allocate for some of the structures used by this code. Specifically, for UMAC, I’d need to know the size of things like “struct umac32_ctx”, “struct umac64_ctx”, “struct umac96_ctx”, and “struct umac128_ctx”. These sizes are trivial to get from C code using sizeof(), but I don’t believe there’s any good way to get them from Python using ctypes unless there’s an API call that can be made to a C function in the library which returns this information.
As an example of this, libsodium provides functions like crypto_stream_chacha20_keybytes() and crypto_stream_chacha20_noncebytes() which can be called to return the size of a Chacha20 key and nonce, respectively. In the C header file, these look like:
#define crypto_stream_chacha20_KEYBYTES 32U
SODIUM_EXPORT
size_t crypto_stream_chacha20_keybytes(void);

#define crypto_stream_chacha20_NONCEBYTES 8U
SODIUM_EXPORT
size_t crypto_stream_chacha20_noncebytes(void);
The corresponding functions in the code just return the constants defined in the header file shown above.
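Calling these from Python with ctypes requires no knowledge of the header file at all. A minimal sketch (assuming libsodium is installed somewhere find_library can locate it):

import ctypes
import ctypes.util

# Load libsodium and declare the return type of each size function
_sodium_lib = ctypes.CDLL(ctypes.util.find_library('sodium'))
_sodium_lib.crypto_stream_chacha20_keybytes.restype = ctypes.c_size_t
_sodium_lib.crypto_stream_chacha20_noncebytes.restype = ctypes.c_size_t

print(_sodium_lib.crypto_stream_chacha20_keybytes())    # prints 32
print(_sodium_lib.crypto_stream_chacha20_noncebytes())  # prints 8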
For this UMAC example in Nettle, I could imagine similar functions which returned the value computed by sizeof() for these context structures, and also things like the UMAC key size, digest sizes, block size, and allowed nonce sizes. Has a use-case like this been considered? Is there some other way to call into Nettle from Python without writing additional glue code in C?
Ron Frederick ronf@timeheart.net writes:
I recently discovered the Nettle crypto library and noticed that it has an implementation of UMAC in it, which I haven’t been able to find in most of the other popular crypto libraries out there. I was hoping to call into the UMAC functions from Python, and I see Python mentioned on the Nettle home page at http://www.lysator.liu.se/~nisse/nettle/, but there’s no link for it down in the “Language bindings” section, and in fact many of the other links in that section seem to point at dead pages.
I tried searching for Python bindings elsewhere, but haven’t had any luck so far. Does anyone know of any?
Sorry for the late reply. Python bindings for nettle would make a lot of sense, but I'm afraid I'm not aware of any.
Assuming there aren’t Python bindings yet, I would be tempted to try and craft my own (at least for UMAC for now). However, I’d like to be able to do it using the Python ‘ctypes’ library, and one difficulty I can see with doing that is knowing the amount of memory I should allocate for some of the structures used by this code. Specifically, for UMAC, I’d need to know the size of things like “struct umac32_ctx”, “struct umac64_ctx”, “struct umac96_ctx”, and “struct umac128_ctx”. These sizes are trivial to get from C code using sizeof(), but I don’t believe there’s any good way to get them from Python using ctypes unless there’s an API call that can be made to a C function in the library which returns this information.
These sizes are architecture dependent, and may change between Nettle releases (with an ABI break, new soname, etc). It's technically possible to add the functions you suggest, but it seems both ugly and unusual to me. I hope you can solve the problem in another way.
As an example of this, libsodium provides functions like crypto_stream_chacha20_keybytes() and crypto_stream_chacha20_noncebytes() which can be called to return the size of a Chacha20 key and nonce, respectively. In the C header file, these look like:
These are different; these constants are part of the algorithm specification. The defines in the header files are for convenience, but not the ultimate definition of the values.
For this UMAC example in Nettle, I could imagine similar functions which returned the value computed by sizeof() for these context structures, and also things like the UMAC key size, digest sizes, block size, and allowed nonce sizes. Has a use-case like this been considered? Is there some other way to call into Nettle from Python without writing additional glue code in C?
For hashes and ciphers without odd quirks, you can use the corresponding structs nettle_cipher and nettle_hash, defined in nettle-meta.h. Nothing similar is currently available for MAC algorithms.
Regards, /Niels
On Dec 21, 2016, at 1:01 PM, Niels Möller nisse@lysator.liu.se wrote:
Assuming there aren’t Python bindings yet, I would be tempted to try and craft my own (at least for UMAC for now). However, I’d like to be able to do it using the Python ‘ctypes’ library, and one difficulty I can see with doing that is knowing the amount of memory I should allocate for some of the structures used by this code. Specifically, for UMAC, I’d need to know the size of things like “struct umac32_ctx”, “struct umac64_ctx”, “struct umac96_ctx”, and “struct umac128_ctx”. These sizes are trivial to get from C code using sizeof(), but I don’t believe there’s any good way to get them from Python using ctypes unless there’s an API call that can be made to a C function in the library which returns this information.
These sizes are architecture dependent, and may change between Nettle releases (with an ABI break, new soname, etc). It's technically possible to add the functions you suggest, but it seems both ugly and unusual to me. I hope you can solve the problem in another way.
[Ron] Thanks very much for getting back to me, Niels!
Rather than adding a function to return the structure size here, what do you think about adding functions such as umac32_new() which would do the allocation of the structure and return a pointer to it? This would address the issue I’m seeing with ctypes in Python and remain completely compatible with the existing API. I understand the distaste for having libraries like this allocate memory, but the only options I know of to provide cross-language support like this are either to provide an allocator function that knows how much space to allocate or to provide a function which returns the amount of space needed, so it can be passed to an external allocator.
To make sure there weren’t any other issues in doing this, I did a proof of concept where I simply did a fixed-size allocation of 4 KB for the contexts (since the largest context is currently only about 2800 bytes on the system I’m running on) and I was able to make everything after that work fine. Here’s a snippet of some Python 3 test code (without a pretty wrapper yet):
import binascii
import ctypes
UMAC_CTX_SIZE = 4096
UMAC32_DIGEST_SIZE = 4
_nettle_lib = ctypes.cdll.LoadLibrary('/opt/local/lib/libnettle.dylib')
_umac32_set_key = _nettle_lib.nettle_umac32_set_key
_umac32_set_nonce = _nettle_lib.nettle_umac32_set_nonce
_umac32_update = _nettle_lib.nettle_umac32_update
_umac32_digest = _nettle_lib.nettle_umac32_digest
ctx = ctypes.create_string_buffer(UMAC_CTX_SIZE)
key = b'abcdefghijklmnop'
nonce = b'bcdefghi'
digest = ctypes.create_string_buffer(UMAC32_DIGEST_SIZE)

for msg in (b'', 3*b'a', 1024*b'a', 32768*b'a', 1024*1024*b'a',
            32768*1024*b'a', b'abc', 500*b'abc'):
    _umac32_set_key(ctx, key)
    _umac32_set_nonce(ctx, ctypes.c_size_t(len(nonce)), nonce)
    _umac32_update(ctx, ctypes.c_size_t(len(msg)), msg)
    _umac32_digest(ctx, ctypes.c_size_t(len(digest)), digest)
    print(binascii.b2a_hex(digest.raw).decode())
I really don’t want to rely on a hard-coded size like this, though, in case future versions of the context structure grow much larger for some reason. Also, even if they don’t, rounding up like this is a waste of memory. Can you think of any other options here?
As an example of this, libsodium provides functions like crypto_stream_chacha20_keybytes() and crypto_stream_chacha20_noncebytes() which can be called to return the size of a Chacha20 key and nonce, respectively. In the C header file, these look like:
These are different; these constants are part of the algorithm specification. The defines in the header files are for convenience, but not the ultimate definition of the values.
[Ron] Fair enough. For these, I could potentially just replicate the definitions in the Python code, without worrying that they would change. It was just nice in the libsodium case that I didn’t have to.
Ron Frederick ronf@timeheart.net writes:
Rather than adding a function to return the structure size here, what do you think about adding functions such as umac32_new() which would do the allocation of the structure and return a pointer to it?
Actually, I think I'd prefer a way to get the struct size at run time; otherwise I'd have to consider interfaces to let an application override the allocation function used by the *_new functions.
Are you sure the ctypes package you use doesn't provide any general way to extract struct sizes from a C interface? I know it's common practice to design libraries to have only opaque types, with function calls for allocation, use, and deallocation. But I wouldn't expect nettle to be the only library which tries to be more low-level and expose some internals.
For example, if one ever wants to let the python code read or write fields of a struct used in a C interface, then, I imagine, the python glue magic needs to either extract the layout from the header file, or generate a little C code, including the header file, to get proper sizes and offsets or accessor functions.
I think that, in general, it makes little sense to use C code in a shared library without also using the corresponding header file in some way or the other.
In principle, the compiler could insert information about sizes and struct layouts (for more general use than just debug info) into the object files, so that the python glue code could extract it from the shared library itself. But as far as I'm aware, common compilers and linkers don't do that.
ctx = ctypes.create_string_buffer(UMAC_CTX_SIZE)
Note that nettle's context structs have stricter alignment requirements than a string. Maybe you need to use a different allocation method, to guarantee at least as much alignment as the system's malloc?
I really don’t want to rely on a hard-coded size like this,
I totally agree that's a bad workaround.
/Niels
On Dec 22, 2016, at 12:16 PM, Niels Möller nisse@lysator.liu.se wrote:
Ron Frederick ronf@timeheart.net writes:
Rather than adding a function to return the structure size here, what do you think about adding functions such as umac32_new() which would do the allocation of the structure and return a pointer to it?
Actually, I think I'd prefer a way to get the struct size at run time; otherwise I'd have to consider interfaces to let an application override the allocation function used by the *_new functions.
Are you sure the ctypes package you use doesn't provide any general way to extract struct sizes from a C interface? I know it's common practice to design libraries to have only opaque types, with function calls for allocation, use, and deallocation. But I wouldn't expect nettle to be the only library which tries to be more low-level and expose some internals.
[Ron] Keep in mind that the binding we’re talking about here is all happening dynamically at run time of the Python program. At that point in time, source code to the C module wouldn’t even be available on the system (not even header files) in many cases, and there’d be no C compiler or preprocessor available to parse them even if they were. The only thing available is the information in the dynamic library itself, readable by the linker.
There is a “sizeof” function in ctypes for structures, but the exact types and ordering of the members of the structure would have to be provided in the Python code, which in this case would be translating the C structure definitions (and CPP macros in this case) into the ctypes equivalent, meaning the structure is no longer opaque.
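To illustrate with a made-up layout (none of these members are the real umac32_ctx fields; they’re only here to show what a wrapper would be forced to write):

import ctypes

# Hypothetical struct layout: sizeof() only works once the full member
# list has been transcribed into Python, so the struct is no longer opaque
class example_ctx(ctypes.Structure):
    _fields_ = [('key', ctypes.c_uint32 * 8),
                ('count', ctypes.c_uint64),
                ('index', ctypes.c_uint)]

print(ctypes.sizeof(example_ctx))   # 48 on a typical LP64 system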
For example, if one ever wants to let the python code read or write fields of a struct used in a C interface, then, I imagine, the python glue magic needs to either extract the layout from the header file, or generate a little C code, including the header file, to get proper sizes and offsets or accessor functions.
[Ron] There are other ways to interface to C code in Python that can auto-generate shims to do something like this, but they would involve actually generating and compiling additional C code at build time and shipping an additional C shared library with the Python code, which means multiple versions of the package are needed to cover each of the supported architectures. That’s what I’m trying to avoid here, as up until now my package is 100% pure Python and the exact same code runs on all architectures and OSes (MacOS/Windows/UNIX). The only calls out to C code are either provided by other Python modules or involve dynamic linking against third-party C libraries using ctypes operating at the ABI level.
I think that, in general, it makes little sense to use C code in a shared library without also using the corresponding header file in some way or the other.
In principle, the compiler could insert information about sizes and struct layouts (for more general use than just debug info) into the object files, so that the python glue code could extract it from the shared library itself. But as far as I'm aware, common compilers and linkers don't do that.
[Ron] Yeah - I think that sort of information may be available when libraries are compiled with debug information, but it’s not something available to the linker as far as I know and I’ve never seen anything to allow languages like Python to use that data to access structures like this.
ctx = ctypes.create_string_buffer(UMAC_CTX_SIZE)
Note that nettle's context structs have stricter alignment requirements than a string. Maybe you need to use a different allocation method, to guarantee at least as much alignment as the system's malloc?
[Ron] This is a good point. Based on some simple tests, the returned memory always seems to be at least 16-byte aligned similar to malloc(), but I can’t actually find documentation that explicitly promises this. I’ll have to do more research on this.
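For what it’s worth, the quick test I ran looks like this (it only demonstrates the alignment empirically; it proves nothing about what’s guaranteed):

import ctypes

# Allocate a handful of buffers and check their address alignment
for size in (64, 256, 4096):
    buf = ctypes.create_string_buffer(size)
    addr = ctypes.addressof(buf)
    print(size, hex(addr), 'aligned' if addr % 16 == 0 else 'NOT aligned')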
I really don’t want to rely on a hard-coded size like this,
I totally agree that's a bad workaround.
[Ron] Unfortunately, my options at the moment seem to be to either do this or keep looking for alternate implementations of UMAC I can attempt to link against if I want to continue to keep my code 100% Python, and the Nettle version of UMAC is by far the cleanest thing I’ve found so far.
Ron Frederick ronf@timeheart.net writes:
[Ron] There are other ways to interface to C code in Python that can auto-generate shims to do something like this, but they would involve actually generating and compiling additional C code at build time and shipping an additional C shared library with the Python code, which means multiple versions of the package are needed to cover each of the supported architectures.
That's the way I would expect language bindings to work. And I don't think any other way can work if one aims to have python glue for everything in Nettle.
That said, for umac and other mac algorithms, I think it would make sense to provide structs similar to nettle_hash which includes all needed sizes and function pointers.
Would that solve your problem? You could try to see if you can make a 100% python wrapper for the supported hash functions in the nettle_hashes list (I think Daniel Kahn Gillmor did something similar in his perl bindings). I guess you'd still need to access struct members, though. If useful, it's also possible to add accessor functions like
size_t
nettle_hash_ctx_size(const struct nettle_hash *hash)
{
  return hash->ctx_size;
}

nettle_hash_init_func *
nettle_hash_init(const struct nettle_hash *hash)
{
  return hash->init;
}
etc. (Other variants are possible, if one aims for a function-call-only api).
And then there's a known ABI problem with exporting that list, discussed in another mail to the list, which needs to be solved in one way or the other.
[Ron] This is a good point. Based on some simple tests, the returned memory always seems to be at least 16-byte aligned similar to malloc(), but I can’t actually find documentation that explicitly promises this. I’ll have to do more research on this.
If you don't find any better way, maybe you can use ctypes to call libc malloc directly?
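Something along these lines, perhaps (an untested sketch, assuming a Unix-like libc, with error handling kept minimal):

import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library('c'))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

# malloc guarantees alignment suitable for any standard object type
ctx = libc.malloc(4096)
if not ctx:
    raise MemoryError('malloc failed')
# ... pass ctx to the nettle umac functions as a c_void_p ...
libc.free(ctx)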
Regards, /Niels
On Dec 23, 2016, at 8:07 AM, Niels Möller nisse@lysator.liu.se wrote:
Ron Frederick ronf@timeheart.net writes:
[Ron] There are other ways to interface to C code in Python that can auto-generate shims to do something like this, but they would involve actually generating and compiling additional C code at build time and shipping an additional C shared library with the Python code, which means multiple versions of the package are needed to cover each of the supported architectures.
That's the way I would expect language bindings to work. And I don't think any other way can work if one aims to have python glue for everything in Nettle.
That said, for umac and other mac algorithms, I think it would make sense to provide structs similar to nettle_hash which includes all needed sizes and function pointers.
Would that solve your problem? You could try to see if you can make a 100% python wrapper for the supported hash functions in the nettle_hashes list (I think Daniel Kahn Gillmor did something similar in his perl bindings). I guess you'd still need to access struct members, though. If useful, it's also possible to add accessor functions like
size_t
nettle_hash_ctx_size(const struct nettle_hash *hash)
{
  return hash->ctx_size;
}

nettle_hash_init_func *
nettle_hash_init(const struct nettle_hash *hash)
{
  return hash->init;
}
etc. (Other variants are possible, if one aims for a function-call-only api).
Thanks - a function-call based API that didn’t rely on the Python code needing to access any Nettle-specific structures sounds good, and would be preferred to something that involves declaring a struct to access function pointers and other static data.
I’ve confirmed that for UMAC the only such function I would need to build a Python wrapper is something that returns the context size. In fact, I have a first cut of such a module written and working with a “try” clause that shows how I could call such a function to request the size if it existed, with a fallback which uses a hard-coded size if that’s not present.
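The try clause boils down to this (nettle_umac32_ctx_size() is the hypothetical size function under discussion; it doesn’t exist in any current Nettle release):

import ctypes

_nettle_lib = ctypes.cdll.LoadLibrary('/opt/local/lib/libnettle.dylib')

try:
    # Hypothetical size-query function, not in any current Nettle release
    _ctx_size = _nettle_lib.nettle_umac32_ctx_size
    _ctx_size.restype = ctypes.c_size_t
    UMAC_CTX_SIZE = _ctx_size()
except AttributeError:
    # Symbol not present, so fall back to a generous hard-coded size
    UMAC_CTX_SIZE = 4096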
Regarding the init function, that shouldn’t be necessary if Nettle guarantees that a call to set_key() resets the context structure and performs all necessary initialization. I can see where init() would be needed for the key-less hash functions, but it may not be needed here.
That actually leads to one other wrinkle I’ve run into. According to the Nettle docs:
Function: void umac32_digest (struct umac32_ctx *ctx, size_t length, uint8_t *digest)
Function: void umac64_digest (struct umac64_ctx *ctx, size_t length, uint8_t *digest)
Function: void umac96_digest (struct umac96_ctx *ctx, size_t length, uint8_t *digest)
Function: void umac128_digest (struct umac128_ctx *ctx, size_t length, uint8_t *digest)

Extracts the MAC of the message, writing it to digest. length is usually equal to the specified output size, but if you provide a smaller value, only the first length octets of the MAC are written. These functions reset the context for processing of a new message with the same key. The nonce is incremented as described above; the new value is used unless you call the _set_nonce function explicitly for each message.
While this wouldn’t really be a problem for my use case, the Python cryptographic hash standard API (defined in PEP 452) states the following about the digest() method:
digest()
Return the hash value of this hashing object as a bytes containing 8-bit data. The object is not altered in any way by this function; you can continue updating the object after calling this function.
So, if I wanted to provide a Python module which adhered to this API, the automatic reset of the context and increment of the nonce would be a problem. Mind you, I think this is a very useful feature in Nettle and wouldn’t want to see it go away. However, do you think it would be possible to make this configurable? There could be something like a umac32_auto_increment_nonce() function that takes a context and a true/false value as an argument to determine whether calling digest() does this or not. If it didn’t, you could continue to append to a message even after requesting a hash of the data provided so far, without the need to make a copy of the context structure first.
And then there's a known ABI problem with exporting that list, discussed in another mail to the list, which needs to be solved in one way or the other.
Yeah, I saw the earlier e-mail on that. My vote would be for option 2 in your list where a function is defined that returns a pointer to the list, and I think that would probably be a good approach for accessing any static data in the library.
That said, I like the suggestion in your other e-mail about providing both an ABI and API level interface, leaving it up to callers to decide if they’re willing to deal with the upgrade/downgrade issues of replacing the libnettle library without recompiling the calling code against the latest header file. More on that shortly in a follow-up to that other e-mail.
[Ron] This is a good point. Based on some simple tests, the returned memory always seems to be at least 16-byte aligned similar to malloc(), but I can’t actually find documentation that explicitly promises this. I’ll have to do more research on this.
If you don't find any better way, maybe you can use ctype to call libc malloc directly?
Yes - this definitely seems like it would work if the Python allocator didn’t already provide such a guarantee.
Ron Frederick ronf@timeheart.net writes:
Regarding the init function, that shouldn’t be necessary if Nettle guarantees that a call to set_key() resets the context structure and performs all necessary initialization. I can see where init() would be needed for the key-less hash functions, but it may not be needed here.
You're right, umac needs no init call.
While this wouldn’t really be a problem for my use case, the Python cryptographic hash standard API (defined in PEP 452) states the following about the digest() method:
digest() Return the hash value of this hashing object as a bytes containing 8-bit data. The object is not altered in any way by this function; you can continue updating the object after calling this function.
So, if I wanted to provide a Python module which adhered to this API, the automatic reset of the context and increment of the nonce would be a problem.
The way you'd do this with Nettle is to make a copy (plain memcpy or struct assignment) of the context struct, and extract the digest from the copy.
The nettle design is based on the assumption that it's an uncommon use case to hash (or mac) both a string and a prefix thereof. So it's possible, but not optimized for.
Regards, /Niels
On Dec 23, 2016, at 11:00 PM, Niels Möller nisse@lysator.liu.se wrote:
Ron Frederick ronf@timeheart.net writes:
While this wouldn’t really be a problem for my use case, the Python cryptographic hash standard API (defined in PEP 452) states the following about the digest() method:
digest()
Return the hash value of this hashing object as a bytes containing 8-bit data. The object is not altered in any way by this function; you can continue updating the object after calling this function.
So, if I wanted to provide a Python module which adhered to this API, the automatic reset of the context and increment of the nonce would be a problem.
The way you'd do this with Nettle is to make a copy (plain memcpy or struct assignment) of the context struct, and extract the digest from the copy.
The nettle design is based on the assumption that it's an uncommon use case to hash (or mac) both a string and a prefix thereof. So it's possible, but not optimized for.
Understood, and I’ve already implemented a copy() method on the wrapper which allocates a new context structure and initializes it with the contents of the existing context. However, it seems very expensive to do that malloc & memcpy on _every_ call to digest(), since as you’ve said this is the uncommon case. Given the documented behavior of the digest() function in PEP 452 though, that’s what I would have to do, since there’s no way in the wrapper to know if the caller might want to continue feeding data after digest() is called.
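The copy() in my wrapper is essentially this (a sketch, reusing the UMAC_CTX_SIZE convention from my earlier snippet):

import ctypes

UMAC_CTX_SIZE = 4096   # or obtained at run time, as discussed earlier

class UMAC32:
    def __init__(self):
        self._ctx = ctypes.create_string_buffer(UMAC_CTX_SIZE)

    def copy(self):
        """Return an independent copy of this MAC object."""
        clone = UMAC32()
        ctypes.memmove(clone._ctx, self._ctx, UMAC_CTX_SIZE)
        return clone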
It makes sense to me to have Nettle default to doing the reset & auto-increment of the nonce, as I agree that would be the common case for someone using Nettle directly. However, if this could be made configurable when the context is created, it would make it possible to avoid the cost of the malloc & memcpy in the Python wrapper unless the caller actually wanted to “fork” a context and hash multiple independent streams of data which had a common prefix.
What I’d suggest is to split out the nonce increment into its own externally callable function, and add a flag on the context about whether to call that function automatically or not. The flag would default to true to preserve existing behavior, but a function could be provided to disable this for callers that wanted to be able to do partial hashing. They could then call the increment manually if they wanted to reuse a context for multiple messages, avoiding the malloc & memcpy even in both cases. The copy would only be needed in the case I mention above when hashing multiple streams that start with a common prefix.
On Dec 24, 2016, at 4:26 AM, Ron Frederick ronf@timeheart.net wrote:
What I’d suggest is to split out the nonce increment into its own externally callable function, and add a flag on the context about whether to call that function automatically or not. The flag would default to true to preserve existing behavior, but a function could be provided to disable this for callers that wanted to be able to do partial hashing. They could then call the increment manually if they wanted to reuse a context for multiple messages, avoiding the malloc & memcpy even in both cases. The copy would only be needed in the case I mention above when hashing multiple streams that start with a common prefix.
One other thought on this: Since the sender and receiver of a message need to both know the nonce, I think it would be useful for Nettle to provide a get_nonce() function if it is going to auto-increment the nonce. While this wouldn’t be needed in a simple point-to-point case where both sides are maintaining a context which stays in sync, there are other use cases where a sender might want to take advantage of the auto-increment behavior but send the nonce explicitly with their messages, so a receiver who sees only a subset of the messages can verify them. If the sender had a way to get the nonce back out of the context before they called digest(), they wouldn’t need to replicate the increment functionality in their own code.
This use case is also another argument for being able to make the auto-increment optional. If a receiver is receiving an explicit nonce with each message, there’s no reason for them to pay the cost of doing the increment function every time they call digest() to verify a message, as they’re just going to reset the nonce to something else when the next message arrives. This would apply on senders as well when using a randomly generated nonce for each message, or when they create a new context for every message they send.
Ron Frederick ronf@timeheart.net writes:
One other thought on this: Since the sender and receiver of a message need to both know the nonce, I think it would be useful for Nettle to provide a get_nonce() function if it is going to auto-increment the nonce.
One can access it directly from the nonce field of the context struct, but it makes some sense to provide a more abstract method. If we want to do this consistently, the details are not entirely obvious, though:
1. Should it return a pointer to the nonce, assuming that it is present in raw form in the context, so that it can be returned cheaply, or should it make a copy into an area provided by the caller?
2. Should a get_nonce method be added to the nettle_aead struct?
If a receiver is receiving an explicit nonce with each message, there’s no reason for them to pay the cost of doing the increment function every time they call digest() to verify a message, as they’re just going to reset the nonce to something else when the next message arrives.
It's true that the auto-increment is unnecessary in that case, but it is such a small cost that I don't think it's a good tradeoff to optimize for. (It's an invocation of the INCREMENT macro in macros.h). I think it's going to be negligible compared to all other processing done by digest(). Making it optional (via a flag in the context, or an extra function) will increase the code size for a performance gain (in some use cases) which is hardly measurable.
On Dec 27, 2016, at 1:29 AM, Niels Möller nisse@lysator.liu.se wrote:
Ron Frederick ronf@timeheart.net writes:
One other thought on this: Since the sender and receiver of a message need to both know the nonce, I think it would be useful for Nettle to provide a get_nonce() function if it is going to auto-increment the nonce.
One can access it directly from the nonce field of the context struct, but it makes some sense to provide a more abstract method. If we want to do this consistently, the details are not entirely obvious, though:
- Should it return a pointer to the nonce, assuming that it is present
in raw form in the context, so that it can be returned cheaply, or should it make a copy into an area provided by the caller?
I would have this always copy into a buffer provided by the caller similar to how the digest() method works, to avoid concerns about memory lifetime. Callers who had access to the members of the struct could always use that instead to avoid the copy cost, but they’d now be required to make sure the version of the library they linked against matched the version of the header file they compiled against.
- Should a get_nonce method be added to the nettle_aead struct?
Since that has a set_nonce() function, this could make sense, but it’s really only needed in the case where something like an auto-increment is occurring. If the library is never modifying the nonce provided by the caller, there’s probably no need for a function to return the currently set nonce. Do any of the AEAD functions modify the nonce after it is set?
If a receiver is receiving an explicit nonce with each message, there’s no reason for them to pay the cost of doing the increment function every time they call digest() to verify a message, as they’re just going to reset the nonce to something else when the next message arrives.
It's true that the auto-increment is unnecessary in that case, but it is such a small cost that I don't think it's a good tradeoff to optimize for. (It's an invocation of the INCREMENT macro in macros.h). I think it's going to be negligible compared to all other processing done by digest(). Making it optional (via a flag in the context, or an extra function) will increase the code size for a performance gain (in some use cases) which is hardly measurable.
Fair enough. I consider this more of a minor side benefit that we’d get in the process of supporting the digest_pure() capability than an independent reason to do this. While the benefit of this would be small, the ability to avoid the full copy of the context and especially to avoid an additional memory allocation associated with that is a much more compelling reason.
Ron Frederick ronf@timeheart.net writes:
Do any of the AEAD functions modify the nonce after it is set?
poly1305_aes_digest also increments the nonce. And it's in the same category as umac: A keyed hash with a per-message nonce. It seems the AEAD functions don't, so it's not entirely consistent.
Regards, /Niels
On Dec 27, 2016, at 9:57 AM, Niels Möller nisse@lysator.liu.se wrote:
Ron Frederick ronf@timeheart.net writes:
Do any of the AEAD functions modify the nonce after it is set?
poly1305_aes_digest also increments the nonce. And it's in the same category as umac: A keyed hash with a per-message nonce. It seems the AEAD functions don't, so it's not entirely consistent.
It looks like poly1305_aes_digest isn’t actually in the set of nettle_aead constructions. So, it probably doesn’t make sense to add a get_nonce() just yet to nettle_aead. If you added a meta structure for keyed hashes, though, it might make sense to have both a set_nonce() and get_nonce(), and just have those be NULL for keyed hashes where there was no nonce.
Ron Frederick ronf@timeheart.net writes:
Understood, and I’ve already implemented a copy() method on the wrapper which allocates a new context structure and initializes it with the contents of the existing context. However, it seems very expensive to do that malloc & memcpy on _every_ call to digest(), since as you’ve said this is the uncommon case. Given the documented behavior of the digest() function in PEP 452 though, that’s what I would have to do, since there’s no way in the wrapper to know if the caller might want to continue feeding data after digest() is called.
One problem is that many hash functions have an end-of-data padding which, if implemented in the simple way, clobbers the block buffer. So if the digest() wasn't allowed to modify the context, that would introduce some extra copying or complexity for the common case where digest is the final operation on the data.
And most hashes have a pretty small context, so that an extra copy in the uncommon case isn't a big deal. Now, umac differs from plain hash functions in that it has a much larger context, making copying more expensive.
The nonce auto-increment is less of a problem.
I would also like to say that for a MAC which depends on a nonce (in contrast to plain hash functions and HMAC), the python "PEP 452" API allowing multiple calls to digest seems dangerous. I'd expect that the key could be attacked if you expose both UMAC(key, nonce, "foo") and UMAC(key, nonce, "foobar"), since the nonce is supposed to be unique for each message.
Maybe one reasonable way to implement the python API could be to require an explicit set_nonce, and raise an exception if digest is called without a corresponding set_nonce? I.e., if set_nonce was never called, or if there are two digest calls without an intervening set_nonce. And then do any helper methods you want for managing the nonce value in the python code?
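In rough pseudo-python, reusing the _umac32_* handles and constants from your earlier snippet (just a sketch of the guard, not a complete wrapper):

import ctypes

class UMAC32:
    def __init__(self, ctx):
        self._ctx = ctx
        self._nonce_fresh = False

    def set_nonce(self, nonce):
        _umac32_set_nonce(self._ctx, ctypes.c_size_t(len(nonce)), nonce)
        self._nonce_fresh = True

    def digest(self):
        if not self._nonce_fresh:
            raise ValueError('a fresh nonce is required before each digest()')
        # Force a new set_nonce before the next digest
        self._nonce_fresh = False
        out = ctypes.create_string_buffer(UMAC32_DIGEST_SIZE)
        _umac32_digest(self._ctx, ctypes.c_size_t(len(out)), out)
        return out.raw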
It makes sense to me to have Nettle default to doing the reset & auto-increment of the nonce, as I agree that would be the common case for someone using Nettle directly. However, if this could be made configurable when the context is created, it would make it possible to avoid the cost of the malloc & memcpy in the Python wrapper unless the caller actually wanted to “fork” a context and hash multiple independent streams of data which had a common prefix.
If you'd like to experiment, you could try writing a
umac32_digest_pure (const struct umac32_ctx *ctx, ...)
which doesn't modify the context, to see what it takes. Probably can't use the "padding cache" optimization, though.
I'd prefer a separate function (naming is a bit difficult, as usual) over a flag in the context.
What would probably be better (but a larger reorg) is to separate the state which depends on the key from the state which depends on message and nonce. The only MAC-like algorithm currently done like that is GCM, which also has a large key-dependent table. That allows several contexts sharing the same key-dependent tables.
It would make sense to do something similar for HMAC too. Currently, a hmac_sha256_ctx consists of three sha256_ctx, which is three SHA256 state vectors of 32 bytes each, plus three block buffers of 64 bytes each. So a total of 300 bytes or so. But it really needs only one block buffer, so if state vectors and block buffers were better separated, it could be trimmed down to about 170 bytes.
Regards, /Niels
On Dec 27, 2016, at 1:11 AM, Niels Möller nisse@lysator.liu.se wrote:
Ron Frederick ronf@timeheart.net writes:
Understood, and I’ve already implemented a copy() method on the wrapper which allocates a new context structure and initializes it with the contents of the existing context. However, it seems very expensive to do that malloc & memcpy on _every_ call to digest(), since as you’ve said this is the uncommon case. Given the documented behavior of the digest() function in PEP 452 though, that’s what I would have to do, since there’s no way in the wrapper to know if the caller might want to continue feeding data after digest() is called.
One problem is that many hash functions have an end-of-data padding which, if implemented in the simple way, clobbers the block buffer. So if the digest() wasn't allowed to modify the context, that would introduce some extra copying or complexity for the common case where digest is the final operation on the data.
And most hashes have a pretty small context, so that an extra copy in the uncommon case isn't a big deal. Now, umac differs from plain hash functions in that it has a much larger context, making copying more expensive.
I looked a bit closer at the underlying C code in the Python library for this, and was surprised to find that indeed all of the standard hash modules (md5, sha256, etc.) as well as the hmac code are always making a copy of their state into a temporary object before finalizing the digest, to implement the behavior described in the PEP which allows continued updates, with no option to avoid that for one-time-use hash objects. Since this is done in C code, they are able to at least have the copy allocated on the stack, though, as opposed to paying for a heap allocation & free, and so you’re right that this isn’t too expensive for hashes with a small context.
Unfortunately, if I were to do the copy in my Python UMAC wrapper, I’d have to pay for both a copy and a heap allocation, as well as any other overhead involved in creating a new ctypes string buffer. To avoid this, I like your suggestion below of possibly having two different digest functions available in C, and letting the caller decide which one to call depending on their use case.
I would also like to say that for a MAC which depends on a nonce (in contrast to plain hash functions and HMAC), the python "PEP 452" API allowing multiple calls to digest seems dangerous. I'd expect that the key could be attacked if you expose both UMAC(key, nonce, "foo") and UMAC(key, nonce, "foobar"), since the nonce is supposed to be unique for each message.
I’m not a cryptographer so I can’t really speak to this point, but how would this be different than supporting a copy() function for something like HMAC, where you generate a digest for both a message and its prefix? I thought the main attack against UMAC when you reused both a key and a nonce for two different messages was that the key stream for both messages was identical and it was only later mixed with data from the messages. That would not be the case when dealing with a prefix of a message, but perhaps there’s a more subtle attack I’m missing there related to padding, for instance.
That said, the behavior defined in the PEP of allowing digests to be generated on a prefix of a message as well as the copy() function goes all the way back to PEP 247 for keyed hashes released in 2001, and was re-affirmed in PEP 452 in 2013. If there was an attack here, I would have expected someone to raise this concern. Admittedly, PEP 247/452 doesn’t explicitly contemplate hash functions with nonces, but it would cover cases where multiple digests were generated using the same key and no other randomization to make the two digests independent.
Maybe one reasonable way to implement the python API could be to require an explicit set_nonce, and raise an exception if digest is called without a corresponding set_nonce? I.e., if set_nonce was never called, or if there are two digest calls without an intervening set_nonce. And then do any helper methods you want for managing the nonce value in the python code?
I like your suggestion below of two different digest functions, one of which did not modify the context (allowing for additional data to be appended) and the other of which did a finalization and probably always did a nonce auto-increment to help guard against nonce re-use. With this approach, there’d be no way to ever hash two different messages with the same key & nonce unless you did so explicitly with set_nonce().
In this case, I don’t think any restrictions would need to be placed on set_nonce() being called. When set_key() is called, it would default to an all-zeroes nonce as it does today, and when the current digest() function was called, it would always reset the hash state and auto-increment the nonce so any new message would be hashed with a different nonce than the previous one. Calling digest_pure() would not increment the nonce, but it would also not reset the hash state. So, any new data provided would be appended to the current message and not be treated as a new message.
If you'd like to experiment, you could try writing a
umac32_digest_pure (const struct umac32_ctx *ctx, ...)
which doesn't modify the context, to see what it takes. Probably can't use the "padding cache" optimization, though.
I'd prefer a separate function (naming is a bit difficult, as usual) over a flag in the context.
I agree that a separate function is probably better than a flag. What do you think of the name partial_digest(), or perhaps running_digest()? I’ll take a look at the current code and see how difficult this would be.
What would probably be better (but a larger reorg) is to separate the state which depends on the key from the state which depends on message and nonce. The only MAC-like algorithm currently done like that is GCM, which also has a large key-dependent table. That allows several contexts sharing the same key-dependent tables.
So, would you basically have two different context objects in this case, one associated with just the key and a separate one associated with the nonce and the message data provided so far? It seems like you’d still need to make a copy of the second object before calculating a digest if you wanted to support partial digests and didn’t have a “pure” digest function that could do that without modifying the state, but at least the amount of data you had to copy might be smaller.
It would make sense to do something similar for HMAC too. Currently, a hmac_sha256_ctx consists of three sha256_ctx, which is three SHA256 state vectors of 32 bytes each, plus three block buffers of 64 bytes each. So a total of 300 bytes or so. But it really needs only one block buffer, so if state vectors and block buffers were better separated, it could be trimmed down to about 170 bytes.
Ron Frederick ronf@timeheart.net writes:
I’m not a cryptographer
And I'm an amateur when it comes to cryptanalysis.
but how would this be different than supporting a copy() function for something like HMAC, where you generate a digest for both a message and its prefix?
Difference is that HMAC doesn't have a nonce. When the attacker gets the HMAC of two messages (related or not) using the same key, the attacker is supposed to only be able to tell if the messages were identical or not.
I thought the main attack against UMAC when you reused both a key and a nonce for two different messages was that the key stream for both messages was identical and it was only later mixed with data from the messages.
I'm afraid I can't give a concrete attack, but my gut feeling is that the security analysis depends on nonces being unique; it's generally a bad idea to depart from that.
I don't recall all details of UMAC; it's some years since I wrote it. But the way I understand it, it works by computing a relatively weak low-complexity function of the expanded key and message. IIRC, L1 is a plain linear convolution, and L2 is some kind of polynomial evaluation. And then to hide the low-complexity structure from the attacker, a value derived from the key and nonce is XORed to the tag at the end. Looking at the code, that final "whitening" is the only use of the nonce I see.
So basically,
umac(key, nonce, msg) = F(key, msg) ^ AES(pdf_key, nonce)
where F(key, msg) is a low-complexity function. So if you compute
t0 = umac(key, nonce, "foo")
t1 = umac(key, nonce, "foobar")
then
t0 XOR t1 = F(key, "foo") ^ F(key,"foobar")
which looks a bit too well structured, and defeats the whitening step. I can't tell if it leads to practical attacks, but I wouldn't be surprised if a few of these would let the attacker derive the parts of the expanded key corresponding to the suffix, and then forge additional messages with identical tag.
Admittedly, PEP 247/452 doesn’t explicitly contemplate hash functions with nonces, but it would cover cases where multiple digests were generated using the same key and no other randomization to make the two digests independent.
As I said, multiple messages with the same key is pretty normal use of HMAC, and if one doesn't want the attacker to be able to identify identical messages from the MACs, one can prepend a sequence number to the message. Supposedly unique per-message nonces make the situation quite different.
I agree that a separate function is probably better than a flag. What do you think of the name partial_digest(), or perhaps running_digest()?
running_digest sounds quite clear to me. Or maybe snapshot_digest?
So, would you basically have two different context objects in this case, one associated with just the key and a separate one associated with the nonce and the message data provided so far? It seems like you’d still need to make a copy of the second object before calculating a digest if you wanted to support partial digests and didn’t have a “pure” digest function that could do that without modifying the state, but at least the amount of data you had to copy might be smaller.
That's the idea. For umac, the second part is quite large too, so it would make some sense to have a function that can produce the digest without clobbering (or copying) it.
Regards, /Niels
On Dec 27, 2016, at 10:35 AM, Niels Möller nisse@lysator.liu.se wrote:
I don't recall all details of UMAC; it's some years since I wrote it. But the way I understand it, it works by computing a relatively weak low-complexity function of the expanded key and message. IIRC, L1 is a plain linear convolution, and L2 is some kind of polynomial evaluation. And then to hide the low-complexity structure from the attacker, a value derived from the key and nonce is XORed to the tag at the end. Looking at the code, that final "whitening" is the only use of the nonce I see.
So basically,
umac(key, nonce, msg) = F(key, msg) ^ AES(pdf_key, nonce)
where F(key, msg) is a low-complexity function. So if you compute
t0 = umac(key, nonce, "foo")
t1 = umac(key, nonce, "foobar")
then
t0 XOR t1 = F(key, "foo") ^ F(key,"foobar")
which looks a bit too well structured, and defeats the whitening step. I can't tell if it leads to practical attacks, but I wouldn't be surprised if a few of these would let the attacker derive the parts of the expanded key corresponding to the suffix, and then forge additional messages with identical tag.
Ok, I see. Given that the nonce is only folded in with XOR at the last stage of generating the digest, I can see how it’d be a problem to ever generate two digest values with the same PDF key and nonce, independent of what was going on with the other function which operated on the message. If F() is good enough, knowing the XOR of F() on two different messages (even if one is a prefix of another) still shouldn’t tell you much, but the AES pad (and any additional strength it provides) has essentially been removed entirely in this case.
This definitely raises questions about whether it would ever make sense to support either snapshot_digest() or copy() for UMAC. For my specific use case in AsyncSSH, I can get by without either. I was only trying to support these to be compatible with PEP 452, but perhaps that’s not a good idea in this case.
On Dec 27, 2016, at 1:11 AM, Niels Möller nisse@lysator.liu.se wrote:
If you'd like to experiment, you could try writing a
umac32_digest_pure (const struct umac32_ctx *ctx, ...)
which doesn't modify the context, to see what it takes. Probably can't use the "padding cache" optimization, though.
I looked over the code today, and it turns out that it’s a fairly clean change. We can even keep the “padding cache” optimization. A key thing to realize is that it is ok to modify parts of the context, as long as they don’t affect future calculations. In other words, it’s ok to write things like padding data “beyond the end” of the input buffer, since later calls to update() will simply overwrite this padding as long as the “index” value is unchanged. Similarly, it’s ok to update the padding cache and set the flag for that, as long as the two low-order bits of the nonce_low value aren’t changed and the nonce is never incremented.
More specifically, the only parts of the context which need to be updated but which should not be modified in place are the l2_state[] array and the count value. If a local copy of these values is made on the stack and used in place of the values in the context, you get the right result. Other things like the index value and the nonce only need to be read and not updated, and the nonce increment and the final reset of the index and count can just be removed entirely.
Attached is a patch which implements new snapshot_digest() functions for all of umac32, umac64, umac96, and umac128. I tested these against a full range of input buffer sizes and incremental buffer additions to make sure I caught all the different edge cases and I feel pretty confident in the results. The resulting digest values were validated by calling the original unmodified digest function as a reference. I tested both the output of snapshot_digest() against digest() and then the result of calling update(), snapshot_digest(), update() again, and the original digest(), making sure I got the same results as I would have when calculating digest() on the combined data from both updates in a separate umac context.
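In terms of my earlier ctypes snippet, the cross-check looks roughly like this (a sketch; the snapshot symbol name here is assumed, so check the patch for the exact spelling):

# Reuses _nettle_lib, the _umac32_* handles, UMAC_CTX_SIZE and
# UMAC32_DIGEST_SIZE from the earlier snippet
_umac32_snapshot = _nettle_lib.nettle_umac32_snapshot_digest  # name assumed

key = b'abcdefghijklmnop'
nonce = b'bcdefghi'
part1, part2 = 1024*b'a', b'abc'

def reference_digest(data):
    # One-shot digest computed with the unmodified digest() as a reference
    ctx = ctypes.create_string_buffer(UMAC_CTX_SIZE)
    out = ctypes.create_string_buffer(UMAC32_DIGEST_SIZE)
    _umac32_set_key(ctx, key)
    _umac32_set_nonce(ctx, ctypes.c_size_t(len(nonce)), nonce)
    _umac32_update(ctx, ctypes.c_size_t(len(data)), data)
    _umac32_digest(ctx, ctypes.c_size_t(len(out)), out)
    return out.raw

ctx = ctypes.create_string_buffer(UMAC_CTX_SIZE)
out = ctypes.create_string_buffer(UMAC32_DIGEST_SIZE)
_umac32_set_key(ctx, key)
_umac32_set_nonce(ctx, ctypes.c_size_t(len(nonce)), nonce)
_umac32_update(ctx, ctypes.c_size_t(len(part1)), part1)

_umac32_snapshot(ctx, ctypes.c_size_t(len(out)), out)
assert out.raw == reference_digest(part1)           # snapshot matches digest

_umac32_update(ctx, ctypes.c_size_t(len(part2)), part2)
_umac32_digest(ctx, ctypes.c_size_t(len(out)), out)
assert out.raw == reference_digest(part1 + part2)   # state wasn't disturbed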
On Thu, Dec 22, 2016 at 9:16 PM, Niels Möller nisse@lysator.liu.se wrote:
Ron Frederick ronf@timeheart.net writes:
Rather than adding a function to return the structure size here, what do you think about adding functions such as umac32_new() which would do the allocation of the structure and return a pointer to it?
Actually, I think I'd prefer a way to get the struct size at run time; otherwise I'd have to consider interfaces to let an application override the allocation function used by the *_new functions.
Are you sure the ctypes package you use doesn't provide any general way to extract struct sizes from a C interface? I know it's common practice to design libraries to have only opaque types, with function calls for allocation, use, and deallocation. But I wouldn't expect nettle to be the only library which tries to be more low-level and expose some internals.
The internals exposure makes it hard to keep an ABI (for the reasons you also outlined in the email with nettle hashes). Nettle is special in that aspect because it is intentionally low-level, but it is also offered as a shared library with an ABI.
An approach for Ron to overcome the specific problem could be to use wrappers over nettle which hide the details, such as the gnutls functions (e.g., gnutls_mac_fast etc.).
The more long term question is: should nettle be striving to provide an ABI? If not, it could transform into a low-level library like gnulib or ccan (i.e. a copy-lib)... or alternatively, if yes, provide a high level stable API which hides details and makes the ABI easier to provide.
regards, Nikos
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
An approach for Ron to overcome the specific problem could be to use wrappers over nettle which hide the details, such as the gnutls functions (e.g., gnutls_mac_fast etc.).
If gnutls has suitable wrappers for umac, that sounds like an easy solution, at least until the needed interfaces are added to nettle.
The more long term question is: should nettle be striving to provide an ABI? If not, it could transform into a low-level library like gnulib or ccan (i.e. a copy-lib)... or alternatively, if yes, provide a high level stable API which hides details and makes the ABI easier to provide.
We could add a thin layer of functions involving only opaque pointers, byte strings, and ints. If we really want, it could be a separate .so-file, which could have an ABI more stable than nettle's. But I'm not sure that really solves Ron's problem of using the functions without depending on the header file. E.g, say we add a function like
void *mac_ctx_new(enum nettle_mac_algorithm);
and it's called like
void *umac_ctx = mac_ctx_new(NETTLE_ALGORITHM_UMAC);
And then the python glue loads libnettle.so, and wants to make this function call. How would it find the numeric value of NETTLE_ALGORITHM_UMAC? Would it duplicate the definitions in the C header (which is ok, since these constants are (i) part of the ABI and supposedly stable, and (ii) not architecture dependent), or would we need some interface using plain strings to identify algorithms?
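I.e., on the python side it would boil down to something like this (everything here is hypothetical):

import ctypes

# Hypothetical constant, duplicated from the (not yet existing) C header,
# relying on it being a stable, architecture-independent part of the ABI
NETTLE_ALGORITHM_UMAC = 7

lib = ctypes.cdll.LoadLibrary('libnettle.so')
lib.mac_ctx_new.restype = ctypes.c_void_p
lib.mac_ctx_new.argtypes = [ctypes.c_int]

umac_ctx = lib.mac_ctx_new(NETTLE_ALGORITHM_UMAC)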
Happy holidays, /Niels
On Dec 23, 2016, at 11:26 AM, Niels Möller nisse@lysator.liu.se wrote:
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
An approach for Ron to overcome the specific problem could be to use wrappers over nettle which hide the details, such as the gnutls functions (e.g., gnutls_mac_fast etc.).
If gnutls has suitable wrappers for umac, that sounds like an easy solution, at least until the needed interfaces are added to nettle.
I did run across gnutls when looking around for UMAC implementations. From what I can tell, gnutls only exposes UMAC 96 and 128, though, and I was hoping to be able to provide options for all of UMAC-32, -64, -96, and -128.
The more long term question is: should nettle be striving to provide an ABI? If not, it could transform into a low-level library like gnulib or ccan (i.e. a copy-lib)... or alternatively, if yes, provide a high level stable API which hides details and makes the ABI easier to provide.
We could add a thin layer of functions involving only opaque pointers, byte strings, and ints. If we really want, it could be a separate .so-file, which could have an ABI more stable than nettle's. But I'm not sure that really solves Ron's problem of using the functions without depending on the header file. E.g, say we add a function like
void *mac_ctx_new(enum nettle_mac_algorithm);
and it's called like
void *umac_ctx = mac_ctx_new(NETTLE_ALGORITHM_UMAC);
And then the python glue loads libnettle.so, and wants to make this function call. How would it find the numeric value of NETTLE_ALGORITHM_UMAC? Would it duplicate the definitions in the C header (which is ok, since these constants are (i) part of the ABI and supposedly stable, and (ii) not architecture dependent), or would we need some interface using plain strings to identify algorithms?
Python’s approach to this in “hashlib” is to use string names to identify algorithms, so that would definitely be a good option here. I don’t really see a problem using a set of enum values here which got replicated in a Python wrapper to identify algorithms if the mapping was guaranteed to remain stable, but it wouldn’t be as easy to add new algorithms later. Numbers would need to be assigned for them, and both the Python and C code would have to be updated to know about these new numbers before the new algorithms could be used.
Since you already have string names defined in your meta structures, allowing a lookup by name doesn’t seem like much of a stretch. You could potentially provide both options, with a function to map the name to the enum value. This would be an extra step for a caller who wanted to do the name-based lookup, but that could be hidden inside the wrapper.
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
The more long term question is: should nettle be striving to provide an ABI? If not, it could transform into a low-level library like gnulib or ccan (i.e. a copy-lib)... or alternatively, if yes, provide a high level stable API which hides details and makes the ABI easier to provide.
I'm experimenting with this... Starting with hashes which are the simplest primitive. Header file changes below. Comments appreciated, including naming suggestions.
Regards, /Niels
--- a/nettle-hash.h
+++ b/nettle-hash.h
@@ -0,0 +1,66 @@
+/* nettle-hash.h
+
+   This file defines an opaque API for Nettle hash functions.
+
+   Copyright (C) 2016 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+*/
+
+#ifndef NETTLE_HASH_H_INCLUDED
+#define NETTLE_HASH_H_INCLUDED
+
+#include "nettle-types.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct nettle_hash;
+
+size_t
+nettle_hash_get_context_size(const struct nettle_hash *hash);
+
+size_t
+nettle_hash_get_digest_size(const struct nettle_hash *hash);
+
+void
+nettle_hash_init(const struct nettle_hash *hash, void *ctx);
+
+void
+nettle_hash_update(const struct nettle_hash *hash,
+                   void *ctx, size_t length, const uint8_t *src);
+
+void
+nettle_hash_digest(const struct nettle_hash *hash,
+                   void *ctx, size_t length, uint8_t *dst);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* NETTLE_HASH_H_INCLUDED */
--- a/nettle-meta.h
+++ b/nettle-meta.h
@@ -115,7 +115,18 @@ struct nettle_hash
 };
 
 /* null-terminated list of digests implemented by this version of nettle */
-extern const struct nettle_hash * const nettle_hashes[];
+extern const struct nettle_hash * const _nettle_hashes[];
+
+const struct nettle_hash * const *
+#ifdef __GNUC__
+__attribute__((pure))
+#endif
+nettle_get_hashes (void);
+
+#define nettle_hashes (nettle_get_hashes())
+
+const struct nettle_hash *
+nettle_lookup_hash (const char *name);
 
 extern const struct nettle_hash nettle_md2;
 extern const struct nettle_hash nettle_md4;
On Dec 27, 2016, at 2:59 AM, Niels Möller nisse@lysator.liu.se wrote:
I'm experimenting with this... Starting with hashes which are the simplest primitive. Header file changes below. Comments appreciated, including naming suggestions.
[snip]
+struct nettle_hash;
+
+size_t
+nettle_hash_get_context_size(const struct nettle_hash *hash);
+
+size_t
+nettle_hash_get_digest_size(const struct nettle_hash *hash);
+
+void
+nettle_hash_init(const struct nettle_hash *hash, void *ctx);
+
+void
+nettle_hash_update(const struct nettle_hash *hash,
+                   void *ctx, size_t length, const uint8_t *src);
+
+void
+nettle_hash_digest(const struct nettle_hash *hash,
+                   void *ctx, size_t length, uint8_t *dst);
+
+const struct nettle_hash *
+nettle_lookup_hash (const char *name);
This looks good to me, for non-keyed hashes.
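For what it’s worth, here’s roughly how I’d expect to drive it from Python (a sketch against the experimental interface in your patch, so all of these entry points are proposals rather than shipping API):

import ctypes

lib = ctypes.cdll.LoadLibrary('libnettle.so')

lib.nettle_lookup_hash.restype = ctypes.c_void_p
lib.nettle_lookup_hash.argtypes = [ctypes.c_char_p]
lib.nettle_hash_get_context_size.restype = ctypes.c_size_t
lib.nettle_hash_get_context_size.argtypes = [ctypes.c_void_p]
lib.nettle_hash_get_digest_size.restype = ctypes.c_size_t
lib.nettle_hash_get_digest_size.argtypes = [ctypes.c_void_p]
lib.nettle_hash_init.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
lib.nettle_hash_update.argtypes = [ctypes.c_void_p, ctypes.c_void_p,
                                   ctypes.c_size_t, ctypes.c_char_p]
lib.nettle_hash_digest.argtypes = [ctypes.c_void_p, ctypes.c_void_p,
                                   ctypes.c_size_t, ctypes.c_char_p]

# No struct layouts needed: the context is just an opaque buffer whose
# size is queried at run time
sha256 = lib.nettle_lookup_hash(b'sha256')
ctx = ctypes.create_string_buffer(lib.nettle_hash_get_context_size(sha256))
out = ctypes.create_string_buffer(lib.nettle_hash_get_digest_size(sha256))

lib.nettle_hash_init(sha256, ctx)
lib.nettle_hash_update(sha256, ctx, ctypes.c_size_t(5), b'hello')
lib.nettle_hash_digest(sha256, ctx, ctypes.c_size_t(len(out)), out)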