Hello,
we (Sequoia PGP) would love to see OCB being implemented in Nettle. The OpenPGP working group is working on a revision of RFC4880, which will mostly be a cryptographic refresh, and will bring AEAD to OpenPGP.
The previous -now abandoned- draft called for EAX being mandatory, and OCB being optional [0]. This was motivated by OCB being encumbered by patents. However, said patents were waived by the holder [1].
0: https://datatracker.ietf.org/doc/html/draft-ietf-openpgp-rfc4880bis-10#secti...
1: https://mailarchive.ietf.org/arch/msg/cfrg/qLTveWOdTJcLn4HP3ev-vrj05Vg/
With OCB being no longer patent-encumbered, it seems preferable over the two-pass EAX construction. Therefore, it seems plausible that the WG makes OCB mandatory to implement. To support that in Sequoia, we'd need support for that in Nettle (Nettle is our main cryptographic backend).
Unfortunately, we don't have the expertise in our team to contribute a patch, and we currently aren't in a position to offer funding for the implementation.
Thanks, Justus
Justus Winter justus@sequoia-pgp.org writes:
we (Sequoia PGP) would love to see OCB being implemented in Nettle. The OpenPGP working group is working on a revision of RFC4880, which will mostly be a cryptographic refresh, and will bring AEAD to OpenPGP.
The previous -now abandoned- draft called for EAX being mandatory, and OCB being optional [0]. This was motivated by OCB being encumbered by patents. However, said patents were waived by the holder [1].
0: https://datatracker.ietf.org/doc/html/draft-ietf-openpgp-rfc4880bis-10#secti...
1: https://mailarchive.ietf.org/arch/msg/cfrg/qLTveWOdTJcLn4HP3ev-vrj05Vg/
That's good news, I hadn't seen that. Then OCB gets a lot more interesting. And is https://datatracker.ietf.org/doc/html/rfc7253 the proper reference (there seem to be a couple of different versions of OCB)?
Unfortunately, we don't have the expertise in our team to contribute a patch, and we currently aren't in a position to offer funding for the implementation.
If someone wants to work on it, please post to the list. I might look into it myself, but as you have noticed, I have rather limited hacking time.
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
If someone wants to work on it, please post to the list. I might look into it myself, but as you have noticed, I have rather limited hacking time.
I've given it a try, see branch ocb-mode. Based on RFC 7253. Passes tests, but not particularly optimized. Some comments and questions:
1. Most of the operations use only the encrypt function of the underlying block cipher. The exception is ocb decrypt, which needs *both* the decrypt function and the encrypt function. For ciphers that use different key setup for encrypt and decrypt, e.g., AES, that means that to decrypt OCB one needs to initialize two separate aes128_ctx, and then call the somewhat unwieldy
   void
   ocb_decrypt (struct ocb_ctx *ctx, const struct ocb_key *key,
                const void *encrypt_ctx, nettle_cipher_func *encrypt,
                const void *decrypt_ctx, nettle_cipher_func *decrypt,
                size_t length, uint8_t *dst, const uint8_t *src);
2. It's not obvious how to best manage the different L_i values. Can be computed upfront, on demand, or cached in some way. Current code computes only L_*, L_$ and L_0 up front (part of ocb_set_key), and the others recomputed each time they're needed.
3. The processing of the authenticated data doesn't depend on the nonce in any way. That means that if one processes several messages with the same key and associated data, the associated data can be processed once, with the same sum reused for all messages.
Is that something that is useful in practice, and which nettle interfaces should support?
4. The way the nonce is used seems designed to allow cheap incrementing of the nonce. The nonce is used to determine
Offset_0 = Stretch[1+bottom..128+bottom]
where "bottom" is the least significant 6 bits of the nonce, acting as a shift, and "Stretch" is independent of those nonce bits, so unchanged on all but one out of 64 nonce increments.
Should nettle support some kind of auto-incrementing nonce that takes advantage of this? Nettle does something similar for UMAC (not sure if there are others).
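The nonce handling described in item 4 can be sketched as follows. This is only an illustration based on the definitions in RFC 7253; the function names ocb_stretch and ocb_offset0 are hypothetical and are not taken from the ocb-mode branch. It assumes Ktop (the block cipher applied to the formatted nonce with its low 6 bits cleared) has already been computed by the caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: Stretch = Ktop || (Ktop[1..64] xor Ktop[9..72]),
   following RFC 7253. Bits 9..72 are bytes 1..8 of Ktop. */
static void
ocb_stretch (uint8_t stretch[24], const uint8_t ktop[16])
{
  unsigned i;
  memcpy (stretch, ktop, 16);
  for (i = 0; i < 8; i++)
    stretch[16 + i] = ktop[i] ^ ktop[i + 1];
}

/* Hypothetical helper: Offset_0 = Stretch[1+bottom..128+bottom],
   i.e., a 128-bit window starting "bottom" bits into Stretch.
   bottom must be in the range 0..63. */
static void
ocb_offset0 (uint8_t offset[16], const uint8_t stretch[24], unsigned bottom)
{
  unsigned bytes = bottom / 8;
  unsigned bits = bottom % 8;
  unsigned i;
  for (i = 0; i < 16; i++)
    {
      unsigned v = (unsigned) stretch[bytes + i] << bits;
      if (bits > 0)
        v |= stretch[bytes + i + 1] >> (8 - bits);
      offset[i] = (uint8_t) v;
    }
}
```

With this split, only ocb_stretch depends on a block cipher call (through Ktop); changing the low 6 bits of the nonce just moves the extraction window in ocb_offset0.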
As I said, current code is not particularly optimized, but OCB has potential to be quite fast. The per-block processing for authentication of the message (not associated data) is just an XOR. And encryption/decryption can be done several blocks in parallel, like CTR mode. If we do, e.g., 4 or 8 blocks at a time, there will be a fairly regular structure of the needed Offset_i values, possibly making them cheaper to setup, but I haven't yet looked into those details.
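As a rough sketch of the on-demand approach to the L_i values and the per-block offset update, following the definitions in RFC 7253 (L_i = double(L_{i-1}), Offset_i = Offset_{i-1} xor L_{ntz(i)}): the names gf128_double, trailing_zeros, ocb_l_ntz and ocb_advance_offset are hypothetical and do not reflect the code on the branch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Doubling in GF(2^128) on a big-endian block: shift left one bit,
   and xor 0x87 into the last byte if the top bit was set.
   Works in place (dst == src). */
static void
gf128_double (uint8_t dst[16], const uint8_t src[16])
{
  uint8_t carry = src[0] >> 7;
  unsigned i;
  for (i = 0; i < 15; i++)
    dst[i] = (uint8_t) ((src[i] << 1) | (src[i + 1] >> 7));
  dst[15] = (uint8_t) ((src[15] << 1) ^ (carry ? 0x87 : 0));
}

/* Number of trailing zero bits of i; i must be non-zero. */
static unsigned
trailing_zeros (size_t i)
{
  unsigned n = 0;
  while ((i & 1) == 0)
    {
      i >>= 1;
      n++;
    }
  return n;
}

/* Recompute L_{ntz(i)} from L_0 each time it is needed. */
static void
ocb_l_ntz (uint8_t l[16], const uint8_t l0[16], size_t i)
{
  unsigned n = trailing_zeros (i);
  memcpy (l, l0, 16);
  while (n-- > 0)
    gf128_double (l, l);
}

/* Offset_i = Offset_{i-1} xor L_{ntz(i)}, for block index i >= 1. */
static void
ocb_advance_offset (uint8_t offset[16], const uint8_t l0[16], size_t i)
{
  uint8_t l[16];
  unsigned j;
  ocb_l_ntz (l, l0, i);
  for (j = 0; j < 16; j++)
    offset[j] ^= l[j];
}
```

For blocks 1, 2, 3, 4 the ntz values are 0, 1, 0, 2, so half of all blocks need only L_0, which is the kind of regular structure a multi-block implementation could exploit.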
Regards, /Niels
Hello Niels :)
sorry for not following up earlier. Thanks for working on it!
nisse@lysator.liu.se (Niels Möller) writes:
nisse@lysator.liu.se (Niels Möller) writes:
If someone wants to work on it, please post to the list. I might look into it myself, but as you have noticed, I have rather limited hacking time.
I've given it a try, see branch ocb-mode. Based on RFC 7253. Passes tests, but not particularly optimized.
I have wrapped it in our Rust bindings, glued Sequoia to it, and did some interop testing. Looks all good.
Some comments and questions:
Most of the operations use only the encrypt function of the underlying block cipher. The exception is ocb decrypt, which needs *both* the decrypt function and the encrypt function. For ciphers that use different key setup for encrypt and decrypt, e.g., AES, that means that to decrypt OCB one needs to initialize two separate aes128_ctx, and then call the somewhat unwieldy
   void
   ocb_decrypt (struct ocb_ctx *ctx, const struct ocb_key *key,
                const void *encrypt_ctx, nettle_cipher_func *encrypt,
                const void *decrypt_ctx, nettle_cipher_func *decrypt,
                size_t length, uint8_t *dst, const uint8_t *src);
I don't mind it being unwieldy.
- It's not obvious how to best manage the different L_i values. Can be computed upfront, on demand, or cached in some way. Current code computes only L_*, L_$ and L_0 up front (part of ocb_set_key), and the others recomputed each time they're needed.
I cannot comment on that.
The processing of the authenticated data doesn't depend on the nonce in any way. That means that if one processes several messages with the same key and associated data, the associated data can be processed once, with the same sum reused for all messages.
Is that something that is useful in practice, and which nettle interfaces should support?
That is an interesting question. Currently, the OpenPGP drafts that include AEAD do include the chunk index in the authenticated data and would therefore not benefit from this optimization. However, I've raised this point in our issue tracker:
https://gitlab.com/openpgp-wg/rfc4880bis/-/issues/86
The way the nonce is used seems designed to allow cheap incrementing of the nonce. The nonce is used to determine
Offset_0 = Stretch[1+bottom..128+bottom]
where "bottom" is the least significant 6 bits of the nonce, acting as a shift, and "Stretch" is independent of those nonce bits, so unchanged on all but one out of 64 nonce increments.
Should nettle support some kind of auto-incrementing nonce that takes advantage of this? Nettle does something similar for UMAC (not sure if there are others).
That is also interesting. I have raised the point in our issue tracker, and Daniel Huigens observed that at least their Go implementation simply compares the top-most bits with the ones provided for the previous chunk. Botan does the same.
https://gitlab.com/openpgp-wg/rfc4880bis/-/issues/84
https://github.com/ProtonMail/go-crypto/blob/70ae35bab23f26f6188bab82cb34d7f...
https://botan.randombit.net/doxygen/ocb_8cpp_source.html#l00264
This has the benefit of working for how OpenPGP currently constructs the nonce, which does not result in monotonically incrementing nonces (currently, we take an IV and xor in the chunk index). But, we may change the scheme.
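The "compare the top-most bits with the previous chunk" idea might look roughly like the sketch below. It is not taken from the Go or Botan code linked above; struct ktop_cache and ktop_cache_update are hypothetical names, and the input is assumed to be the 16-byte formatted nonce block from RFC 7253, whose low 6 bits are the "bottom" value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cache of the nonce bits that determine Ktop. */
struct ktop_cache
{
  int valid;        /* no previous nonce exists before the first call */
  uint8_t top[16];  /* formatted nonce with the low 6 bits cleared */
};

/* Returns 1 if Ktop (one block cipher call) must be recomputed for
   this nonce, 0 if the value from the previous call can be reused. */
static int
ktop_cache_update (struct ktop_cache *cache, const uint8_t nonce[16])
{
  uint8_t top[16];
  memcpy (top, nonce, 16);
  top[15] &= 0xC0;  /* drop "bottom", the low 6 bits */

  if (cache->valid && memcmp (cache->top, top, 16) == 0)
    return 0;

  memcpy (cache->top, top, 16);
  cache->valid = 1;
  return 1;
}
```

Because only the upper bits are compared, this works both for a plain counter and for an IV with a small chunk index xored in: in either case chunks whose nonces differ only in the low 6 bits share one Ktop.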
Thanks, Justus
Justus Winter justus@sequoia-pgp.org writes:
I've given it a try, see branch ocb-mode. Based on RFC 7253. Passes tests, but not particularly optimized.
I have wrapped it in our Rust bindings, glued Sequoia to it, and did some interop testing. Looks all good.
Based on the functions declared in ocb.h, or the struct nettle_ocb_aesxxx?
I'm thinking that it might make sense to add the latter as part of the public api for next release, but leave all other functions as internal to let dust settle a bit?
- It's not obvious how to best manage the different L_i values. Can be computed upfront, on demand, or cached in some way. Current code computes only L_*, L_$ and L_0 up front (part of ocb_set_key), and the others recomputed each time they're needed.
I cannot comment on that.
Changing it later will be a bit difficult (ABI break, if more space is needed in the context struct), so we need to decide on something reasonable.
The processing of the authenticated data doesn't depend on the nonce in any way. That means that if one processes several messages with the same key and associated data, the associated data can be processed once, with the same sum reused for all messages.
Is that something that is useful in practice, and which nettle interfaces should support?
That is an interesting question. Currently, the OpenPGP drafts that include AEAD do include the chunk index in the authenticated data and would therefore not benefit from this optimization. However, I've raised this point in our issue tracker:
This choice can affect both API and ABI.
The way the nonce is used seems designed to allow cheap incrementing of the nonce. The nonce is used to determine
Offset_0 = Stretch[1+bottom..128+bottom]
where "bottom" is the least significant 6 bits of the nonce, acting as a shift, and "Stretch" is independent of those nonce bits, so unchanged on all but one out of 64 nonce increments.
Should nettle support some kind of auto-incrementing nonce that takes advantage of this? Nettle does something similar for UMAC (not sure if there are others).
That is also interesting. I have raised the point in our issue tracker, and Daniel Huigens observed that at least their Go implementation simply compares the top-most bits with the ones provided for the previous chunk. Botan does the same.
For any kind of optimization of this, one needs to store the previous nonce and related values in the context.
To generalize it to more than auto-increment, one gets the problem that for the first set_nonce, there is no previous nonce to compare to. So one would need an extra flag just for that, which I don't think is so nice. Alternatively, use a zero nonce by default at initialization. Then there's another slight complication: to set the nonce, one needs to know the "tag_length". In the current version, that is passed as an argument to set_nonce. It could perhaps be passed with set_key instead. It's now some time since I read the RFC, but I don't think it is proper to use OCB with the same key while changing tag_length from message to message.
This has the benefit of working for how OpenPGP currently constructs the nonce, which does not result in monotonically incrementing nonces (currently, we take an IV and xor in the chunk index). But, we may change the scheme.
I think it would be nice to stick to a simply incrementing nonce value.
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
Justus Winter justus@sequoia-pgp.org writes:
I've given it a try, see branch ocb-mode. Based on RFC 7253. Passes tests, but not particularly optimized.
I have wrapped it in our Rust bindings, glued Sequoia to it, and did some interop testing. Looks all good.
Based on the functions declared in ocb.h, or the struct nettle_ocb_aesxxx?
I'm thinking that it might make sense to add the latter as part of the public api for next release, but leave all other functions as internal to let dust settle a bit?
Based on the ocb.h interface. The latter is not a good fit for our bindings, which let you combine AEAD modes with block ciphers, something OpenPGP also allows (for better or worse). Also, the latter is not useful for OpenPGP as specified in both the RFC4880bis draft and the crypto-refresh draft, as both use 120 bit nonces, whereas the interface uses 96 bit nonces.
I have opened an issue about the nonce size, and we decided to seek clarification from the authors of RFC7253:
https://gitlab.com/openpgp-wg/rfc4880bis/-/issues/83
The processing of the authenticated data doesn't depend on the nonce in any way. That means that if one processes several messages with the same key and associated data, the associated data can be processed once, with the same sum reused for all messages.
Is that something that is useful in practice, and which nettle interfaces should support?
That is an interesting question. Currently, the OpenPGP drafts that include AEAD do include the chunk index in the authenticated data and would therefore not benefit from this optimization. However, I've raised this point in our issue tracker:
This choice can affect both API and ABI.
We optimistically changed our scheme to not change the associated data between chunks (except for the last chunk):
] For each chunk, the AEAD construction is given the Packet Tag in new
] format encoding (bits 7 and 6 set, bits 5-0 carry the packet tag),
] version number, cipher algorithm octet, AEAD algorithm octet, and chunk
] size octet as additional data. For example, the additional data of the
] first chunk using EAX and AES-128 with a chunk size of 2**16 octets
] consists of the octets 0xD2, 0x02, 0x07, 0x01, and 0x10.
]
] After the final chunk, the AEAD algorithm is used to produce a final
] authentication tag encrypting the empty string. This AEAD instance is
] given the additional data specified above, plus an eight-octet,
] big-endian value specifying the total number of plaintext octets
] encrypted. This allows detection of a truncated ciphertext.
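To make the quoted layout concrete, here is a small sketch that assembles the additional data for a regular chunk and for the final tag. The function names chunk_ad and final_ad are hypothetical; the packet tag value 18 used in the usage note is only inferred from the example octet 0xD2 (0xC0 | 18).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: additional data for every chunk, five octets. */
static size_t
chunk_ad (uint8_t ad[5], uint8_t packet_tag, uint8_t version,
          uint8_t cipher_algo, uint8_t aead_algo, uint8_t chunk_size_octet)
{
  ad[0] = (uint8_t) (0xC0 | (packet_tag & 0x3F)); /* bits 7 and 6 set */
  ad[1] = version;
  ad[2] = cipher_algo;
  ad[3] = aead_algo;
  ad[4] = chunk_size_octet;
  return 5;
}

/* Hypothetical helper: additional data for the final tag, i.e., the
   per-chunk data plus the total plaintext length as eight big-endian
   octets. */
static size_t
final_ad (uint8_t ad[13], const uint8_t chunk[5], uint64_t total_octets)
{
  unsigned i;
  memcpy (ad, chunk, 5);
  for (i = 0; i < 8; i++)
    ad[12 - i] = (uint8_t) (total_octets >> (8 * i));
  return 13;
}
```

Calling chunk_ad (ad, 18, 0x02, 0x07, 0x01, 0x10) reproduces the octets from the quoted example. Since the per-chunk data no longer contains the chunk index, it is identical for all chunks of a message, which is what would make reusing the processed associated data possible.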
The way the nonce is used seems designed to allow cheap incrementing of the nonce. The nonce is used to determine
Offset_0 = Stretch[1+bottom..128+bottom]
where "bottom" is the least significant 6 bits of the nonce, acting as a shift, and "Stretch" is independent of those nonce bits, so unchanged on all but one out of 64 nonce increments.
Should nettle support some kind of auto-incrementing nonce that takes advantage of this? Nettle does something similar for UMAC (not sure if there are others).
That is also interesting. I have raised the point in our issue tracker, and Daniel Huigens observed that at least their Go implementation simply compares the top-most bits with the ones provided for the previous chunk. Botan does the same.
For any kind of optimization of this, one needs to store the previous nonce and related values in the context.
To generalize it to more than auto-increment, one gets the problem that for the first set_nonce, there is no previous nonce to compare to. So one would need an extra flag just for that, which I don't think is so nice. Alternatively, use a zero nonce by default at initialization. Then there's another slight complication: to set the nonce, one needs to know the "tag_length". In the current version, that is passed as an argument to set_nonce. It could perhaps be passed with set_key instead. It's now some time since I read the RFC, but I don't think it is proper to use OCB with the same key while changing tag_length from message to message.
This has the benefit of working for how OpenPGP currently constructs the nonce, which does not result in monotonically incrementing nonces (currently, we take an IV and xor in the chunk index). But, we may change the scheme.
I think it would be nice to stick to a simply incrementing nonce value.
We optimistically changed the scheme so that it is a counter, albeit one not starting from zero:
] The nonce for AEAD mode consists of two parts. Let N be the size of the
] nonce. The left-most N - 64 bits are the initialization vector derived
] using HKDF. The right-most 64 bits are the chunk index as big-endian
] value. The index of the first chunk is zero.
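The nonce layout quoted above could be assembled along these lines. This is a sketch only; build_nonce is a hypothetical name, and the IV is assumed to have been derived elsewhere (the HKDF step is not shown).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonce = IV || chunk index, where the IV fills
   the left-most nonce_len - 8 octets and the chunk index is stored as
   a 64-bit big-endian value in the right-most 8 octets.
   Returns 0 if the nonce is too short to hold the index, 1 otherwise. */
static int
build_nonce (uint8_t *nonce, size_t nonce_len,
             const uint8_t *iv, uint64_t chunk_index)
{
  unsigned i;
  if (nonce_len < 8)
    return 0;
  memcpy (nonce, iv, nonce_len - 8);
  for (i = 0; i < 8; i++)
    nonce[nonce_len - 1 - i] = (uint8_t) (chunk_index >> (8 * i));
  return 1;
}
```

For a 120-bit (15-octet) nonce this leaves 7 octets of IV. Since the chunk index occupies the low-order bits, consecutive chunks differ only in the right-most octets, so an OCB implementation that caches by the upper nonce bits needs a new Ktop only once per 64 chunks.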
Cheers, Justus
nettle-bugs@lists.lysator.liu.se