nisse@lysator.liu.se (Niels Möller) writes:
> If someone wants to work on it, please post to the list. I might look into it myself, but as you have noticed, I have rather limited hacking time.
I've given it a try, see branch ocb-mode. Based on RFC 7253. Passes tests, but not particularly optimized. Some comments and questions:
1. Most of the operations use only the encrypt function of the underlying block cipher. The exception is OCB decrypt, which needs *both* the decrypt function and the encrypt function. For ciphers that use different key setup for encrypt and decrypt, e.g., AES, that means that to decrypt OCB one needs to initialize two separate aes128_ctx, and to call the somewhat unwieldy
  void
  ocb_decrypt (struct ocb_ctx *ctx, const struct ocb_key *key,
               const void *encrypt_ctx, nettle_cipher_func *encrypt,
               const void *decrypt_ctx, nettle_cipher_func *decrypt,
               size_t length, uint8_t *dst, const uint8_t *src);
2. It's not obvious how best to manage the different L_i values. They can be computed up front, on demand, or cached in some way. Current code computes only L_*, L_$ and L_0 up front (as part of ocb_set_key), and the others are recomputed each time they're needed.
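To make the cost of the "recompute each time" option concrete, here is a minimal sketch of the doubling operation from RFC 7253 and of deriving L_i from L_0 on demand. The function names ocb_double_sketch and ocb_get_l_sketch are hypothetical, chosen for illustration; they are not taken from the ocb-mode branch.

```c
#include <stdint.h>
#include <string.h>

#define OCB_BLOCK_SIZE 16

/* double(S) from RFC 7253: shift the 128-bit big-endian value left by
   one bit; if a one bit was shifted out, XOR 0x87 into the last byte.
   Works in place (dst == src is fine). */
static void
ocb_double_sketch (uint8_t dst[OCB_BLOCK_SIZE],
                   const uint8_t src[OCB_BLOCK_SIZE])
{
  uint8_t carry = src[0] >> 7;
  uint8_t last = src[OCB_BLOCK_SIZE - 1];
  unsigned i;

  for (i = 0; i + 1 < OCB_BLOCK_SIZE; i++)
    dst[i] = (uint8_t) ((src[i] << 1) | (src[i + 1] >> 7));

  dst[OCB_BLOCK_SIZE - 1] = (uint8_t) ((last << 1) ^ (carry ? 0x87 : 0));
}

/* L_i = double(L_{i-1}), so L_i is L_0 doubled i times.
   Recomputing on demand costs i doublings per lookup. */
static void
ocb_get_l_sketch (uint8_t l_i[OCB_BLOCK_SIZE],
                  const uint8_t l_0[OCB_BLOCK_SIZE], unsigned i)
{
  memcpy (l_i, l_0, OCB_BLOCK_SIZE);
  while (i-- > 0)
    ocb_double_sketch (l_i, l_i);
}
```

Since half of all block indices need L_0 and a quarter need L_1, the average number of doublings per block stays small even without any cache; a small table of the first few L_i would trade a little context size for avoiding them entirely.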
3. The processing of the authenticated data doesn't depend on the nonce in any way. That means that if one processes several messages with the same key and associated data, the associated data can be processed once, with the same sum reused for all messages.
Is that something that is useful in practice, and which nettle interfaces should support?
4. The way the nonce is used seems designed to allow cheap incrementing of the nonce. The nonce is used to determine
Offset_0 = Stretch[1+bottom..128+bottom]
where "bottom" is the least significant 6 bits of the nonce, acting as a shift, and "Stretch" is independent of those nonce bits, so unchanged on all but one out of 64 nonce increments.
Should nettle support some kind of auto-incrementing nonce that takes advantage of this? Nettle does something similar for UMAC (not sure if there are others).
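As an illustration of the shift structure, the following sketch extracts Offset_0 from Ktop and "bottom", following the definitions in RFC 7253, where Stretch = Ktop || (Ktop[1..64] XOR Ktop[9..72]). The name ocb_offset0_sketch is hypothetical, and the code assumes Ktop (the block cipher output for the nonce with its low 6 bits cleared) has already been computed by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offset_0 = Stretch[1+bottom..128+bottom], i.e. the first 128 bits of
   the 192-bit Stretch after shifting it left by "bottom" bits. */
static void
ocb_offset0_sketch (uint8_t offset[16], const uint8_t ktop[16],
                    unsigned bottom)
{
  uint8_t stretch[24];
  unsigned bytes = bottom / 8;
  unsigned bits = bottom % 8;
  unsigned i;

  assert (bottom < 64);

  /* Stretch = Ktop || (Ktop[1..64] XOR Ktop[9..72]) */
  memcpy (stretch, ktop, 16);
  for (i = 0; i < 8; i++)
    stretch[16 + i] = ktop[i] ^ ktop[i + 1];

  /* Highest index read is 15 + 7 + 1 = 23, inside stretch[]. */
  for (i = 0; i < 16; i++)
    {
      unsigned hi = stretch[i + bytes];
      unsigned lo = stretch[i + bytes + 1];
      offset[i] = (uint8_t) ((hi << bits) | (lo >> (8 - bits)));
    }
}
```

An auto-incrementing nonce could keep Stretch cached and call only the shift step for 63 out of 64 consecutive nonces, invoking the block cipher again only when the increment carries out of the low 6 bits.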
As I said, current code is not particularly optimized, but OCB has potential to be quite fast. The per-block processing for authentication of the message (not associated data) is just an XOR. And encryption/decryption can be done several blocks in parallel, like CTR mode. If we do, e.g., 4 or 8 blocks at a time, there will be a fairly regular structure of the needed Offset_i values, possibly making them cheaper to set up, but I haven't yet looked into those details.
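The regular structure comes from the offset recurrence in RFC 7253, Offset_i = Offset_{i-1} XOR L_{ntz(i)}, where ntz(i) is the number of trailing zero bits in the block index i. A minimal sketch follows; the names ntz_sketch and ocb_next_offset_sketch, and the flat table layout holding L_0, L_1, ... as consecutive 16-byte entries, are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Number of trailing zero bits in i. Requires i > 0. */
static unsigned
ntz_sketch (uint64_t i)
{
  unsigned n = 0;
  assert (i > 0);
  while (!(i & 1))
    {
      i >>= 1;
      n++;
    }
  return n;
}

/* Update offset in place from Offset_{i-1} to Offset_i.
   l_table is a hypothetical flat array where L_k starts at
   l_table + 16*k; it must hold entries up to k = ntz(i). */
static void
ocb_next_offset_sketch (uint8_t offset[16], const uint8_t *l_table,
                        uint64_t i)
{
  const uint8_t *l = l_table + 16 * ntz_sketch (i);
  unsigned j;

  for (j = 0; j < 16; j++)
    offset[j] ^= l[j];
}
```

For block indices 1 through 8, ntz gives 0, 1, 0, 2, 0, 1, 0, 3. Within an aligned group of four blocks the first three steps always XOR in L_0, L_1, L_0, and only the last step depends on the group's position in the message, which is the kind of regularity a 4-way or 8-way loop could exploit.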
Regards,
/Niels