Aloha!
What is the status of Poly1305 in Nettle, i.e. is the branch ready to be merged into master or when might that happen?
I have to admit that I haven't studied how (in what phase) UMAC and Poly1305 use the block cipher. Would they gain any substantial performance if they could use AES-NI when available?
--
Kind regards, Yours
Joachim Strömbergson - Always in harmonic oscillation.
Joachim Strömbergson, Secworks AB, joachim@secworks.se
Joachim Strömbergson joachim@secworks.se writes:
What is the status of Poly1305 in Nettle, i.e. is the branch ready to be merged into master
There are a couple of things I'd like to do. Quoting a mail from 21/11:
* Take out the nonce from struct poly1305_ctx, and let poly1305_aes do all nonce handling. poly1305_digest gets the encrypted nonce as argument.
* For poly1305_aes, use aes128_ctx (it's hard coded for 128-bit AES anyway), and perhaps rename it to poly1305_aes128.
* Introduce a poly1305_update function, and use preprocessor casting tricks to define poly1305_aes128_update (and any other poly1305_*_update) as an alias.
* Promote union gcm_block to a more general abstraction, renaming it to nettle_block16 or so, and use it to guarantee nicer alignment for block buffer and nonce in poly1305.
or when might that happen?
Not sure, I'd need a day or so of hacking time to finish the above.
I have to admit that I haven't studied how (in what phase) UMAC and Poly1305 use the block cipher. Would they gain any substantial performance if they could use AES-NI when available?
Improving AES performance would certainly be good in general, but it doesn't matter much for UMAC and Poly1305. UMAC uses AES to generate its subkeys (which can be a fair amount of data), and both use AES to encrypt the nonce. But the heavy mangling of the message bytes doesn't depend on AES.
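A toy sketch of the structure described above, assuming a small prime (2^31 - 1) and one-byte "blocks" in place of Poly1305's 2^130 - 5 and 16-byte blocks. All names are hypothetical and this is not Nettle's code. It is only meant to show that the per-byte loop never touches the cipher, and that the encrypted nonce (here just a number passed in by the caller) is added once at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_P 2147483647u /* 2^31 - 1, a toy modulus, NOT 2^130 - 5 */

/* Hypothetical toy: Horner evaluation of the message polynomial at
   the secret point r. No block cipher is involved in this loop. */
static uint32_t
toy_poly_hash (uint32_t r, const uint8_t *msg, size_t length)
{
  uint64_t h = 0;
  size_t i;

  assert (r < TOY_P);
  for (i = 0; i < length; i++)
    /* The +1 keeps zero bytes from vanishing, loosely mimicking the
       padding bit Poly1305 appends to each block. */
    h = ((h + msg[i] + 1) * r) % TOY_P;
  return (uint32_t) h;
}

/* The cipher enters only here: enc_nonce stands in for the encrypted
   nonce, computed once per message by the caller. */
static uint32_t
toy_poly_mac (uint32_t r, uint32_t enc_nonce,
              const uint8_t *msg, size_t length)
{
  return toy_poly_hash (r, msg, length) + enc_nonce;
}
```

With this shape, a faster AES only speeds up the single nonce encryption (and, for UMAC, subkey generation), not the loop over the message.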
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
Joachim Strömbergson joachim@secworks.se writes:
What is the status of Poly1305 in Nettle, i.e. is the branch ready to be merged into master
There are a couple of things I'd like to do. Quoting a mail from 21/11:
I've done most of this now, and merged into the master branch.
- Take out the nonce from struct poly1305_ctx, and let poly1305_aes do all nonce handling. poly1305_digest gets the encrypted nonce as argument.
Done.
- For poly1305_aes, use aes128_ctx (it's hard coded for 128-bit AES anyway), and perhaps rename it to poly1305_aes128.
Done.
- Introduce a poly1305_update function, and use preprocessor casting tricks to define poly1305_aes128_update (and any other poly1305_*_update) as an alias.
Done, then undone; I don't think we need this generality now. The code is now organized so that struct poly1305_ctx and its functions hold only the state related to the polynomial arithmetic (whose implementation details are machine-specific for an optimized implementation), while struct poly1305_aes_ctx holds the block buffer and the nonce, and its functions take care of buffering, final padding, and handling of the nonce.
- Promote union gcm_block to a more general abstraction, renaming it to nettle_block16 or so, and use it to guarantee nicer alignment for block buffer and nonce in poly1305.
Done, but in poly1305 it is only used for the encrypted nonce argument to poly1305_digest. The nonce and the block will be aligned anyway, and I see no C code which can take any advantage of word access (except possibly incrementing the nonce, but that's very marginal).
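For readers unfamiliar with the idea, here is a minimal sketch of such a 16-byte block union and of the kind of word access it permits. The type name, members, and helper are illustrative assumptions, not copied from Nettle's headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte block type. The word member forces the whole
   union to be aligned for unsigned long access. */
union block16
{
  uint8_t b[16];
  unsigned long w[16 / sizeof (unsigned long)];
};

/* XOR src into dst one word at a time instead of byte by byte. */
static void
block16_xor (union block16 *dst, const union block16 *src)
{
  size_t i;
  for (i = 0; i < sizeof (dst->w) / sizeof (dst->w[0]); i++)
    dst->w[i] ^= src->w[i];
}
```

A plain uint8_t array carries no such alignment guarantee, which is why code that wants word access declares its buffers with the union type.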
If you or anyone else can test this on macosx and windows, that would be nice. The x86_64 assembly is intended to work there too, but not tested.
Some numbers:
On my low-end home machine (AMD E-350), benchmarking gives 11 cycles/byte for the C implementation and 3 cycles/byte with the x86_64 assembly, slightly faster than umac64.
The current source code is 220 lines of C code and 99 lines of assembly (excluding comments and empty lines).
Regards, /Niels
On Mon, Jan 20, 2014 at 10:12 PM, Niels Möller nisse@lysator.liu.se wrote:
nisse@lysator.liu.se (Niels Möller) writes:
Joachim Strömbergson joachim@secworks.se writes:
What is the status of Poly1305 in Nettle, i.e. is the branch ready to be merged into master
There are a couple of things I'd like to do. Quoting a mail from 21/11:
I've done most of this now, and merged into the master branch.
- Take out the nonce from struct poly1305_ctx, and let poly1305_aes do all nonce handling. poly1305_digest gets the encrypted nonce as argument.
Done.
Is the AEAD construction of poly1305 with chacha [0] planned to be included? It is currently intended to be used in TLS, so it would be really useful to have in nettle.
[0]. http://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04
regards, Nikos
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
Is the AEAD construction of poly1305 with chacha [0] planned to be included? It is currently intended to be used in TLS, so it would be really useful to have in nettle.
Would make sense, once the spec is stable. Comments on AEAD interfaces in general are appreciated. Maybe RFC 5116 is useful guidance.
[0]. http://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04
Thanks for the pointer. Are you (or anyone else on this list) involved in this ietf process? On which ietf list is it discussed?
After a quick reading, the following jumps out at me (in Sec. 5):
The reason for generating the Poly1305 key like this rather than using key material from the handshake is that handshake key material is per-session, but for a polynomial MAC, a unique, secret key is needed per-record.
As far as I understand, you can use the same poly1305 key for a large number of records/messages, as long as you have a unique nonce for each message.
Then it should work fine in TLS to use a per-session key for both chacha and poly1305, and then use the same nonce for both chacha and poly1305, based on the record sequence number.
Am I missing something? I guess Adam Langley usually knows what he's doing. But otherwise, the paragraph in the draft, and the awkward method it describes, make absolutely no sense to me.
Regards, /Niels
On Tue, Jan 21, 2014 at 9:45 AM, Niels Möller nisse@lysator.liu.se wrote:
Would make sense, once the spec is stable. Comments on AEAD interfaces in general are appreciated. Maybe RFC 5116 is useful guidance.
[0]. http://tools.ietf.org/html/draft-agl-tls-chacha20poly1305-04
Thanks for the pointer. Are you (or anyone else on this list) involved in this ietf process? On which ietf list is it discussed?
The IETF WG chairs plan to forward that. Whether the final version will be the same is unknown, but I find it highly unlikely to change.
After a quick reading, the following jumps out at me (in Sec. 5):
The reason for generating the Poly1305 key like this rather than using key material from the handshake is that handshake key material is per-session, but for a polynomial MAC, a unique, secret key is needed per-record.
As far as I understand, you can use the same poly1305 key for a large number of records/messages, as long as you have a unique nonce for each message.
Indeed, the reason (I presume) for this construction is to avoid a "flaw" in polynomial MACs. The "flaw" is that if you use a constant key per session, once an attacker manages to make a few forgeries, he can recover the key. By re-keying poly1305 for each record, this construction avoids that issue.
Am I missing something? I guess Adam Langley usually knows what he's doing. But otherwise, the paragraph in the draft, and the awkward method it describes, make absolutely no sense to me.
That construction (or at least a very similar one) is described by Bernstein in "Cryptography in NaCl".
regards, Nikos
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
Indeed, the reason (I presume) for this construction is to avoid a "flaw" in polynomial MACs. The "flaw" is that if you use a constant key per session, once an attacker manages to make a few forgeries, he can recover the key.
Assuming there's no nonce, right? But on second reading, I think the draft uses no poly1305 nonce, or at least, doesn't use a nonce in the same way as with poly1305-aes.
But then, the question is how the 32-byte key is used. For poly1305-aes, you have 16 bytes specifying the point where the polynomial is evaluated, and a 16-byte AES key used to encrypt the nonce. The question is how the other 16 bytes are used here. I guess they're also mixed into the digest output in some way.
That construction (or at least a very similar one) is described by Bernstein in "Cryptography in NaCl".
OK, I'll have to look that up; that will probably make everything clear.
Regards, /Niels
On Tue, Jan 21, 2014 at 10:40 AM, Niels Möller nisse@lysator.liu.se wrote:
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
Indeed, the reason (I presume) for this construction is to avoid a "flaw" in polynomial MACs. The "flaw" is that if you use a constant key per session, once an attacker manages to make a few forgeries, he can recover the key.
Assuming there's no nonce, right?
Indeed.
But on second reading, I think the draft uses no poly1305 nonce, or at least, doesn't use a nonce in the same way as with poly1305-aes.
They have nothing in common. The nonce and the key used by poly1305 in poly1305-chacha are the first blocks generated by chacha.
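As a rough sketch of what this amounts to, assuming the first 32 keystream bytes for a record form the one-time Poly1305 key: the first half is the evaluation point r (clamped as in Bernstein's Poly1305 definition) and the second half is the value added to the polynomial result, taking the role that the encrypted nonce plays in poly1305-aes. The struct and function names are hypothetical, and the keystream is assumed to be produced by the caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for a one-time Poly1305 key. */
struct onetime_key
{
  uint8_t r[16]; /* evaluation point, clamped */
  uint8_t s[16]; /* added to the polynomial result */
};

/* Split 32 keystream bytes into (r, s). The caller is assumed to have
   generated `keystream` with the stream cipher for this record. */
static void
onetime_key_split (struct onetime_key *key, const uint8_t *keystream)
{
  memcpy (key->r, keystream, 16);
  memcpy (key->s, keystream + 16, 16);

  /* Poly1305 clamping: clear the top four bits of r[3], r[7], r[11],
     r[15] and the bottom two bits of r[4], r[8], r[12]. */
  key->r[3] &= 15;
  key->r[7] &= 15;
  key->r[11] &= 15;
  key->r[15] &= 15;
  key->r[4] &= 252;
  key->r[8] &= 252;
  key->r[12] &= 252;
}
```

Since the keystream differs for every record nonce, each record gets a fresh (r, s) pair, and no separate Poly1305 nonce is needed.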
regards, Nikos
nisse@lysator.liu.se (Niels Möller) writes:
- Introduce a poly1305_update function, and use preprocessor casting tricks to define poly1305_aes128_update (and any other poly1305_*_update) as an alias.
Done, then undone;
I tried the same trick for a different function, and it turns out it actually doesn't work with gcc. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59905; it will be interesting to see if anyone else agrees it is a bug.
And the only fully kosher and portable way is to introduce a large number of wrapper functions, like
  foo (struct foo_ctx *ctx, ...);
  foo_wrapper (void *p, ...) { return foo (p); }
I'd prefer to not do that.
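For concreteness, a minimal compilable sketch of the wrapper approach, with hypothetical names (foo_ctx, foo_update, update_func) that are not taken from Nettle. Calling a function through a pointer to an incompatible function type is undefined behaviour in C, which is why the cast-based alias is not strictly portable. The wrapper avoids that by having exactly the generic signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context and typed update function. */
struct foo_ctx
{
  uint32_t sum;
};

static void
foo_update (struct foo_ctx *ctx, size_t length, const uint8_t *data)
{
  size_t i;
  for (i = 0; i < length; i++)
    ctx->sum += data[i];
}

/* Generic signature, as a table of algorithms might expect. */
typedef void update_func (void *ctx, size_t length, const uint8_t *data);

/* Wrapper with exactly the generic signature. Converting the void *
   back to struct foo_ctx * inside the wrapper is well-defined. */
static void
foo_update_wrapper (void *p, size_t length, const uint8_t *data)
{
  foo_update (p, length, data);
}

/* No function pointer cast is needed here. */
static update_func *const foo_generic_update = foo_update_wrapper;
```

The cost is one such wrapper per function and algorithm, which is the proliferation objected to above.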
Regards, /Niels
nettle-bugs@lists.lysator.liu.se