On Fri, Sep 25, 2020 at 10:25 AM Niels Möller nisse@lysator.liu.se wrote:
Jeffrey Walton noloader@gmail.com writes:
I believe the 64-bit adds (vaddudm) and subtracts (vsubudm) require POWER8.
I don't think there are any 64-bit adds in my chacha code, only 32-bit, vadduwm. The chacha state is fundamentally 16 32-bit words, with operations very friendly to 4-way simd.
Using 64-bit adds might be useful for later code doing multiple blocks, for updating the counter (for the original 64-bit counter variant of chacha). Might make sense to do manual carry handling to keep it working on power7.
I hope I'm not crossing my wires, but doesn't the ChaCha core require a counter addition? That's where a 32-bit wrap can occur, and you need a 64-bit add, or a 32-bit add with a carry, to handle it correctly. That happens at x[12] and x[13] in Bernstein's source code.[1]
Track the use of the PLUSONE macro in Bernstein's code. The '!x->input[12]' is the test for wrap on a 32-bit unsigned integer.
    x->input[12] = PLUSONE(x->input[12]);
    if (!x->input[12]) {
      x->input[13] = PLUSONE(x->input[13]);
      /* stopping at 2^70 bytes per nonce is user's responsibility */
    }
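To make the carry explicit, here is a minimal sketch of the same step written as standalone C. The helper names counter_increment and counter_increment64 are made up for illustration and are not from Bernstein's source; lo and hi stand in for input[12] and input[13].

```c
#include <stdint.h>

/* Hypothetical helper (not from Bernstein's code): bump the 64-bit block
   counter held in two 32-bit words, lo = input[12] and hi = input[13]. */
static void counter_increment(uint32_t *lo, uint32_t *hi)
{
    *lo += 1;          /* unsigned add wraps modulo 2^32 */
    if (*lo == 0)      /* low word wrapped, so carry into the high word */
        *hi += 1;
}

/* The same step as a single 64-bit add, for comparison only. */
static uint64_t counter_increment64(uint64_t ctr)
{
    return ctr + 1;
}
```

Both forms should agree for every starting value; the two-word form only needs 32-bit adds, which is the point of doing the carry by hand.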
It should be easy enough to test. Start with a counter of 0xfffffff8 and encrypt enough [64-byte] blocks to cross the wrap. The low word reaches 0xffffffff on the eighth block, so you need at least nine. You can use Bernstein's reference implementation to generate test vectors.[1]
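For code that does several blocks per call, it may help to check the per-block counter words separately from the keystream. The following is only a sketch: lane_counters is a hypothetical helper, not part of Nettle or Bernstein's code, and it assumes the 64-bit counter variant with the counter split into two 32-bit words. It uses 32-bit arithmetic only, which is roughly what a manual carry would look like without 64-bit vector adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the counter words for n consecutive blocks
   starting at the 64-bit counter (base_hi:base_lo), with 32-bit adds only.
   lo[i] and hi[i] are the values block i would use for x[12] and x[13]. */
static void lane_counters(uint32_t base_lo, uint32_t base_hi, size_t n,
                          uint32_t *lo, uint32_t *hi)
{
    for (size_t i = 0; i < n; i++) {
        lo[i] = base_lo + (uint32_t)i;        /* wraps modulo 2^32 */
        hi[i] = base_hi + (lo[i] < base_lo);  /* carry iff the add wrapped */
    }
}
```

Starting from 0xfffffff8 with the high word at 0, blocks 0 through 7 keep the high word at 0, and block 8 is the first to use a low word of 0 and a high word of 1. Comparing these words against a plain 64-bit counter is a cheap check before comparing keystream against reference test vectors.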
Here's a hacked version of Bernstein's code that allows you to set the counter to something other than 0's: https://github.com/noloader/cryptopp-test/blob/master/ChaCha20/chacha.c. See the XXX_ctr_setup function.
There are some fundamental differences between Bernstein's ChaCha and the IETF's ChaCha used in TLS. Bernstein's ChaCha uses a 64-bit counter. The IETF's version uses a 32-bit counter, and the IETF fails to specify what happens when its 32-bit counter wraps. Be sure to specify in the docs which version Nettle is providing, because the difference leads to confusion for users.
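The difference shows up in the last four words of the 16-word state. As a rough sketch, with function names invented for illustration and the nonce assumed to be already loaded as little-endian 32-bit words: Bernstein's layout keeps a 64-bit counter in words 12 and 13 and a 64-bit nonce in words 14 and 15, while the IETF layout (RFC 8439) keeps a 32-bit counter in word 12 and a 96-bit nonce in words 13 through 15.

```c
#include <stdint.h>

/* Hypothetical helpers; x is the 16-word ChaCha state. */

/* Bernstein's layout: 64-bit counter in x[12..13], 64-bit nonce in x[14..15]. */
static void set_counter_nonce_djb(uint32_t x[16], uint64_t counter,
                                  const uint32_t nonce[2])
{
    x[12] = (uint32_t)counter;          /* low 32 bits of the counter */
    x[13] = (uint32_t)(counter >> 32);  /* high 32 bits of the counter */
    x[14] = nonce[0];
    x[15] = nonce[1];
}

/* IETF layout: 32-bit counter in x[12], 96-bit nonce in x[13..15]. */
static void set_counter_nonce_ietf(uint32_t x[16], uint32_t counter,
                                   const uint32_t nonce[3])
{
    x[12] = counter;
    x[13] = nonce[0];                   /* a carry into x[13] would alter the nonce */
    x[14] = nonce[1];
    x[15] = nonce[2];
}
```

With the IETF layout, letting a carry propagate out of word 12 would modify the nonce, which is why the wrap behavior needs to be a documented choice rather than something that falls out of the arithmetic.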
[1] https://cr.yp.to/chacha.html and https://cr.yp.to/streamciphers/timings/estreambench/submissions/salsa20/chac....
Jeff