Joachim Strömbergson <joachim@secworks.se> writes:
I've taken a shot at implementing the ChaCha stream cipher for Nettle. ChaCha is a modified version of Salsa20 done by DJB in order to improve both performance (esp. on CPUs with support for data parallelism) and, somewhat, the diffusion per round.
Cool! I looked briefly at ChaCha last time I worked on salsa20. I understand it has potential to be a bit faster than salsa20, so it will be interesting to see how that turns out.
This implementation also supports different numbers of rounds.
Which variants are recommended, or in real use? 20 and 12, just like for salsa20?
ChaCha _should_ be a bit faster than Salsa20 and should esp be easier to optimize in asm for modern CPUs. I have however not done any benchmarks nor asm implementation (yet).
Adding chacha benchmarking in examples/nettle-benchmark should be easy.
And if you like playing with either x86_64 sse2 (ugly) or arm neon (nicer), I think it's not too difficult an exercise to implement chacha based on the salsa20 assembly files (in the x86_64 and arm/neon directories).
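For reference when porting the salsa20 assembly, here is a minimal C sketch of the ChaCha quarter round. Each step is a 32-bit add, xor, or rotate by a fixed amount, which is what lets four quarter rounds run in parallel in SIMD registers. The names `rotl32` and `chacha_quarter_round` are made up for illustration and are not taken from the posted patch or from Nettle.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t
rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round on four words of the 16-word state.
   Hypothetical helper name; only adds, xors and fixed rotates. */
void
chacha_quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32(*d, 16);
  *c += *d; *b ^= *c; *b = rotl32(*b, 12);
  *a += *b; *d ^= *a; *d = rotl32(*d, 8);
  *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

In a vectorized version, a, b, c and d would each be a register holding one row of the 4x4 state, so the same four lines process all four columns at once.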
Since I'm new as a contributor I don't know how you, Niels, want to receive patches. Please let me know if this looks good and is something you want to integrate, and if so, how.
I'm used to patches on the mailing list (I still feel a bit like a git newbie). I could also pull changes from a repository of yours, but I'd prefer a mailed patch unless I'm confident I want to integrate the work directly with no changes. An ideal patch set for chacha would include
* The implementation, more or less what you have now,
* A const struct nettle_cipher defined in chacha-meta.c, for each important variant (number of rounds and key size)
* A testcase following the conventions of the testsuite/*-test.c files.
* Integration in examples/nettle-benchmark.c (should be trivial). (Both benchmark and testcode would use the chacha-meta glue).
* Documentation for nettle.texinfo (but maybe that should wait until the interface has settled).
* GNU-style ChangeLog entries for each change.
Preferably arranged so that independent changes (C implementation, docs, assembly implementation) can be applied one at a time. This is a wish list, to make integration quick and easy, but you don't have to get everything in order for the contribution to be useful.
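To illustrate the "one const struct per variant" idea from the list above, here is a rough sketch of a variant table that both benchmark and test code could walk. It deliberately uses a simplified, hypothetical `struct sketch_cipher_meta` rather than the real `struct nettle_cipher` from nettle-meta.h, whose fields differ. The variant names, and the choice of 20 and 12 rounds with 128- and 256-bit keys, are assumptions made only for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified descriptor. This is NOT the layout of
   the real struct nettle_cipher; it only shows the pattern. */
struct sketch_cipher_meta
{
  const char *name;
  unsigned key_size;   /* in bytes */
  unsigned rounds;
};

/* One const struct per variant (the set of variants is assumed). */
const struct sketch_cipher_meta sketch_chacha20_256 = { "chacha20-256", 32, 20 };
const struct sketch_cipher_meta sketch_chacha12_256 = { "chacha12-256", 32, 12 };
const struct sketch_cipher_meta sketch_chacha20_128 = { "chacha20-128", 16, 20 };

/* NULL-terminated list that generic code can iterate over. */
const struct sketch_cipher_meta *const sketch_chacha_variants[] = {
  &sketch_chacha20_256, &sketch_chacha12_256, &sketch_chacha20_128, NULL
};

/* Look up a variant by name; returns NULL if it is not in the list. */
const struct sketch_cipher_meta *
sketch_chacha_lookup(const char *name)
{
  size_t i;
  for (i = 0; sketch_chacha_variants[i] != NULL; i++)
    if (strcmp(sketch_chacha_variants[i]->name, name) == 0)
      return sketch_chacha_variants[i];
  return NULL;
}
```

With a table like this, adding a new variant to the benchmark or the tests means adding one struct and one list entry, without touching the generic code.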
I've only had a quick look at the actual code now, but my first impression is that it looks pretty good. I think I'd prefer to not have the number of rounds in the context, though, and instead have separate functions for different variants, possibly calling a common function taking the number of rounds as argument.
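A rough sketch of that layout, to make the suggestion concrete: the context holds only the cipher state, and thin per-variant functions call a common core that takes the number of rounds as an argument. All names here (`sketch_chacha_ctx`, `sketch_chacha_core`, and so on) are hypothetical and not from the posted patch. The sketch assumes a 256-bit key with the counter and nonce left at zero, and it produces one block of raw state words instead of doing a full crypt operation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context: cipher state only, no round count. */
struct sketch_chacha_ctx
{
  uint32_t state[16];
};

static uint32_t
rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* Quarter round on the state words at indices a, b, c, d. */
static void
qround(uint32_t *x, int a, int b, int c, int d)
{
  x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
  x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
  x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
  x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}

/* Common core shared by all variants: permute a copy of src for
   the given (even) number of rounds, then add src back in. */
static void
sketch_chacha_core(uint32_t *dst, const uint32_t *src, unsigned rounds)
{
  uint32_t x[16];
  unsigned i;

  assert(rounds % 2 == 0);
  memcpy(x, src, sizeof(x));
  for (i = 0; i < rounds; i += 2)
    {
      /* Column round. */
      qround(x, 0, 4, 8, 12);
      qround(x, 1, 5, 9, 13);
      qround(x, 2, 6, 10, 14);
      qround(x, 3, 7, 11, 15);
      /* Diagonal round. */
      qround(x, 0, 5, 10, 15);
      qround(x, 1, 6, 11, 12);
      qround(x, 2, 7, 8, 13);
      qround(x, 3, 4, 9, 14);
    }
  for (i = 0; i < 16; i++)
    dst[i] = x[i] + src[i];
}

/* Load the constants and a 256-bit key; counter and nonce are zero. */
void
sketch_chacha_set_key(struct sketch_chacha_ctx *ctx, const uint8_t *key)
{
  unsigned i;

  /* "expand 32-byte k" read as four little-endian words. */
  ctx->state[0] = 0x61707865;
  ctx->state[1] = 0x3320646e;
  ctx->state[2] = 0x79622d32;
  ctx->state[3] = 0x6b206574;
  for (i = 0; i < 8; i++)
    ctx->state[4 + i] = (uint32_t) key[4*i]
      | ((uint32_t) key[4*i + 1] << 8)
      | ((uint32_t) key[4*i + 2] << 16)
      | ((uint32_t) key[4*i + 3] << 24);
  for (i = 12; i < 16; i++)
    ctx->state[i] = 0;
}

/* Thin per-variant entry points; the round count never enters the context. */
void
sketch_chacha20_block(const struct sketch_chacha_ctx *ctx, uint32_t *dst)
{
  sketch_chacha_core(dst, ctx->state, 20);
}

void
sketch_chacha12_block(const struct sketch_chacha_ctx *ctx, uint32_t *dst)
{
  sketch_chacha_core(dst, ctx->state, 12);
}
```

A real crypt function would additionally serialize the block as little-endian bytes, xor it with the input, and increment the block counter. The point of the split is that the same context type works for every variant, and an assembly core only needs to implement the one common function.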
Regards, /Niels