And now I've added support for Salsa20 benchmarking as well. Performance is pretty good, even for this reference implementation. It seems to run at 12 cycles per byte on my laptop, a little faster than 128-bit AES.
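For anyone who wants to compare numbers from their own machine, here is a minimal sketch of the cycles-per-byte arithmetic, assuming the clock frequency is known and fixed during the run. The helper name cycles_per_byte and the example figures are made up for illustration. They are not part of nettle and not my actual measurements.

```c
#include <assert.h>

/* Hypothetical helper, not part of nettle: convert a measured
   throughput into cycles per byte, given the CPU clock frequency
   (assumed constant for the duration of the benchmark). */
static double
cycles_per_byte(double clock_hz, double bytes, double seconds)
{
  assert(bytes > 0.0);
  /* Total cycles elapsed, divided by the number of bytes processed. */
  return clock_hz * seconds / bytes;
}
```

With illustrative figures, 200e6 bytes processed in one second on a 2.4 GHz clock, cycles_per_byte(2.4e9, 200e6, 1.0) works out to 12 cycles per byte.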
To get that working, I added a very kludgy
const struct nettle_cipher nettle_salsa20 = {...};
to nettle-internal.c, for benchmarking only.
I also changed the Makefile so that nettle-internal.o is no longer included in the nettle library. Will that cause trouble for anybody?
Regards, /Niels