Simon Josefsson <simon@josefsson.org> writes:
> Yes, although if necessary I could xor it to a zero buffer if there were no other way...
I was going to suggest using something like
  void salsa20_core (const uint32_t *input, unsigned rounds, unsigned length,
                     uint8_t *dst, const uint8_t *src)
which would compute the hash of the INPUT. If SRC is NULL, it would just store the first LENGTH bytes of the hash at DST. And if SRC is non-NULL, it would xor the hash with that data before storing.
A bit clumsy, but reasonably general.
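To make the proposed semantics concrete, here is a minimal portable C sketch of such a function. It is only an illustration, not existing code: the assertions (LENGTH at most 64, an even number of ROUNDS) and the helper macros are my assumptions, and a real version would be written in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SALSA20_BLOCK_SIZE 64

#define ROTL32(n, x) (((x) << (n)) | ((x) >> (32 - (n))))

/* Salsa20 quarter round on four state words. */
#define QROUND(a, b, c, d) do {   \
    b ^= ROTL32 (7,  a + d);      \
    c ^= ROTL32 (9,  b + a);      \
    d ^= ROTL32 (13, c + b);      \
    a ^= ROTL32 (18, d + c);      \
  } while (0)

/* Hash the 16-word INPUT and store the first LENGTH bytes of the result
   at DST. If SRC is non-NULL, the hash is xored with SRC before storing.
   Assumptions of this sketch: LENGTH <= 64 and ROUNDS is even. */
void
salsa20_core (const uint32_t *input, unsigned rounds, unsigned length,
              uint8_t *dst, const uint8_t *src)
{
  uint32_t x[16];
  unsigned i;

  assert (length <= SALSA20_BLOCK_SIZE);
  assert (rounds % 2 == 0);

  for (i = 0; i < 16; i++)
    x[i] = input[i];

  for (i = 0; i < rounds; i += 2)
    {
      /* Column round. */
      QROUND (x[0], x[4], x[8], x[12]);
      QROUND (x[5], x[9], x[13], x[1]);
      QROUND (x[10], x[14], x[2], x[6]);
      QROUND (x[15], x[3], x[7], x[11]);
      /* Row round. */
      QROUND (x[0], x[1], x[2], x[3]);
      QROUND (x[5], x[6], x[7], x[4]);
      QROUND (x[10], x[11], x[8], x[9]);
      QROUND (x[15], x[12], x[13], x[14]);
    }

  for (i = 0; i < length; i++)
    {
      /* Byte i of the little-endian serialization of x[] + input[]. */
      uint32_t w = x[i / 4] + input[i / 4];
      uint8_t b = (uint8_t) (w >> (8 * (i % 4)));
      dst[i] = src ? (uint8_t) (b ^ src[i]) : b;
    }
}
```

With SRC == NULL and LENGTH == 64 this gives the plain hash; with SRC pointing at the message it does one block of encryption, minus the counter handling.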
But then I remembered that for high-performance encryption, an assembly function which does only one block is not sufficient. You can gain a lot of performance by hashing two input blocks in parallel (the current code doesn't do that).
So at the moment, I'm inclined to let the assembly function do encryption, including xoring the data and incrementing the counter. And if you only need the hash, you'd need a wrapper which encrypts a zero buffer and undoes the incrementing (the assembly function could instead leave the input block unmodified and maintain the updated counter locally; that would be a bit of extra hassle, but I don't think it would impact performance). Then we have sufficient freedom for optimizing salsa20 encryption, at the cost of a small performance disadvantage when using just the hash.
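As a rough sketch of what such a wrapper could look like: below, salsa20_crypt_blocks is a hypothetical portable stand-in for the assembly routine (it xors the data and increments the 64-bit block counter in state words 8 and 9), and salsa20_hash_block is the hash-only wrapper. Both names and signatures are made up for illustration, and the stand-in handles whole blocks only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SALSA20_BLOCK_SIZE 64

#define ROTL32(n, x) (((x) << (n)) | ((x) >> (32 - (n))))

#define QROUND(a, b, c, d) do {   \
    b ^= ROTL32 (7,  a + d);      \
    c ^= ROTL32 (9,  b + a);      \
    d ^= ROTL32 (13, c + b);      \
    a ^= ROTL32 (18, d + c);      \
  } while (0)

/* Hypothetical stand-in for the assembly routine. Encrypts LENGTH bytes
   (whole blocks only, in this sketch) from SRC to DST and advances the
   block counter in state[8], state[9] once per block. ROUNDS is
   assumed to be even. */
static void
salsa20_crypt_blocks (uint32_t *state, unsigned rounds, size_t length,
                      uint8_t *dst, const uint8_t *src)
{
  for (; length >= SALSA20_BLOCK_SIZE;
       length -= SALSA20_BLOCK_SIZE,
       dst += SALSA20_BLOCK_SIZE, src += SALSA20_BLOCK_SIZE)
    {
      uint32_t x[16];
      unsigned i;

      memcpy (x, state, sizeof (x));
      for (i = 0; i < rounds; i += 2)
        {
          QROUND (x[0], x[4], x[8], x[12]);
          QROUND (x[5], x[9], x[13], x[1]);
          QROUND (x[10], x[14], x[2], x[6]);
          QROUND (x[15], x[3], x[7], x[11]);
          QROUND (x[0], x[1], x[2], x[3]);
          QROUND (x[5], x[6], x[7], x[4]);
          QROUND (x[10], x[11], x[8], x[9]);
          QROUND (x[15], x[12], x[13], x[14]);
        }
      for (i = 0; i < SALSA20_BLOCK_SIZE; i++)
        {
          /* Little-endian byte i of x[] + state[], xored with the data. */
          uint32_t w = x[i / 4] + state[i / 4];
          dst[i] = (uint8_t) (src[i] ^ (uint8_t) (w >> (8 * (i % 4))));
        }
      /* Increment the 64-bit block counter. */
      if (++state[8] == 0)
        ++state[9];
    }
}

/* Hash-only wrapper: encrypt a zero buffer, then undo the increment. */
void
salsa20_hash_block (uint32_t *block, unsigned rounds, uint8_t *dst)
{
  uint32_t c0 = block[8];
  uint32_t c1 = block[9];

  memset (dst, 0, SALSA20_BLOCK_SIZE);
  salsa20_crypt_blocks (block, rounds, SALSA20_BLOCK_SIZE, dst, dst);

  block[8] = c0;
  block[9] = c1;
}
```

The extra cost for hash-only users is the memset, the xor with zeros, and saving and restoring two words, which should be small compared to the rounds themselves.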
> however I'll lose performance, and my application (scrypt) would benefit from good performance.
Will you be hashing several independent blocks? If so, for highest performance you would also need an interface which lets the assembly routines do several blocks in parallel.
Regards, /Niels