Michael Weiser <michael@weiser.dinsnail.net> writes:
> Porting over the basic IF_[LB]E mechanism from chacha-core-internal was easy and fixed up the first of the three interleaved blocks right away. For the other two I am still in the process of wrapping my head around how the interleaving works and how it would need some adjustment for BE.
The 3-way functions don't do anything fancy: each of the three blocks is held in its own set of registers, and the instruction sequence is the same as for the 1-way version, duplicated three times and interleaved.
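To illustrate the idea in C rather than assembly, here is a rough sketch using a ChaCha-style quarter round. The names qround_1way, qround_3way and the STEP macro are made up for this example and are not taken from the actual assembly files; the arrays stand in for the separate register sets.

```c
#include <stdint.h>

#define ROTL32(n, x) (((x) << (n)) | ((x) >> (32 - (n))))

/* One add/xor/rotate step of a ChaCha-style quarter round. */
#define STEP(a, b, d, n) do { \
    (a) += (b); (d) ^= (a); (d) = ROTL32((n), (d)); \
  } while (0)

/* Hypothetical 1-way version: one block, four words. */
static void
qround_1way(uint32_t *x)
{
  STEP(x[0], x[1], x[3], 16);
  STEP(x[2], x[3], x[1], 12);
  STEP(x[0], x[1], x[3], 8);
  STEP(x[2], x[3], x[1], 7);
}

/* Hypothetical 3-way version: the same sequence, duplicated for three
   independent blocks x, y, z and interleaved step by step. The blocks
   never exchange data, so the result equals three 1-way calls. */
static void
qround_3way(uint32_t *x, uint32_t *y, uint32_t *z)
{
  STEP(x[0], x[1], x[3], 16);
  STEP(y[0], y[1], y[3], 16);
  STEP(z[0], z[1], z[3], 16);

  STEP(x[2], x[3], x[1], 12);
  STEP(y[2], y[3], y[1], 12);
  STEP(z[2], z[3], z[1], 12);

  STEP(x[0], x[1], x[3], 8);
  STEP(y[0], y[1], y[3], 8);
  STEP(z[0], z[1], z[3], 8);

  STEP(x[2], x[3], x[1], 7);
  STEP(y[2], y[3], y[1], 7);
  STEP(z[2], z[3], z[1], 7);
}
```

Since the three blocks are fully independent here, whatever endianness handling works for the first block should apply unchanged to the other two.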
The 2-way version (for ARM, that's salsa only) tries to be a bit more clever, with registers representing either odd or even words from both blocks.
Not sure how endianness affects the code to move words around.
Byte swapping should go close to the final stores, but after the addition of the initial state.
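In C terms, the ordering I have in mind looks roughly like the sketch below. The helper names store_le32 and final_add_and_store are invented for illustration; the assembly would use a byte-reversal instruction on BE rather than byte-wise stores, but the point is the same: add in native word order first, convert to little-endian output last.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write a 32-bit word as little-endian bytes,
   independent of host endianness. */
static void
store_le32(uint8_t *dst, uint32_t w)
{
  dst[0] = (uint8_t) (w & 0xff);
  dst[1] = (uint8_t) ((w >> 8) & 0xff);
  dst[2] = (uint8_t) ((w >> 16) & 0xff);
  dst[3] = (uint8_t) ((w >> 24) & 0xff);
}

/* Hypothetical final step. x is the working state after the rounds and
   state is the initial state, both as native-endian words. */
static void
final_add_and_store(uint8_t *dst, const uint32_t *x,
                    const uint32_t *state, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      /* Add the initial state first, on native-endian words... */
      uint32_t sum = x[i] + state[i];
      /* ...and only then produce the little-endian output bytes. */
      store_le32(dst + 4 * i, sum);
    }
}
```

Swapping before the addition would be wrong, since the carries of a 32-bit add do not commute with byte reversal.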
Regards, /Niels