-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aloha!
Niels Möller wrote:
Right, and this time in openssl's favour. I think that speed is quite impressive. I haven't written any arcfour assembly for x86_64, but I have tried earlier for x86 and sparc. It's a very serial loop doing one byte at a time. It's tempting to try to do two bytes at a time, but the easy way gives incorrect results when the i and j indices happen to collide.
Yes, that is a problem. I've implemented RC4 in HW running at 2 cycles/byte, but you end up dealing with collisions, as well as deep logic in combination with mem/reg lookups for the state, overlapping scheduling, and many memory ports.
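To make the collision concrete, here is a minimal C sketch. It is not taken from Nettle or OpenSSL; the struct and the rc4_sketch_* names are made up for illustration. It shows the serial one-byte step next to a two-byte step that hoists only the load of S[i+2] ahead of the first swap. That hoisted value is stale exactly when the first swap's j lands on i+2, so the sketch checks for that case and falls back to the serial path. A real two-at-a-time implementation would presumably hoist more loads and therefore have more collision cases to handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout for illustration only;
   not the struct used by Nettle or OpenSSL. */
struct rc4_sketch {
    uint8_t S[256];
    uint8_t i, j;
};

/* Standard RC4 key schedule. keylen must be at least 1. */
void rc4_sketch_init(struct rc4_sketch *ctx, const uint8_t *key, size_t keylen)
{
    unsigned k;
    uint8_t j = 0;

    for (k = 0; k < 256; k++)
        ctx->S[k] = (uint8_t)k;
    for (k = 0; k < 256; k++) {
        uint8_t t = ctx->S[k];
        j = (uint8_t)(j + t + key[k % keylen]);
        ctx->S[k] = ctx->S[j];
        ctx->S[j] = t;
    }
    ctx->i = 0;
    ctx->j = 0;
}

/* Serial step: one keystream byte. Every load depends on the
   swap stored by the previous step. */
uint8_t rc4_sketch_byte(struct rc4_sketch *ctx)
{
    uint8_t si, sj;

    ctx->i = (uint8_t)(ctx->i + 1);
    si = ctx->S[ctx->i];
    ctx->j = (uint8_t)(ctx->j + si);
    sj = ctx->S[ctx->j];
    ctx->S[ctx->i] = sj;
    ctx->S[ctx->j] = si;
    return ctx->S[(uint8_t)(si + sj)];
}

/* Two keystream bytes per call. S[i+2] is loaded before the first
   swap is stored; if j1 == i+2 that swap overwrites S[i+2], so the
   early load would be stale and we fall back to the serial step. */
void rc4_sketch_pair(struct rc4_sketch *ctx, uint8_t out[2])
{
    uint8_t i1 = (uint8_t)(ctx->i + 1);
    uint8_t i2 = (uint8_t)(ctx->i + 2);
    uint8_t a = ctx->S[i1];
    uint8_t b = ctx->S[i2];              /* hoisted load */
    uint8_t j1 = (uint8_t)(ctx->j + a);
    uint8_t j2, sj;

    if (j1 == i2) {                      /* collision */
        out[0] = rc4_sketch_byte(ctx);
        out[1] = rc4_sketch_byte(ctx);
        return;
    }

    sj = ctx->S[j1];                     /* first swap and output */
    ctx->S[i1] = sj;
    ctx->S[j1] = a;
    out[0] = ctx->S[(uint8_t)(a + sj)];

    j2 = (uint8_t)(j1 + b);              /* second swap and output */
    sj = ctx->S[j2];
    ctx->S[i2] = sj;
    ctx->S[j2] = b;
    out[1] = ctx->S[(uint8_t)(b + sj)];

    ctx->i = i2;
    ctx->j = j2;
}
```

Dropping the j1 == i2 check is the "easy way" that gives wrong keystream now and then. With the check in place, rc4_sketch_pair should produce the same bytes as two calls to rc4_sketch_byte.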
An easier trick is to generate four or eight bytes of the keystream at a time, collecting the result in a register, so the XORing of the data can be done a word at a time. The sparc implementation does something along those lines, and at least does the data writes as aligned words.
Sounds like the best strategy. There really isn't much parallelism in RC4, and initialization is costly, especially if one wants to remove bias by throwing away the first 256, 512, 768, 1024 etc. bytes of keystream (depending on which recommendation you want to follow).
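A rough C sketch of the two ideas above, under stated assumptions: keystream bytes are still generated serially, but eight of them are collected and XORed into the data as one 64-bit word, and a separate helper discards the first n keystream bytes. The struct and the rc4_sketch_* names are hypothetical and not the Nettle or OpenSSL API. The sketch uses memcpy for the word loads and stores, so it does not require aligned buffers; a tuned version would presumably align the writes as the sparc code does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state layout for illustration only;
   not the struct used by Nettle or OpenSSL. */
struct rc4_sketch {
    uint8_t S[256];
    uint8_t i, j;
};

/* Standard RC4 key schedule. keylen must be at least 1. */
void rc4_sketch_init(struct rc4_sketch *ctx, const uint8_t *key, size_t keylen)
{
    unsigned k;
    uint8_t j = 0;

    for (k = 0; k < 256; k++)
        ctx->S[k] = (uint8_t)k;
    for (k = 0; k < 256; k++) {
        uint8_t t = ctx->S[k];
        j = (uint8_t)(j + t + key[k % keylen]);
        ctx->S[k] = ctx->S[j];
        ctx->S[j] = t;
    }
    ctx->i = 0;
    ctx->j = 0;
}

/* One serial keystream byte. */
static uint8_t next_byte(struct rc4_sketch *ctx)
{
    uint8_t si, sj;

    ctx->i = (uint8_t)(ctx->i + 1);
    si = ctx->S[ctx->i];
    ctx->j = (uint8_t)(ctx->j + si);
    sj = ctx->S[ctx->j];
    ctx->S[ctx->i] = sj;
    ctx->S[ctx->j] = si;
    return ctx->S[(uint8_t)(si + sj)];
}

/* Throw away the first n keystream bytes. The value of n is the
   caller's choice (256, 512, 768, 1024, ...); none is endorsed here. */
void rc4_sketch_drop(struct rc4_sketch *ctx, size_t n)
{
    while (n--)
        (void)next_byte(ctx);
}

/* XOR len bytes of src into dst. Full 8-byte groups are handled as
   one 64-bit XOR; the tail is handled a byte at a time.
   dst == src (in-place operation) is fine. */
void rc4_sketch_crypt(struct rc4_sketch *ctx, uint8_t *dst,
                      const uint8_t *src, size_t len)
{
    while (len >= 8) {
        uint8_t ks[8];
        uint64_t k, d;
        int n;

        for (n = 0; n < 8; n++)          /* collect eight keystream bytes */
            ks[n] = next_byte(ctx);
        memcpy(&k, ks, 8);               /* same byte order as the data, */
        memcpy(&d, src, 8);              /* so endianness does not matter */
        d ^= k;
        memcpy(dst, &d, 8);

        src += 8;
        dst += 8;
        len -= 8;
    }
    while (len--)
        *dst++ = (uint8_t)(*src++ ^ next_byte(ctx));
}
```

A caller would run rc4_sketch_init, then rc4_sketch_drop with whatever n it has settled on, then rc4_sketch_crypt over the data. Note that the discard costs n full serial steps per key, which is why it hurts most for short messages.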
I tried looking at the OpenSSL assembly code to see if one could do a simple fix to Nettle. Naive, I admit.
- -- 
Med vänlig hälsning, Yours

Joachim Strömbergson - Alltid i harmonisk svängning.
========================================================================
 Joachim Strömbergson          Secworks AB          joachim@secworks.se
========================================================================