Simon Josefsson <simon@josefsson.org> writes:
nisse@lysator.liu.se (Niels Möller) writes:
I'll try to get this integrated reasonably soon. Have you compared the performance of the old and new code?
Old:
  serpent256 ECB encrypt   28.69
  serpent256 ECB decrypt   30.94
  serpent256 CBC encrypt   26.99
  serpent256 CBC decrypt   30.82
New:
  serpent256 ECB encrypt   14.26
  serpent256 ECB decrypt   16.34
  serpent256 CBC encrypt   13.93
  serpent256 CBC decrypt   15.93
Then there's clearly some room for optimization.
Maybe throw an error for non-16/24/32 key sizes? I'm not sure how useful it is to support that.
Not terribly useful, I guess, but since it's well defined by the serpent spec, I think it should be supported.
After this code is in, I'd like to try to do serpent with two blocks at a time in parallel, for machines with native 64-bit registers (and change at least the ctr code to do a couple of blocks at a time). I think that might be about as fast as aes or camellia.
Did the old code do that?
No, I didn't want to do that work with the old code.
In any case, it looks like the performance of serpent has a long way to go to be comparable to aes or camellia:
On my Intel SU4100 laptop (64-bit), I get (still with the old serpent code):
  aes256      ECB encrypt   54.72
  aes256      ECB decrypt   54.36
  camellia256 ECB encrypt   43.10
  camellia256 ECB decrypt   43.09
  serpent256  ECB encrypt   22.47
  serpent256  ECB decrypt   26.89
I would expect that the two-block-in-parallel trick can almost double serpent performance (for ecb, ctr, cbc-decrypt, but not cbc-encrypt). And then all three algorithms are definitely in the same ballpark.
(And of these algorithms, only aes uses handwritten x86_64 assembly code.)
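To make the two-blocks-in-parallel idea concrete, here is a rough sketch of the basic building block, not the actual nettle code. Serpent works on 32-bit words using only boolean operations and rotations, so a word from block A can sit in the low half of a 64-bit register and the corresponding word from block B in the high half. AND, OR and XOR then process both blocks at once with no changes; only the rotations need masking so that bits do not leak between the halves. The names pack_x2, rotl32_x2 and check_rotl32_x2 are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Ordinary 32-bit rotate left, for 0 < n < 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* Hypothetical helper: put a word from block A in the low half and the
   corresponding word from block B in the high half of one register. */
static uint64_t pack_x2(uint32_t a, uint32_t b)
{
  return ((uint64_t) b << 32) | a;
}

/* Rotate each 32-bit half of x left by n independently, for 0 < n < 32.
   lo_mask selects the low n bits of both halves, which is where the
   wrapped-around bits must end up. */
static uint64_t rotl32_x2(uint64_t x, unsigned n)
{
  uint64_t lo_mask = ((UINT64_C(1) << n) - 1) * UINT64_C(0x0000000100000001);
  return ((x << n) & ~lo_mask) | ((x >> (32 - n)) & lo_mask);
}

/* Sanity check: the packed rotate must agree with two separate rotates. */
static void check_rotl32_x2(uint32_t a, uint32_t b, unsigned n)
{
  assert(rotl32_x2(pack_x2(a, b), n) == pack_x2(rotl32(a, n), rotl32(b, n)));
}
```

For example, check_rotl32_x2(0x80000001, 0xdeadbeef, 13) exercises the rotation by 13 that serpent's linear transformation uses. The packed rotate costs a few extra mask operations, which is one reason to expect somewhat less than a full doubling. It also shows why cbc-encrypt gets no benefit: each block's input depends on the previous ciphertext block, so there are never two independent blocks to pack together.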
Regards, /Niels