nisse@lysator.liu.se (Niels Möller) writes:
> I'll try to get this integrated reasonably soon. Have you compared the performance of the old and new code?
I've checked in the new code now, and I've fixed serpent_key_prepare to use LE_READ_UINT32 (with no alignment requirements) and with support for arbitrary key sizes up to 32 bytes.
This has been really straightforward.
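To illustrate the key-reading change described above, here is a minimal sketch, not the code that was checked in. The names le_read_uint32 and key_pad_sketch are hypothetical stand-ins. The padding rule (a single 1 bit followed by zeros, for keys shorter than 256 bits) is assumed from the Serpent specification, and keys are assumed to be a whole number of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a byte-wise little-endian read. It only
   dereferences single bytes, so the pointer needs no alignment. */
static uint32_t
le_read_uint32(const uint8_t *p)
{
  return ((uint32_t) p[3] << 24)
    | ((uint32_t) p[2] << 16)
    | ((uint32_t) p[1] << 8)
    | (uint32_t) p[0];
}

/* Hypothetical helper: expand a key of 0..32 bytes into eight
   little-endian 32-bit words. Assumption: short keys are padded with
   a 0x01 byte followed by zero bytes. */
static void
key_pad_sketch(const uint8_t *key, size_t key_length, uint32_t w[8])
{
  size_t i;

  assert(key_length <= 32);

  /* Read as many complete words as the key provides. */
  for (i = 0; key_length >= 4; key_length -= 4, key += 4)
    w[i++] = le_read_uint32(key);

  if (i < 8)
    {
      /* Collect the 0..3 leftover bytes below the 0x01 marker,
         e.g. "aabbcc" -> "aabbcc01" -> 0x01ccbbaa. */
      uint32_t pad = 0x01;

      while (key_length > 0)
        pad = (pad << 8) | key[--key_length];

      w[i++] = pad;

      /* Remaining words are zero. */
      while (i < 8)
        w[i++] = 0;
    }
}
```

Reading byte by byte costs a few extra shifts compared with a word load, but it only happens at key setup, so it should not matter for bulk performance.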
On my development laptop, I get about the same performance for the new and the old code (x86_64, gcc-4.4.5).
I'd like to get rid of the serpent_block_t typedef. I think the use of an array may mess up register allocation with some compilers, and that might be the source of the large performance difference on Simon's machine. It's also really silly to have a BLOCK_COPY in the round function.
So I think I'll want to spend some time on reorganizing the code (and there are also some purely stylistic things, like using uppercase for all macros) before adding anything really new, like the two-block trick.
Regards, /Niels