nisse@lysator.liu.se (Niels Möller) writes:
Simon Josefsson <simon@josefsson.org> writes:
Fixing a small bug in my 2010-12-07 port of serpent.c from libgcrypt to nettle was all that was required to make it work. Please consider this work, applied as follows:
Great!
I'll try to get this integrated reasonably soon. Have you compared the performance of the old and new code?
Old:

jas@latte:~/src/lsh/nettle/examples$ ./nettle-benchmark serpent
sha1_compress: 1139.20 cycles

benchmark call overhead: 0.002529 us

 Algorithm         mode  Mbyte/s
serpent256  ECB encrypt    28.69
serpent256  ECB decrypt    30.94
serpent256  CBC encrypt    26.99
serpent256  CBC decrypt    30.82
jas@latte:~/src/lsh/nettle/examples$
New:

jas@latte:~/src/lsh/nettle/examples$ ./nettle-benchmark serpent
sha1_compress: 1140.00 cycles

benchmark call overhead: 0.002425 us

 Algorithm         mode  Mbyte/s
serpent256  ECB encrypt    14.26
serpent256  ECB decrypt    16.34
serpent256  CBC encrypt    13.93
serpent256  CBC decrypt    15.93
jas@latte:~/src/lsh/nettle/examples$
Some minor things (which I think I can take care of myself):
/* Serpent works on 128 bit blocks. */
typedef uint32_t serpent_block_t[4];

/* Serpent key, provided by the user. If the original key is shorter
   than 256 bits, it is padded. */
typedef uint32_t serpent_key_t[8];
I dislike array typedefs.
#define byte_swap_32(x) \
  (0 \
   | (((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >> 8) \
   | (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24))
This and the endian test where it is used should be replaced by using LE_READ_UINT32.
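To illustrate the suggestion above: a byte-wise little-endian load needs neither an endianness test nor an aligned pointer. The sketch below uses le_read_uint32 as a hypothetical stand-in for LE_READ_UINT32 (the macro's actual definition in nettle may differ), and le_read_words is likewise an illustrative name, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LE_READ_UINT32: assemble a 32-bit word from
   four bytes stored least significant byte first. Works on any host byte
   order and places no alignment requirement on p. */
static uint32_t
le_read_uint32(const uint8_t *p)
{
  return ((uint32_t) p[3] << 24)
    | ((uint32_t) p[2] << 16)
    | ((uint32_t) p[1] << 8)
    | (uint32_t) p[0];
}

/* Illustrative helper: load n_words consecutive little-endian words. */
static void
le_read_words(uint32_t *dst, size_t n_words, const uint8_t *src)
{
  size_t i;

  assert(dst != NULL && src != NULL);
  for (i = 0; i < n_words; i++)
    dst[i] = le_read_uint32(src + 4 * i);
}
```

Because the bytes are combined with shifts, the same code gives the same result on big-endian and little-endian machines, which is what makes the byte_swap_32 macro and its endian test unnecessary.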
Sounds good.
/* Convert the user provided key KEY of KEY_LENGTH bytes into the
   internally used format. */
static void
serpent_key_prepare (const uint8_t * key, unsigned int key_length,
                     serpent_key_t key_prepared)
This function seems to assume that key is aligned on a four-byte boundary and that key_length is a multiple of four. The nettle interface specifies no alignment requirement on the key, and the old serpent code is supposed to support any key size (although unfortunately I don't have any test cases for sizes other than 16, 24 and 32 bytes).
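One way to drop both assumptions is to pad at the byte level and only then assemble words. The sketch below is not the patch's code: serpent_key_pad_sketch and SERPENT_MAX_KEY_SIZE are hypothetical names, and the padding rule (a single 1 bit after the key, then zeros up to 256 bits) is my reading of the Serpent specification, extended here to key lengths that are not a multiple of four. With no test vectors for such sizes, treat that extension as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SERPENT_MAX_KEY_SIZE 32  /* hypothetical name; 256 bits */

/* Expand a key of any length up to 32 bytes into eight 32-bit words.
   The key pointer may be unaligned and key_length need not be a
   multiple of four. */
static void
serpent_key_pad_sketch(const uint8_t *key, size_t key_length, uint32_t w[8])
{
  uint8_t padded[SERPENT_MAX_KEY_SIZE];
  size_t i;

  assert(key_length <= SERPENT_MAX_KEY_SIZE);

  /* Copy the key, then append one 1 bit followed by zero bits
     (assumed padding rule, see above). */
  memset(padded, 0, sizeof(padded));
  memcpy(padded, key, key_length);
  if (key_length < SERPENT_MAX_KEY_SIZE)
    padded[key_length] = 0x01;

  /* Assemble little-endian words one byte at a time. */
  for (i = 0; i < 8; i++)
    w[i] = ((uint32_t) padded[4 * i + 3] << 24)
      | ((uint32_t) padded[4 * i + 2] << 16)
      | ((uint32_t) padded[4 * i + 1] << 8)
      | (uint32_t) padded[4 * i];
}
```

For a 16-byte key this yields the same words as setting the fifth word to 1 and the rest to 0, so it should agree with word-based padding for the 16, 24 and 32 byte cases.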
Maybe throw an error for non-16/24/32 key sizes? I'm not sure how useful it is to support that.
After this code is in, I'd like to try to do serpent with two blocks at a time in parallel, for machines with native 64-bit registers (and change at least the ctr code to do a couple of blocks at a time). I think that might be about as fast as aes or camellia.
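A rough sketch of what "two blocks at a time" could mean, assuming the corresponding 32-bit words of two blocks are packed into the two halves of one 64-bit register. The bitwise S-box operations (AND, OR, XOR, NOT) then work on both blocks unchanged. Only the rotations in the linear transformation need care, since each half must rotate independently. The names pack_x2 and rotl32_x2 are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Ordinary 32-bit left rotation, for reference. */
static uint32_t
rotl32(uint32_t x, unsigned n)
{
  assert(n > 0 && n < 32);
  return (x << n) | (x >> (32 - n));
}

/* Pack one word from block A (high half) and one from block B (low half). */
static uint64_t
pack_x2(uint32_t a, uint32_t b)
{
  return ((uint64_t) a << 32) | b;
}

/* Rotate both 32-bit halves left by n bits at once. The masks discard
   the bits that a plain 64-bit shift carries across the half boundary. */
static uint64_t
rotl32_x2(uint64_t x, unsigned n)
{
  uint64_t low_mask;

  assert(n > 0 && n < 32);
  /* The low n bits of each 32-bit half. */
  low_mask = ((UINT64_C(1) << n) - 1) * UINT64_C(0x0000000100000001);
  return ((x << n) & ~low_mask) | ((x >> (32 - n)) & low_mask);
}
```

The double rotation costs two shifts, two ANDs and an OR for two blocks, so whether this is a net win over two plain 32-bit rotations would have to be measured.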
Did the old code do that? In any case, it looks like the performance of serpent has a long way to go to be comparable to aes or camellia:
jas@latte:~/src/lsh/nettle/examples$ ./nettle-benchmark camellia
sha1_compress: 1139.20 cycles

benchmark call overhead: 0.002421 us

  Algorithm         mode  Mbyte/s
camellia128  ECB encrypt   146.18
camellia128  ECB decrypt   146.37
camellia128  CBC encrypt   127.87
camellia128  CBC decrypt   142.89

camellia192  ECB encrypt   110.03
camellia192  ECB decrypt   109.84
camellia192  CBC encrypt    98.50
camellia192  CBC decrypt   108.31

camellia256  ECB encrypt   109.72
camellia256  ECB decrypt   110.03
camellia256  CBC encrypt    98.46
camellia256  CBC decrypt   108.29
jas@latte:~/src/lsh/nettle/examples$ ./nettle-benchmark aes
sha1_compress: 1149.60 cycles

benchmark call overhead: 0.002420 us

     Algorithm         mode  Mbyte/s
        aes128  ECB encrypt   181.84
        aes128  ECB decrypt   181.58
        aes128  CBC encrypt   149.30
        aes128  CBC decrypt   178.05

        aes192  ECB encrypt   153.40
        aes192  ECB decrypt   155.35
        aes192  CBC encrypt   132.48
        aes192  CBC decrypt   153.33

        aes256  ECB encrypt   135.98
        aes256  ECB decrypt   135.14
        aes256  CBC encrypt   119.54
        aes256  CBC decrypt   134.13

openssl aes128  ECB encrypt   203.50
openssl aes128  ECB decrypt   190.98

openssl aes192  ECB encrypt   171.97
openssl aes192  ECB decrypt   162.42

openssl aes256  ECB encrypt   149.75
openssl aes256  ECB decrypt   142.56
jas@latte:~/src/lsh/nettle/examples$
/Simon