Hi! Please find attached a port of DJB's public domain code for Salsa20 to nettle. The patch is not meant to be applied as-is but to start a discussion. Review of the code itself is welcome.
While writing this I noticed a sub-optimal design of Nettle, in that it overloads the meaning of struct nettle_cipher->block_size as a flag of whether the cipher is a stream cipher or not. Before arcfour was the only stream cipher in nettle, and using block_size=0 to signal that it is a stream cipher worked fine. However, Salsa20 has a block size of 8 bytes. Another odd thing is that Salsa20 requires an IV. So some changes to the nettle_cipher struct are required.
This could be resolved better than I have done in the patch below. I see two reasonable alternatives:
1) Introduce a new struct 'struct nettle_stream_cipher' that applies to stream ciphers and use that.
2) Add a flag in 'struct nettle_cipher' to tell whether the cipher is a stream cipher or not.
I started out strongly prefering 1) because it appeared cleaner. After thinking about the changes required to implement that, I am now leaning towards 2) because of:
A) a lot of code would have to be duplicated if we have both 'struct nettle_cipher' and 'struct nettle_stream_cipher' for apparently little gain. Otherwise, most code could be the same, and where the distinction matters there could be a simple if-test.
B) it allows for full backwards compatibility in a clean way. With 1) we would have a transition period where people switch arcfour usage from nettle_cipher to nettle_stream_cipher. With 2) arcfour would continue to have block_size==0 but it would also set the new stream cipher flag. External code could incrementally be updated to check the new flag instead of block_size==0 (if it does that at all) but that is something that doesn't uglify nettle.
What do you think? I'd be happy to propose a patch to implement 2) and then a follow-on patch to add support for Salsa20.
/Simon
From 600ae2815634ad95c4d2d93821316ce20df19ae3 Mon Sep 17 00:00:00 2001
From: Simon Josefsson simon@josefsson.org Date: Wed, 21 Mar 2012 17:06:13 +0100 Subject: [PATCH] Implement Salsa20.
--- Makefile.in | 3 +- examples/nettle-benchmark.c | 3 + nettle-meta-ciphers.c | 2 + nettle-meta.h | 8 ++- nettle-types.h | 5 + salsa20-meta.c | 49 ++++++++++++ salsa20.c | 172 ++++++++++++++++++++++++++++++++++++++++++ salsa20.h | 71 +++++++++++++++++ testsuite/.gitignore | 1 + testsuite/.test-rules.make | 3 + testsuite/Makefile.in | 1 + testsuite/meta-cipher-test.c | 2 + testsuite/salsa20-test.c | 49 ++++++++++++ testsuite/testutils.c | 36 +++++++-- testsuite/testutils.h | 10 +++ 15 files changed, 407 insertions(+), 8 deletions(-) create mode 100644 salsa20-meta.c create mode 100644 salsa20.c create mode 100644 salsa20.h create mode 100644 testsuite/salsa20-test.c
diff --git a/Makefile.in b/Makefile.in index 1813a0d..6e2ec60 100644 --- a/Makefile.in +++ b/Makefile.in @@ -79,6 +79,7 @@ nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c \ md2.c md2-meta.c md4.c md4-meta.c \ md5.c md5-compress.c md5-compat.c md5-meta.c \ ripemd160.c ripemd160-compress.c ripemd160-meta.c \ + salsa20.c salsa20-meta.c \ sha1.c sha1-compress.c sha1-meta.c \ sha256.c sha256-compress.c sha224-meta.c sha256-meta.c \ sha512.c sha512-compress.c sha384-meta.c sha512-meta.c \ @@ -113,7 +114,7 @@ hogweed_SOURCES = sexp.c sexp-format.c \ pgp-encode.c rsa2openpgp.c \ der-iterator.c der2rsa.c der2dsa.c
-HEADERS = aes.h arcfour.h arctwo.h asn1.h bignum.h blowfish.h \ +HEADERS = aes.h arcfour.h salsa20.h arctwo.h asn1.h bignum.h blowfish.h \ base16.h base64.h buffer.h camellia.h cast128.h \ cbc.h ctr.h gcm.h \ des.h des-compat.h dsa.h \ diff --git a/examples/nettle-benchmark.c b/examples/nettle-benchmark.c index 424ef19..44fca4e 100644 --- a/examples/nettle-benchmark.c +++ b/examples/nettle-benchmark.c @@ -40,6 +40,7 @@
#include "aes.h" #include "arcfour.h" +#include "salsa20.h" #include "blowfish.h" #include "cast128.h" #include "cbc.h" @@ -612,6 +613,8 @@ main(int argc, char **argv) OPENSSL(&nettle_openssl_aes192) OPENSSL(&nettle_openssl_aes256) &nettle_arcfour128, OPENSSL(&nettle_openssl_arcfour128) + &nettle_salsa20_128, + &nettle_salsa20_256, &nettle_blowfish128, OPENSSL(&nettle_openssl_blowfish128) &nettle_camellia128, &nettle_camellia192, &nettle_camellia256, &nettle_cast128, OPENSSL(&nettle_openssl_cast128) diff --git a/nettle-meta-ciphers.c b/nettle-meta-ciphers.c index 1f07595..aae90f9 100644 --- a/nettle-meta-ciphers.c +++ b/nettle-meta-ciphers.c @@ -36,6 +36,8 @@ const struct nettle_cipher * const nettle_ciphers[] = { &nettle_camellia192, &nettle_camellia256, &nettle_cast128, + &nettle_salsa20_128, + &nettle_salsa20_256, &nettle_serpent128, &nettle_serpent192, &nettle_serpent256, diff --git a/nettle-meta.h b/nettle-meta.h index 54cbbd2..c8aabda 100644 --- a/nettle-meta.h +++ b/nettle-meta.h @@ -5,6 +5,7 @@
/* nettle, low-level cryptographics library * + * Copyright (C) 2012 Simon Josefsson * Copyright (C) 2002 Niels Möller * * The nettle library is free software; you can redistribute it and/or modify @@ -39,7 +40,7 @@ struct nettle_cipher
unsigned context_size;
- /* Zero for stream ciphers */ + /* Block size; 0 for byte-oriented stream ciphers. */ unsigned block_size;
/* Suggested key size; other sizes are sometimes possible. */ @@ -50,6 +51,8 @@ struct nettle_cipher
nettle_crypt_func *encrypt; nettle_crypt_func *decrypt; + + nettle_set_iv_func *set_iv; };
#define _NETTLE_CIPHER(name, NAME, key_size) { \ @@ -105,6 +108,9 @@ extern const struct nettle_cipher nettle_aes256;
extern const struct nettle_cipher nettle_arcfour128;
+extern const struct nettle_cipher nettle_salsa20_128; +extern const struct nettle_cipher nettle_salsa20_256; + extern const struct nettle_cipher nettle_camellia128; extern const struct nettle_cipher nettle_camellia192; extern const struct nettle_cipher nettle_camellia256; diff --git a/nettle-types.h b/nettle-types.h index b694332..e68cacb 100644 --- a/nettle-types.h +++ b/nettle-types.h @@ -2,6 +2,7 @@
/* nettle, low-level cryptographics library * + * Copyright (C) 2012 Simon Josefsson * Copyright (C) 2005 Niels Möller * * The nettle library is free software; you can redistribute it and/or modify @@ -44,6 +45,10 @@ typedef void nettle_set_key_func(void *ctx, unsigned length, const uint8_t *key);
+typedef void nettle_set_iv_func(void *ctx, + unsigned length, + const uint8_t *iv); + /* Uses a void * for cipher contexts.
For block ciphers it would make sense with a const void * for the diff --git a/salsa20-meta.c b/salsa20-meta.c new file mode 100644 index 0000000..8278e09 --- /dev/null +++ b/salsa20-meta.c @@ -0,0 +1,49 @@ +/* salsa20-meta.c */ + +/* nettle, low-level cryptographics library + * + * Copyright (C) 2012 Simon Josefsson + * + * The nettle library is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published by + * the Free Software Foundation; either version 2.1 of the License, or (at your + * option) any later version. + * + * The nettle library is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public + * License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with the nettle library; see the file COPYING.LIB. If not, write to + * the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, + * MA 02111-1307, USA. + */ + +#if HAVE_CONFIG_H +# include "config.h" +#endif + +#include "nettle-meta.h" + +#include "salsa20.h" + +const struct nettle_cipher nettle_salsa20_128 = + { "salsa20-128", sizeof(struct salsa20_ctx), + 8, 16, + (nettle_set_key_func *) salsa20_set_key, + (nettle_set_key_func *) salsa20_set_key, + (nettle_crypt_func *) salsa20_crypt, + (nettle_crypt_func *) salsa20_crypt, + (nettle_set_iv_func *) salsa20_set_iv + }; + +const struct nettle_cipher nettle_salsa20_256 = + { "salsa20-256", sizeof(struct salsa20_ctx), + 8, 32, + (nettle_set_key_func *) salsa20_set_key, + (nettle_set_key_func *) salsa20_set_key, + (nettle_crypt_func *) salsa20_crypt, + (nettle_crypt_func *) salsa20_crypt, + (nettle_set_iv_func *) salsa20_set_iv + }; diff --git a/salsa20.c b/salsa20.c new file mode 100644 index 0000000..7885199 --- /dev/null +++ b/salsa20.c @@ -0,0 +1,172 @@ +/* salsa20.c + * + * The Salsa20 stream cipher. + */ + +/* nettle, low-level cryptographics library + * + * Copyright (C) 2012 Simon Josefsson + * + * The nettle library is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published by + * the Free Software Foundation; either version 2.1 of the License, or (at your + * option) any later version. + * + * The nettle library is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public + * License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with the nettle library; see the file COPYING.LIB. If not, write to + * the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, + * MA 02111-1307, USA. + */ + +#if HAVE_CONFIG_H +# include "config.h" +#endif + +#include <assert.h> + +#include "salsa20.h" + +#define ROTL32(x,n) ((((x))<<(n)) | (((x))>>(32-(n)))) + +#define SWAP32(v) \ + ((ROTL32(v, 8) & 0x00FF00FFUL) | \ + (ROTL32(v, 24) & 0xFF00FF00UL)) + +#ifdef WORDS_BIGENDIAN +#define U32TO32_LITTLE(v) SWAP32(v) +#else +#define U32TO32_LITTLE(v) (v) +#endif + +#define U8TO32_LITTLE(p) U32TO32_LITTLE(((uint32_t*)(p))[0]) +#define U32TO8_LITTLE(p, v) (((uint32_t*)(p))[0] = U32TO32_LITTLE(v)) + +/* +salsa20-ref.c version 20051118 +D. J. Bernstein +Public domain. +*/ + +#define ROTATE(v,c) (ROTL32(v,c)) +#define XOR(v,w) ((v) ^ (w)) +#define PLUS(v,w) ((v) + (w)) +#define PLUSONE(v) (PLUS((v),1)) + +static void salsa20_wordtobyte(uint8_t output[64],const uint32_t input[16]) +{ + uint32_t x[16]; + int i; + + for (i = 0;i < 16;++i) x[i] = input[i]; + for (i = 20;i > 0;i -= 2) { + x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 0],x[12]), 7)); + x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[ 4],x[ 0]), 9)); + x[12] = XOR(x[12],ROTATE(PLUS(x[ 8],x[ 4]),13)); + x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[12],x[ 8]),18)); + x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 5],x[ 1]), 7)); + x[13] = XOR(x[13],ROTATE(PLUS(x[ 9],x[ 5]), 9)); + x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[13],x[ 9]),13)); + x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 1],x[13]),18)); + x[14] = XOR(x[14],ROTATE(PLUS(x[10],x[ 6]), 7)); + x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[14],x[10]), 9)); + x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 2],x[14]),13)); + x[10] = XOR(x[10],ROTATE(PLUS(x[ 6],x[ 2]),18)); + x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[15],x[11]), 7)); + x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 3],x[15]), 9)); + x[11] = XOR(x[11],ROTATE(PLUS(x[ 7],x[ 3]),13)); + x[15] = XOR(x[15],ROTATE(PLUS(x[11],x[ 7]),18)); + x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[ 0],x[ 3]), 7)); + x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[ 1],x[ 0]), 9)); + x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[ 2],x[ 1]),13)); + x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[ 3],x[ 2]),18)); + x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 5],x[ 4]), 7)); + x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 6],x[ 5]), 9)); + x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 7],x[ 6]),13)); + x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 4],x[ 7]),18)); + x[11] = XOR(x[11],ROTATE(PLUS(x[10],x[ 9]), 7)); + x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[11],x[10]), 9)); + x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 8],x[11]),13)); + x[10] = XOR(x[10],ROTATE(PLUS(x[ 9],x[ 8]),18)); + x[12] = XOR(x[12],ROTATE(PLUS(x[15],x[14]), 7)); + x[13] = XOR(x[13],ROTATE(PLUS(x[12],x[15]), 9)); + x[14] = XOR(x[14],ROTATE(PLUS(x[13],x[12]),13)); + x[15] = XOR(x[15],ROTATE(PLUS(x[14],x[13]),18)); + } + for (i = 0;i < 16;++i) x[i] = PLUS(x[i],input[i]); + for (i = 0;i < 16;++i) U32TO8_LITTLE(output + 4 * i,x[i]); +} + +static const char sigma[16] = "expand 32-byte k"; +static const char tau[16] = "expand 16-byte k"; + +void +salsa20_set_key(struct salsa20_ctx *ctx, + unsigned length, const uint8_t *key) +{ + const char *constants; + + assert (length == SALSA20_MIN_KEY_SIZE || length == SALSA20_MAX_KEY_SIZE); + + ctx->input[1] = U8TO32_LITTLE(key + 0); + ctx->input[2] = U8TO32_LITTLE(key + 4); + ctx->input[3] = U8TO32_LITTLE(key + 8); + ctx->input[4] = U8TO32_LITTLE(key + 12); + if (length == SALSA20_MAX_KEY_SIZE) { /* recommended */ + key += 16; + constants = sigma; + } else { /* kbits == 128 */ + constants = tau; + } + ctx->input[11] = U8TO32_LITTLE(key + 0); + ctx->input[12] = U8TO32_LITTLE(key + 4); + ctx->input[13] = U8TO32_LITTLE(key + 8); + ctx->input[14] = U8TO32_LITTLE(key + 12); + ctx->input[0] = U8TO32_LITTLE(constants + 0); + ctx->input[5] = U8TO32_LITTLE(constants + 4); + ctx->input[10] = U8TO32_LITTLE(constants + 8); + ctx->input[15] = U8TO32_LITTLE(constants + 12); +} + +void +salsa20_set_iv(struct salsa20_ctx *ctx, unsigned length, const uint8_t *iv) +{ + assert (length == SALSA20_IV_SIZE); + + ctx->input[6] = U8TO32_LITTLE(iv + 0); + ctx->input[7] = U8TO32_LITTLE(iv + 4); + ctx->input[8] = 0; + ctx->input[9] = 0; +} + +void +salsa20_crypt(struct salsa20_ctx *ctx, + unsigned length, + uint8_t *c, + const uint8_t *m) +{ + uint8_t output[64]; + unsigned i; + + if (!length) return; + for (;;) { + salsa20_wordtobyte(output,ctx->input); + ctx->input[8] = PLUSONE(ctx->input[8]); + if (!ctx->input[8]) { + ctx->input[9] = PLUSONE(ctx->input[9]); + /* stopping at 2^70 length per nonce is user's responsibility */ + } + if (length <= 64) { + for (i = 0;i < length;++i) c[i] = m[i] ^ output[i]; + return; + } + for (i = 0;i < 64;++i) c[i] = m[i] ^ output[i]; + length -= 64; + c += 64; + m += 64; + } +} diff --git a/salsa20.h b/salsa20.h new file mode 100644 index 0000000..79f1505 --- /dev/null +++ b/salsa20.h @@ -0,0 +1,71 @@ +/* salsa20.h + * + * The Salsa20 stream cipher. + */ + +/* nettle, low-level cryptographics library + * + * Copyright (C) 2012 Simon Josefsson + * Copyright (C) 2001 Niels Möller + * + * The nettle library is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published by + * the Free Software Foundation; either version 2.1 of the License, or (at your + * option) any later version. + * + * The nettle library is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public + * License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with the nettle library; see the file COPYING.LIB. If not, write to + * the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, + * MA 02111-1307, USA. + */ + +#ifndef NETTLE_SALSA20_H_INCLUDED +#define NETTLE_SALSA20_H_INCLUDED + +#include "nettle-types.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/* Name mangling */ +#define salsa20_set_key nettle_salsa20_set_key +#define salsa20_set_iv nettle_salsa20_set_iv +#define salsa20_crypt nettle_salsa20_crypt + +/* Minimum and maximum keysizes, and a reasonable default. In + * octets.*/ +#define SALSA20_MIN_KEY_SIZE 16 +#define SALSA20_MAX_KEY_SIZE 32 +#define SALSA20_KEY_SIZE 32 + +#define SALSA20_IV_SIZE 8 + +struct salsa20_ctx +{ + uint32_t input[16]; +}; + +void +salsa20_set_key(struct salsa20_ctx *ctx, + unsigned length, const uint8_t *key); + +void +salsa20_set_iv(struct salsa20_ctx *ctx, + unsigned length, const uint8_t *iv); + +void +salsa20_crypt(struct salsa20_ctx *ctx, + unsigned length, uint8_t *dst, + const uint8_t *src); + +#ifdef __cplusplus +} +#endif + +#endif /* NETTLE_SALSA20_H_INCLUDED */ diff --git a/testsuite/.gitignore b/testsuite/.gitignore index 1147cfb..c9f4698 100644 --- a/testsuite/.gitignore +++ b/testsuite/.gitignore @@ -36,6 +36,7 @@ /rsa-keygen-test /rsa-test /rsa2sexp-test +/salsa20-test /serpent-test /sexp-format-test /sexp-test diff --git a/testsuite/.test-rules.make b/testsuite/.test-rules.make index e8f6650..10c993f 100644 --- a/testsuite/.test-rules.make +++ b/testsuite/.test-rules.make @@ -49,6 +49,9 @@ memxor-test$(EXEEXT): memxor-test.$(OBJEXT) ripemd160-test$(EXEEXT): ripemd160-test.$(OBJEXT) $(LINK) ripemd160-test.$(OBJEXT) $(TEST_OBJS) -o ripemd160-test$(EXEEXT)
+salsa20-test$(EXEEXT): salsa20-test.$(OBJEXT) + $(LINK) salsa20-test.$(OBJEXT) $(TEST_OBJS) -o salsa20-test$(EXEEXT) + sha1-test$(EXEEXT): sha1-test.$(OBJEXT) $(LINK) sha1-test.$(OBJEXT) $(TEST_OBJS) -o sha1-test$(EXEEXT)
diff --git a/testsuite/Makefile.in b/testsuite/Makefile.in index 8dfc62b..10ffae9 100644 --- a/testsuite/Makefile.in +++ b/testsuite/Makefile.in @@ -18,6 +18,7 @@ TS_NETTLE_SOURCES = aes-test.c arcfour-test.c arctwo-test.c \ md2-test.c md4-test.c md5-test.c md5-compat-test.c \ memxor-test.c \ ripemd160-test.c \ + salsa20-test.c \ sha1-test.c sha224-test.c sha256-test.c \ sha384-test.c sha512-test.c \ serpent-test.c twofish-test.c \ diff --git a/testsuite/meta-cipher-test.c b/testsuite/meta-cipher-test.c index 1bb74d8..50a02ba 100644 --- a/testsuite/meta-cipher-test.c +++ b/testsuite/meta-cipher-test.c @@ -14,6 +14,8 @@ const char* ciphers[] = { "camellia192", "camellia256", "cast128", + "salsa20-128", + "salsa20-256", "serpent128", "serpent192", "serpent256", diff --git a/testsuite/salsa20-test.c b/testsuite/salsa20-test.c new file mode 100644 index 0000000..e1c13dd --- /dev/null +++ b/testsuite/salsa20-test.c @@ -0,0 +1,49 @@ +#include "testutils.h" +#include "salsa20.h" + +int +test_main(void) +{ + /* http://www.ecrypt.eu.org/stream/svn/viewcvs.cgi/ecrypt/trunk/submissions/sal... */ + + test_cipher_iv(&nettle_salsa20_128, + HL("80000000 00000000 00000000 00000000"), + HL("00000000 00000000"), + HL("00000000 00000000"), + H("4DFA5E48 1DA23EA0")); + + test_cipher_iv(&nettle_salsa20_128, + HL("00000000 00000000 00000000 00000000"), + HL("80000000 00000000"), + HL("00000000 00000000"), + H("B66C1E44 46DD9557")); + + test_cipher_iv(&nettle_salsa20_128, + HL("0053A6F94C9FF24598EB3E91E4378ADD"), + HL("0D74DB42A91077DE"), + HL("00000000 00000000"), + H("05E1E7BE B697D999")); + + test_cipher_iv(&nettle_salsa20_256, + HL("80000000 00000000 00000000 00000000" + "00000000 00000000 00000000 00000000"), + HL("00000000 00000000"), + HL("00000000 00000000"), + H("E3BE8FDD 8BECA2E3")); + + test_cipher_iv(&nettle_salsa20_256, + HL("00000000 00000000 00000000 00000000" + "00000000 00000000 00000000 00000000"), + HL("80000000 00000000"), + HL("00000000 00000000"), + H("2ABA3DC45B494700")); + + test_cipher_iv(&nettle_salsa20_256, + HL("0053A6F94C9FF24598EB3E91E4378ADD" + "3083D6297CCF2275C81B6EC11467BA0D"), + HL("0D74DB42A91077DE"), + HL("00000000 00000000"), + H("F5FAD53F 79F9DF58")); + + SUCCESS(); +} diff --git a/testsuite/testutils.c b/testsuite/testutils.c index d77bb7e..8f5dab0 100644 --- a/testsuite/testutils.c +++ b/testsuite/testutils.c @@ -162,17 +162,21 @@ main(int argc, char **argv) }
void -test_cipher(const struct nettle_cipher *cipher, - unsigned key_length, - const uint8_t *key, - unsigned length, - const uint8_t *cleartext, - const uint8_t *ciphertext) +test_cipher_iv(const struct nettle_cipher *cipher, + unsigned key_length, + const uint8_t *key, + unsigned iv_length, + const uint8_t *iv, + unsigned length, + const uint8_t *cleartext, + const uint8_t *ciphertext) { void *ctx = xalloc(cipher->context_size); uint8_t *data = xalloc(length);
cipher->set_encrypt_key(ctx, key_length, key); + if (iv_length > 0 && iv) + cipher->set_iv(ctx, iv_length, iv); cipher->encrypt(ctx, length, data, cleartext);
if (!MEMEQ(length, data, ciphertext)) @@ -187,6 +191,8 @@ test_cipher(const struct nettle_cipher *cipher, FAIL(); } cipher->set_decrypt_key(ctx, key_length, key); + if (iv_length > 0 && iv) + cipher->set_iv(ctx, iv_length, iv); cipher->decrypt(ctx, length, data, data);
if (!MEMEQ(length, data, cleartext)) @@ -206,6 +212,24 @@ test_cipher(const struct nettle_cipher *cipher, }
void +test_cipher(const struct nettle_cipher *cipher, + unsigned key_length, + const uint8_t *key, + unsigned length, + const uint8_t *cleartext, + const uint8_t *ciphertext) +{ + test_cipher_iv(cipher, + key_length, + key, + 0, + NULL, + length, + cleartext, + ciphertext); +} + +void test_cipher_cbc(const struct nettle_cipher *cipher, unsigned key_length, const uint8_t *key, diff --git a/testsuite/testutils.h b/testsuite/testutils.h index d7ced9a..9d13dff 100644 --- a/testsuite/testutils.h +++ b/testsuite/testutils.h @@ -91,6 +91,16 @@ test_cipher(const struct nettle_cipher *cipher, const uint8_t *ciphertext);
void +test_cipher_iv(const struct nettle_cipher *cipher, + unsigned key_length, + const uint8_t *key, + unsigned iv_length, + const uint8_t *iv, + unsigned length, + const uint8_t *cleartext, + const uint8_t *ciphertext); + +void test_cipher_cbc(const struct nettle_cipher *cipher, unsigned key_length, const uint8_t *key,
Simon Josefsson simon@josefsson.org writes:
Hi! Please find attached a port of DJB's public domain code for Salsa20 to nettle. The patch is not meant to be applied as-is but to start a discussion.
I'm not familiar with this cipher... Could you explain briefly why one may want to use it?
While writing this I noticed a sub-optimal design of Nettle, in that it overloads the meaning of struct nettle_cipher->block_size as a flag of whether the cipher is a stream cipher or not. Before arcfour was the only stream cipher in nettle, and using block_size=0 to signal that it is a stream cipher worked fine. However, Salsa20 has a block size of 8 bytes.
If Salsa20 doesn't fit in the nettle_cipher abstraction, then I think we should start by implementing Salsa20 *without* trying to get it to fit there. To me, a "stream cipher" with a block size and iv seems a bit like a contradiction.
In general, I don't think it makes much sense to include algorithms which behave very differently under the "nettle_cipher" umbrella, since the point of that abstraction is to collect algorithms which are easily exchangable. Code accepting a nettle_cipher are not expected to check flags or behave fundamentally different for different algorithms.
Including the arcfour stream cipher there may well be a mistake. One annoying problem with the current design is that for block ciphers, the ctx argument of the encrypt and decrypt functions naturally is const, but due arcfour being fittted in the same nettle_cipher abstraction, the function typedef nettle_crypt_func can't use a const ctx.
Anyway, I'd prefer to treat redesign in that area as an independent problem from adding support for Salsa20.
Then there's also the tentative (nettle-internal.h) nettle_aead abstraction. Salsa20 could perhaps fit there, if we allow algorithms with no authentication (NULL digest function pointer).
What do you think? I'd be happy to propose a patch to implement 2) and then a follow-on patch to add support for Salsa20.
I'd prefer the opposite order...
Some quick comments on the code and algorithm below. This is the first I hear about it, so I may well be misunderstanding how it is supposed to work.
--- /dev/null +++ b/salsa20.c +#define ROTL32(x,n) ((((x))<<(n)) | (((x))>>(32-(n))))
There are several different variants of that macro. It would be nice with a unified one in macros.h.
+#define SWAP32(v) \
- ((ROTL32(v, 8) & 0x00FF00FFUL) | \
- (ROTL32(v, 24) & 0xFF00FF00UL))
+#ifdef WORDS_BIGENDIAN +#define U32TO32_LITTLE(v) SWAP32(v) +#else +#define U32TO32_LITTLE(v) (v) +#endif
+#define U8TO32_LITTLE(p) U32TO32_LITTLE(((uint32_t*)(p))[0]) +#define U32TO8_LITTLE(p, v) (((uint32_t*)(p))[0] = U32TO32_LITTLE(v))
That's a clever byte swapping trick (at least if a true rot instruction is available). In Nettle conversion between bytes and integers are usually done with READ_UINT32, LE_READ_UINT32 and friends. It's usually not very performance critical, and it deals naturally with any alignment. I suspect U8TO32_LITTLE above breaks if the input is unaligned but the architecture doesn't allow unaligned word reads.
+/* +salsa20-ref.c version 20051118 +D. J. Bernstein +Public domain. +*/
Shouldn't that be moved up closer to the (C) header?
+#define ROTATE(v,c) (ROTL32(v,c)) +#define XOR(v,w) ((v) ^ (w)) +#define PLUS(v,w) ((v) + (w)) +#define PLUSONE(v) (PLUS((v),1))
I'm not sure these macros improve readability...
+static void salsa20_wordtobyte(uint8_t output[64],const uint32_t input[16])
Despite the name, this is the main encryption function. Or rather, a (hopefully) one-way function? "input" is really the context struct, which contains some mix of the key and the iv?
- x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 0],x[12]), 7));
And what's wrong with writing that as
x[ 4] ^= ROTATE(x[ 0] + x[12], 7)
+void +salsa20_set_key(struct salsa20_ctx *ctx,
unsigned length, const uint8_t *key)
+{
Ok, this initializes the input arrray at indices 0-5, and 10-15.
+void +salsa20_set_iv(struct salsa20_ctx *ctx, unsigned length, const uint8_t *iv) +{
- assert (length == SALSA20_IV_SIZE);
And this initializes indices 6-9. I'd remove the length argument, if only one value is allowed anyway.
+void +salsa20_crypt(struct salsa20_ctx *ctx,
unsigned length,
uint8_t *c,
const uint8_t *m)
+{
- uint8_t output[64];
- unsigned i;
- if (!length) return;
- for (;;) {
- salsa20_wordtobyte(output,ctx->input);
That's applying a oneway function to the input,
- ctx->input[8] = PLUSONE(ctx->input[8]);
- if (!ctx->input[8]) {
ctx->input[9] = PLUSONE(ctx->input[9]);
/* stopping at 2^70 length per nonce is user's responsibility */
- }
followed by an increment of the iv, where the second half appearantly is used as a counter.
- if (length <= 64) {
for (i = 0;i < length;++i) c[i] = m[i] ^ output[i];
return;
- }
- for (i = 0;i < 64;++i) c[i] = m[i] ^ output[i];
Followed by xoring the generated key stream to get the ciphertext.
Do I understand it correctly that there's nothing really non-linear anywhere in this cipher? Only a mix of + and ^ which both are linear, but over different rings. So the security aginst a known plaintext attack would rest on the carry propagation, together with the large state space of 512 bits (448 bits of expanded key material, and 64 bits counter).
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
I'm not familiar with this cipher... Could you explain briefly why one may want to use it?
I've done some more homework, reading some of djb's papers and the wikipedia page. Some questions:
1. Any protocols that specify use of salsa20? And in combination with which MAC? With claimed cycle numbers, hmac with sha1 or sha2 is going to take much more time than the encryption.
2. What about the later "ChaCha" cipher? As far as I see, it has not been studied as much as Salsa20.
Regards, /Niels
Hi! Sorry for not following up on the Salsa20 thread, other things needed my attention... primarily my reason for saying this shouldn't be committed directly was that it used the first reference implementation from:
and that page contains other implementations as well, which may be faster. However speed can always be improved later on, I suppose the important aspect is to get the API right from the start.
I'm replying to all your replies in this e-mail, see below.
nisse@lysator.liu.se (Niels Möller) writes:
Simon Josefsson simon@josefsson.org writes:
Hi! Please find attached a port of DJB's public domain code for Salsa20 to nettle. The patch is not meant to be applied as-is but to start a discussion.
I'm not familiar with this cipher... Could you explain briefly why one may want to use it?
I don't know of any reason today, e.g., no important Internet applications uses it. It has some interesting properties though, so maybe there is uptake.
While writing this I noticed a sub-optimal design of Nettle, in that it overloads the meaning of struct nettle_cipher->block_size as a flag of whether the cipher is a stream cipher or not. Before arcfour was the only stream cipher in nettle, and using block_size=0 to signal that it is a stream cipher worked fine. However, Salsa20 has a block size of 8 bytes.
If Salsa20 doesn't fit in the nettle_cipher abstraction, then I think we should start by implementing Salsa20 *without* trying to get it to fit there. To me, a "stream cipher" with a block size and iv seems a bit like a contradiction.
Somewhat, although I think the distinction between stream ciphers and block ciphers is not a good one to make in an API. Usually applications want to use ciphers in some mode, and some modes used with block ciphers make them essentially stream ciphers. I think an good API should be modeled around "symmetric encryption" as the concept. Possibly just "encryption", but I think public-key and symmetric encryption are such different beasts that they shouldn't be exposed through the same API.
In general, I don't think it makes much sense to include algorithms which behave very differently under the "nettle_cipher" umbrella, since the point of that abstraction is to collect algorithms which are easily exchangable. Code accepting a nettle_cipher are not expected to check flags or behave fundamentally different for different algorithms.
Sure.
Including the arcfour stream cipher there may well be a mistake. One annoying problem with the current design is that for block ciphers, the ctx argument of the encrypt and decrypt functions naturally is const, but due arcfour being fittted in the same nettle_cipher abstraction, the function typedef nettle_crypt_func can't use a const ctx.
I always found rc4 the odd cipher in nettle.
Anyway, I'd prefer to treat redesign in that area as an independent problem from adding support for Salsa20.
That is a good idea.
Then there's also the tentative (nettle-internal.h) nettle_aead abstraction. Salsa20 could perhaps fit there, if we allow algorithms with no authentication (NULL digest function pointer).
Possibly... or just have one abstract "symmetric encryption" that embodies all these variants. Or does that lead to other disadvantages?
--- /dev/null +++ b/salsa20.c +#define ROTL32(x,n) ((((x))<<(n)) | (((x))>>(32-(n))))
There are several different variants of that macro. It would be nice with a unified one in macros.h.
I agree.
+#define SWAP32(v) \
- ((ROTL32(v, 8) & 0x00FF00FFUL) | \
- (ROTL32(v, 24) & 0xFF00FF00UL))
+#ifdef WORDS_BIGENDIAN +#define U32TO32_LITTLE(v) SWAP32(v) +#else +#define U32TO32_LITTLE(v) (v) +#endif
+#define U8TO32_LITTLE(p) U32TO32_LITTLE(((uint32_t*)(p))[0]) +#define U32TO8_LITTLE(p, v) (((uint32_t*)(p))[0] = U32TO32_LITTLE(v))
That's a clever byte swapping trick (at least if a true rot instruction is available). In Nettle conversion between bytes and integers are usually done with READ_UINT32, LE_READ_UINT32 and friends. It's usually not very performance critical, and it deals naturally with any alignment. I suspect U8TO32_LITTLE above breaks if the input is unaligned but the architecture doesn't allow unaligned word reads.
Using READ_UINT32 etc is probably better.
+/* +salsa20-ref.c version 20051118 +D. J. Bernstein +Public domain. +*/
Shouldn't that be moved up closer to the (C) header?
I put the majority of my changes above that header, to make it easy to sync and compare the code
+#define ROTATE(v,c) (ROTL32(v,c)) +#define XOR(v,w) ((v) ^ (w)) +#define PLUS(v,w) ((v) + (w)) +#define PLUSONE(v) (PLUS((v),1))
I'm not sure these macros improve readability...
They are from the reference code.
+static void salsa20_wordtobyte(uint8_t output[64],const uint32_t input[16])
Despite the name, this is the main encryption function. Or rather, a (hopefully) one-way function? "input" is really the context struct, which contains some mix of the key and the iv?
I must admit I haven't looked into the details of the code.
- x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 0],x[12]), 7));
And what's wrong with writing that as
x[ 4] ^= ROTATE(x[ 0] + x[12], 7)
That's clearer to me.
+void +salsa20_set_iv(struct salsa20_ctx *ctx, unsigned length, const uint8_t *iv) +{
- assert (length == SALSA20_IV_SIZE);
And this initializes indices 6-9. I'd remove the length argument, if only one value is allowed anyway.
I added it only for compatibility with the internal nettle interface.
I've done some more homework, reading some of djb's papers and the wikipedia page. Some questions:
- Any protocols that specify use of salsa20? And in combination with which MAC? With claimed cycle numbers, hmac with sha1 or sha2 is going to take much more time than the encryption.
I don't know of any protocols right now.
- What about the later "ChaCha" cipher? As far as I see, it has not been studied as much as Salsa20.
Yeah, I looked at it too, but preferred something that people seems more familiar with.
A bit of cleanup of the code.
Try optimization of the C code for 64-bit machines. One ought to be able to do two column or row operations in parallel by putting two salsa20 words into a uint64_t variable. May need some tricks to avoid carry propagation between the words, but I suspect it may be a win due to lower register pressure. A bit similar to the HAVE_NATIVE_64_BIT in camellia-crypt-internal.c.
Try an sse2 assembly implementation (the djb:s papers outline how to do that). Or copy some existing implementation.
Take a look at the link above, most likely there exists something. I'm not sure how important it is though.
- Documentation.
Yes.
As for the salsa20_* interface:
- It would be possible to change the interface to not expose the block size, doing a little buffering instead. But I think it's better to not do that, and follow what the ctr code and gcm code does.
I agree.
- One advertised feature of the cipher is random access. I think we should have something like a salsa20_set_pos, taking a block count as argument.
Yes.
I also arranged in the Makefile so that nettle-internal.o no longer is included in the nettle library. Will that cause any trouble for anybody?
Didn't GnuTLS use some thought-to-be-internal-functions-but-apparently-not? Maybe it was just the macros.h header file though.
Do any of you know of any protocols which specify use of salsa20? Is it usually combined with some *fast* MAC algorithm?
I suspect people who like Salsa are inclined to also like CubeHash, which could be used in HMAC variants. CubeHash is fast with optimistic parameters, but the "default" is pretty conservative making it not so fast.
/Simon
Simon Josefsson simon@josefsson.org writes:
I don't know of any reason today, e.g., no important Internet applications uses it. It has some interesting properties though, so maybe there is uptake.
It seems the main selling point is speed. Internally, it seems a bit unorthodox,
* hash function in counter mode (when everybody seems to move to block ciphers in counter mode, the invertibility of the underlying block cipher is an irrelevant feature).
* No sboxes or other "large" non-linearities. Just mix of xor and mod 2^32 add, and ten rotations to speed up diffusion. Everybody else seems to think non-linear sboxes (or things like the majority function in sha hashes) are essential. Possibly with the exception of idea? Which uses similar primitives, iirc, but with mod (2^16 + 1) multiplies instead of simple rotations to help the bit diffusion.
Somewhat, although I think the distinction between stream ciphers and block ciphers is not a good one to make in an API. Usually applications want to use ciphers in some mode, and some modes used with block ciphers make them essentially stream ciphers. I think an good API should be modeled around "symmetric encryption" as the concept.
Such an API would make some sense, but I don't think it's a substitute for a block cipher API. E.g., in ssh, the application has to be aware of the block size, and the message padding is application specific (isn't it the same in tls?). An API for encrypting a stream would not be of much help here. And when using cbc, I think it's not even possible.
Including the arcfour stream cipher there may well be a mistake. One annoying problem with the current design is that for block ciphers, the ctx argument of the encrypt and decrypt functions naturally is const, but due arcfour being fittted in the same nettle_cipher abstraction, the function typedef nettle_crypt_func can't use a const ctx.
I always found rc4 the odd cipher in nettle.
That the nettle_cipher abstraction treats the stream ciphers (arcfour being the only supported one) as a block cipher with block size zero goes back to my and Henrik Grubbström's design for Pike's crypto library, some 15 years ago...
Is that what you're finding odd, or is there anything else which is strange with arcfour?
Then there's also the tentative (nettle-internal.h) nettle_aead abstraction. Salsa20 could perhaps fit there, if we allow algorithms with no authentication (NULL digest function pointer).
Possibly... or just have one abstract "symmetric encryption" that embodies all these variants. Or does that lead to other disadvantages?
In general, it's not much point to have a general interface, if the application has to query particulars before using it (e.g, does this mechanism provide any authentication, or do I need to combine it with some other MAC?). So I don't think what I sketched is a good idea.
--- /dev/null +++ b/salsa20.c +#define ROTL32(x,n) ((((x))<<(n)) | (((x))>>(32-(n))))
There are several different variants of that macro. It would be nice with a unified one in macros.h.
I agree.
Done.
+#define SWAP32(v) \
- ((ROTL32(v, 8) & 0x00FF00FFUL) | \
- (ROTL32(v, 24) & 0xFF00FF00UL))
[...]
That's a clever byte swapping trick (at least if a true rot instruction is available). In Nettle conversion between bytes and integers are usually done with READ_UINT32, LE_READ_UINT32 and friends. It's usually not very performance critical, and it deals naturally with any alignment. I suspect U8TO32_LITTLE above breaks if the input is unaligned but the architecture doesn't allow unaligned word reads.
Using READ_UINT32 etc is probably better.
I ended up keeping that kind of byte swapping, to be able to stick to word accesses to memory until the final plaintext/cryptotext xor.
I put the majority of my changes above that header, to make it easy to sync and compare the code
I'm afraid that not easy any more... I made quite a lot of changes.
- Try an sse2 assembly implementation (the djb:s papers outline how to do that). Or copy some existing implementation.
Take a look at the link above, most likely there exists something. I'm not sure how important it is though.
I had a look, but I found that assembly code hard to read (apparantly automatically generated by some tool of djb's).
- One advertised feature of the cipher is random access. I think we should have something like a salsa20_set_pos, taking a block count as argument.
Yes.
Suggestions for name?
I think set_iv should still set the count to zero, so you need to use the new function only if you want to do seeeks in the data.
Do any of you know of any protocols which specify use of salsa20? Is it usually combined with some *fast* MAC algorithm?
I suspect people who like Salsa are inclined to also like CubeHash, which could be used in HMAC variants. CubeHash is fast with optimistic parameters, but the "default" is pretty conservative making it not so fast.
Maybe it's possible to do something in the style of ofb, to get authentication cheaply as a side effect (haven't looked very closely at ofb, though, and ofb itself seems to be unusable in Nettle due to patents). Maybe I should mail djb and ask.
Regards, /Niels
Simon Josefsson simon@josefsson.org writes:
Please find attached a port of DJB's public domain code for Salsa20 to nettle. The patch is not meant to be applied as-is but to start a discussion.
Nevertheless, I have now commited salsa20.[ch] and the testcase, postponing remaining work for later. ;-)
Needed:
* Support in nettle-benchmark in some way (can probably get away with a stupid nettle_cipher setting the iv to some fixed value).
* A bit of cleanup of the code.
* Try optimization of the C code for 64-bit machines. One ought to be able to do two column or row operations in parallel by putting two salsa20 words into a uint64_t variable. May need some tricks to avoid carry propagation between the words, but I suspect it may be a win due to lower register pressure. A bit similar to the HAVE_NATIVE_64_BIT in camellia-crypt-internal.c.
* Try an sse2 assembly implementation (the djb:s papers outline how to do that). Or copy some existing implementation.
* Documentation.
As for the salsa20_* interface:
* It would be possible to change the interface to not expose the block size, doing a little buffering instead. But I think it's better to not do that, and follow what the ctr code and gcm code does.
* One advertised feature of the cipher is random access. I think we should have something like a salsa20_set_pos, taking a block count as argument.
Regards, /Niels
And now I've added support for salsa20 benchmarking as well. Performance is pretty good, even for this reference implementation. Seems to run at 12 cycles per byte on my laptop, a little faster than 128-bit aes.
To get that working, I added a very kludgy
const struct nettle_cipher nettle_salsa20 = {...};
to nettle-internal.c, for benchmarking only.
I also arranged in the Makefile so that nettle-internal.o no longer is included in the nettle library. Will that cause any trouble for anybody?
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
And now I've added support for salsa20 benchmarking as well. Performance is pretty good, even for this reference implementation. Seems to run at 12 cycles per byte on my laptop, a little faster than 128-bit aes.
Now I've done some cleanup and reorganization work. Using memxor3 for the actual xoring saves almost 3 cycles per byte, so it's a considerable speedup in this case (and then memxor3 can maybe be sped up a bit more on this platform, from Nikos' work a while ago).
I wonder if there's any potential for simd-style parallel processing using plain 64-bit operations (no sse2 assembly). If one puts two 32-bit values in a 64-bit variable, parallel xor is trivial, but we also need 32-bit rotations and adds.
Parallel rotations are possible as two shifts and some extra masking, so it might be slightly faster than two 32-bit rotations of two shifts each.
For parallel mod 2^32 additions, one need to cancel one of the carries. Something like
uint64_t x, y, z;
z = x + y; z += (- (uint64_t) ((uint32_t) z < (uint32_t) y)) << 32;
This can hardly be faster than two 32-bit additions (two lea, cmp, cmov on x86_64, or alternatively, one lea, cmp, sbb, shl, add, depending on code genereration).
Any speedup depends on the relative speedup or slowdown of each primitive, the cost of permuting values as needed, and then the effects of reduced register pressure on top of that.
For x86_64 (and maybe x86_32), assembly implementation also seems attractive, but I'd like to do any general or 64-bit specific optimizations in the C code first.
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
For x86_64 (and maybe x86_32), assembly implementation also seems attractive,
I just push an x86_64 implementation of salsa20_crypt. Runs at 6.6 cycles/byte on my laptop (or 189 Mbyte/s), which is more than twice the speed of aes128, and slightly faster than arcfour.
It's likely possible to squeeze out a cycle or two more, by doing two blocks in parallel (I think djb's x86_64 code does that, but I found it very hard to read), or by other micro-optimizations.
Do any of you know of any protocols which specify use of salsa20? Is it usually combined with some *fast* MAC algorithm?
Regards, /Niels
nettle-bugs@lists.lysator.liu.se