nisse@lysator.liu.se (Niels Möller) writes:
> After some more thinking, I think this is the way to go. I'd like to propose the following plan:
>
> - Do a _salsa20_core, working with uint32_t. Consider it an internal function, and keep the interface open (maybe it should be able to do several blocks, maybe it should byteswap output words, etc).

[...]

> - Maybe have the C implementation salsa20_crypt also call _salsa20_core to do the main work. May require byteswapping in _salsa20_core output, to avoid a performance regression.

I tried extracting a _salsa20_core from salsa20_crypt. See patch below. The function byteswaps the output words if needed, i.e., on big-endian machines.

On my machine (low-end, AMD E-350), the performance penalty is about 4%: 12.25 cycles/byte before the change, 12.75 cycles/byte after. I think that's ok if it can be shared with salsa20_core and scrypt.

(And on this machine there also appears to be no significant gain from the current assembly implementation).
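
To make the byteswap step concrete, here is a minimal standalone sketch (not part of the patch). It applies the same rotate-and-mask expression as the big-endian branch of LE_SWAP32 unconditionally and compares it with a plain shift-based byte reversal. ROTL32 is redefined locally so the file compiles on its own; swap32_rot and bswap32_ref are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Local stand-in for the ROTL32 macro from macros.h (count first,
   value second), so this sketch is self-contained. Assumes a 32-bit
   unsigned int, as on common platforms. */
#define ROTL32(n, x) (((x) << (n)) | ((x) >> (32 - (n))))

/* Same expression as the WORDS_BIGENDIAN branch of LE_SWAP32, but
   applied unconditionally. Rotating by 8 puts bytes 0 and 2 in their
   swapped positions, rotating by 24 does the same for bytes 1 and 3;
   the masks keep the right pair from each rotation. */
static uint32_t
swap32_rot(uint32_t v)
{
  return (uint32_t) ((ROTL32(8, v) & 0x00FF00FFUL)
                     | (ROTL32(24, v) & 0xFF00FF00UL));
}

/* Hypothetical reference: byte reversal using shifts and masks only. */
static uint32_t
bswap32_ref(uint32_t v)
{
  return (v >> 24)
    | ((v >> 8) & 0x0000FF00UL)
    | ((v << 8) & 0x00FF0000UL)
    | (v << 24);
}
```

For example, swap32_rot(0x11223344) gives 0x44332211, the same as bswap32_ref, and applying it twice returns the original word.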
Regards,
/Niels

diff --git a/Makefile.in b/Makefile.in
index 9904be5..c0ca3ad 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -82,6 +82,7 @@ nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c \
 		 md2.c md2-meta.c md4.c md4-meta.c \
 		 md5.c md5-compress.c md5-compat.c md5-meta.c \
 		 ripemd160.c ripemd160-compress.c ripemd160-meta.c \
+		 salsa20-core-internal.c \
 		 salsa20-crypt.c salsa20-set-key.c \
 		 sha1.c sha1-compress.c sha1-meta.c \
 		 sha256.c sha256-compress.c sha224-meta.c sha256-meta.c \
diff --git a/salsa20-core-internal.c b/salsa20-core-internal.c
new file mode 100644
index 0000000..2c1ae3c
--- /dev/null
+++ b/salsa20-core-internal.c
@@ -0,0 +1,85 @@
+/* salsa20-core-internal.c
+ *
+ * Internal interface to the Salsa20 core function.
+ */
+
+/* nettle, low-level cryptographics library
+ *
+ * Copyright (C) 2012 Simon Josefsson, Niels Möller
+ *
+ * The nettle library is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation; either version 2.1 of the License, or (at your
+ * option) any later version.
+ *
+ * The nettle library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+ * License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with the nettle library; see the file COPYING.LIB.  If not, write to
+ * the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02111-1301, USA.
+ */
+
+/* Based on:
+   salsa20-ref.c version 20051118
+   D. J. Bernstein
+   Public domain.
+*/
+
+#if HAVE_CONFIG_H
+# include "config.h"
+#endif
+
+#include <assert.h>
+#include <string.h>
+
+#include "salsa20.h"
+
+#include "macros.h"
+
+#ifdef WORDS_BIGENDIAN
+#define LE_SWAP32(v) \
+  ((ROTL32(8, v) & 0x00FF00FFUL) | \
+   (ROTL32(24, v) & 0xFF00FF00UL))
+#else
+#define LE_SWAP32(v) (v)
+#endif
+
+#define QROUND(x0, x1, x2, x3) do { \
+  x1 ^= ROTL32(7, x0 + x3); \
+  x2 ^= ROTL32(9, x1 + x0); \
+  x3 ^= ROTL32(13, x2 + x1); \
+  x0 ^= ROTL32(18, x3 + x2); \
+  } while(0)
+
+void
+_salsa20_core(uint32_t *dst, const uint32_t *src, unsigned rounds)
+{
+  uint32_t x[_SALSA20_INPUT_LENGTH];
+  unsigned i;
+
+  assert ( (rounds & 1) == 0);
+
+  memcpy (x, src, sizeof(x));
+  for (i = 0; i < rounds;i += 2)
+    {
+      QROUND(x[0], x[4], x[8], x[12]);
+      QROUND(x[5], x[9], x[13], x[1]);
+      QROUND(x[10], x[14], x[2], x[6]);
+      QROUND(x[15], x[3], x[7], x[11]);
+
+      QROUND(x[0], x[1], x[2], x[3]);
+      QROUND(x[5], x[6], x[7], x[4]);
+      QROUND(x[10], x[11], x[8], x[9]);
+      QROUND(x[15], x[12], x[13], x[14]);
+    }
+
+  for (i = 0; i < _SALSA20_INPUT_LENGTH; i++)
+    {
+      uint32_t t = x[i] + src[i];
+      dst[i] = LE_SWAP32 (t);
+    }
+}
diff --git a/salsa20-crypt.c b/salsa20-crypt.c
index eae3cea..b061b4b 100644
--- a/salsa20-crypt.c
+++ b/salsa20-crypt.c
@@ -40,21 +40,6 @@
 #include "macros.h"
 #include "memxor.h"
 
-#ifdef WORDS_BIGENDIAN
-#define LE_SWAP32(v) \
-  ((ROTL32(8, v) & 0x00FF00FFUL) | \
-   (ROTL32(24, v) & 0xFF00FF00UL))
-#else
-#define LE_SWAP32(v) (v)
-#endif
-
-#define QROUND(x0, x1, x2, x3) do { \
-  x1 ^= ROTL32(7, x0 + x3); \
-  x2 ^= ROTL32(9, x1 + x0); \
-  x3 ^= ROTL32(13, x2 + x1); \
-  x0 ^= ROTL32(18, x3 + x2); \
-  } while(0)
-
 void
 salsa20_crypt(struct salsa20_ctx *ctx,
 	      unsigned length,
@@ -67,26 +52,8 @@ salsa20_crypt(struct salsa20_ctx *ctx,
   for (;;)
     {
       uint32_t x[_SALSA20_INPUT_LENGTH];
-      int i;
-      memcpy (x, ctx->input, sizeof(x));
-      for (i = 0;i < 10;i ++)
-	{
-	  QROUND(x[0], x[4], x[8], x[12]);
-	  QROUND(x[5], x[9], x[13], x[1]);
-	  QROUND(x[10], x[14], x[2], x[6]);
-	  QROUND(x[15], x[3], x[7], x[11]);
 
-	  QROUND(x[0], x[1], x[2], x[3]);
-	  QROUND(x[5], x[6], x[7], x[4]);
-	  QROUND(x[10], x[11], x[8], x[9]);
-	  QROUND(x[15], x[12], x[13], x[14]);
-	}
-
-      for (i = 0;i < _SALSA20_INPUT_LENGTH;++i)
-	{
-	  uint32_t t = x[i] + ctx->input[i];
-	  x[i] = LE_SWAP32 (t);
-	}
+      _salsa20_core (x, ctx->input, 20);
 
       ctx->input[9] += (++ctx->input[8] == 0);
 
diff --git a/salsa20.h b/salsa20.h
index 7d47f52..d95d002 100644
--- a/salsa20.h
+++ b/salsa20.h
@@ -37,6 +37,7 @@ extern "C" {
 #define salsa20_set_key nettle_salsa20_set_key
 #define salsa20_set_iv nettle_salsa20_set_iv
 #define salsa20_crypt nettle_salsa20_crypt
+#define _salsa20_core _nettle_salsa20_core
 
 /* Minimum and maximum keysizes, and a reasonable default. In
  * octets.*/
@@ -75,6 +76,9 @@ salsa20_crypt(struct salsa20_ctx *ctx,
 	      unsigned length, uint8_t *dst,
 	      const uint8_t *src);
 
+void
+_salsa20_core(uint32_t *dst, const uint32_t *src, unsigned rounds);
+
 #ifdef __cplusplus
 }
 #endif
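
For trying the core outside the library, the following is a rough standalone sketch of the same double-round loop, not the patched function itself. Assumptions: CORE_WORDS stands in for _SALSA20_INPUT_LENGTH and is taken to be 16, ROTL32 is redefined locally instead of coming from macros.h, the output stays in host word order (as on a little-endian machine, where LE_SWAP32 is a no-op), and core_sketch is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match _SALSA20_INPUT_LENGTH (16 words = 64 bytes). */
#define CORE_WORDS 16

/* Local stand-in for ROTL32 from macros.h; assumes 32-bit unsigned int. */
#define ROTL32(n, x) (((x) << (n)) | ((x) >> (32 - (n))))

#define QROUND(x0, x1, x2, x3) do { \
  x1 ^= ROTL32(7, x0 + x3);         \
  x2 ^= ROTL32(9, x1 + x0);         \
  x3 ^= ROTL32(13, x2 + x1);        \
  x0 ^= ROTL32(18, x3 + x2);        \
  } while(0)

/* Hypothetical standalone variant: rounds must be even, since each
   loop iteration does one column round and one row round. */
static void
core_sketch(uint32_t *dst, const uint32_t *src, unsigned rounds)
{
  uint32_t x[CORE_WORDS];
  unsigned i;

  assert((rounds & 1) == 0);

  memcpy(x, src, sizeof(x));
  for (i = 0; i < rounds; i += 2)
    {
      /* Column round. */
      QROUND(x[0], x[4], x[8], x[12]);
      QROUND(x[5], x[9], x[13], x[1]);
      QROUND(x[10], x[14], x[2], x[6]);
      QROUND(x[15], x[3], x[7], x[11]);

      /* Row round. */
      QROUND(x[0], x[1], x[2], x[3]);
      QROUND(x[5], x[6], x[7], x[4]);
      QROUND(x[10], x[11], x[8], x[9]);
      QROUND(x[15], x[12], x[13], x[14]);
    }

  /* Feed-forward: add the input words, wrapping modulo 2^32.
     No byteswap here; output is in host order. */
  for (i = 0; i < CORE_WORDS; i++)
    dst[i] = x[i] + src[i];
}
```

Two properties are easy to check by hand: an all-zero input stays all-zero for any even round count, and with rounds set to 0 each output word is twice the input word modulo 2^32.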