Simon Josefsson simon@josefsson.org writes:
Hi! Please find attached a port of DJB's public domain code for Salsa20 to nettle. The patch is not meant to be applied as-is but to start a discussion.
I'm not familiar with this cipher... Could you explain briefly why one may want to use it?
While writing this I noticed a sub-optimal design of Nettle, in that it overloads the meaning of struct nettle_cipher->block_size as a flag of whether the cipher is a stream cipher or not. Before arcfour was the only stream cipher in nettle, and using block_size=0 to signal that it is a stream cipher worked fine. However, Salsa20 has a block size of 8 bytes.
If Salsa20 doesn't fit in the nettle_cipher abstraction, then I think we should start by implementing Salsa20 *without* trying to get it to fit there. To me, a "stream cipher" with a block size and iv seems a bit like a contradiction.
In general, I don't think it makes much sense to include algorithms which behave very differently under the "nettle_cipher" umbrella, since the point of that abstraction is to collect algorithms which are easily exchangable. Code accepting a nettle_cipher are not expected to check flags or behave fundamentally different for different algorithms.
Including the arcfour stream cipher there may well be a mistake. One annoying problem with the current design is that for block ciphers, the ctx argument of the encrypt and decrypt functions naturally is const, but due arcfour being fittted in the same nettle_cipher abstraction, the function typedef nettle_crypt_func can't use a const ctx.
Anyway, I'd prefer to treat redesign in that area as an independent problem from adding support for Salsa20.
Then there's also the tentative (nettle-internal.h) nettle_aead abstraction. Salsa20 could perhaps fit there, if we allow algorithms with no authentication (NULL digest function pointer).
What do you think? I'd be happy to propose a patch to implement 2) and then a follow-on patch to add support for Salsa20.
I'd prefer the opposite order...
Some quick comments on the code and algorithm below. This is the first I hear about it, so I may well be misunderstanding how it is supposed to work.
--- /dev/null +++ b/salsa20.c +#define ROTL32(x,n) ((((x))<<(n)) | (((x))>>(32-(n))))
There are several different variants of that macro. It would be nice with a unified one in macros.h.
+#define SWAP32(v) \
- ((ROTL32(v, 8) & 0x00FF00FFUL) | \
- (ROTL32(v, 24) & 0xFF00FF00UL))
+#ifdef WORDS_BIGENDIAN +#define U32TO32_LITTLE(v) SWAP32(v) +#else +#define U32TO32_LITTLE(v) (v) +#endif
+#define U8TO32_LITTLE(p) U32TO32_LITTLE(((uint32_t*)(p))[0]) +#define U32TO8_LITTLE(p, v) (((uint32_t*)(p))[0] = U32TO32_LITTLE(v))
That's a clever byte swapping trick (at least if a true rot instruction is available). In Nettle conversion between bytes and integers are usually done with READ_UINT32, LE_READ_UINT32 and friends. It's usually not very performance critical, and it deals naturally with any alignment. I suspect U8TO32_LITTLE above breaks if the input is unaligned but the architecture doesn't allow unaligned word reads.
+/* +salsa20-ref.c version 20051118 +D. J. Bernstein +Public domain. +*/
Shouldn't that be moved up closer to the (C) header?
+#define ROTATE(v,c) (ROTL32(v,c)) +#define XOR(v,w) ((v) ^ (w)) +#define PLUS(v,w) ((v) + (w)) +#define PLUSONE(v) (PLUS((v),1))
I'm not sure these macros improve readability...
+static void salsa20_wordtobyte(uint8_t output[64],const uint32_t input[16])
Despite the name, this is the main encryption function. Or rather, a (hopefully) one-way function? "input" is really the context struct, which contains some mix of the key and the iv?
- x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 0],x[12]), 7));
And what's wrong with writing that as
x[ 4] ^= ROTATE(x[ 0] + x[12], 7)
+void +salsa20_set_key(struct salsa20_ctx *ctx,
unsigned length, const uint8_t *key)
+{
Ok, this initializes the input arrray at indices 0-5, and 10-15.
+void +salsa20_set_iv(struct salsa20_ctx *ctx, unsigned length, const uint8_t *iv) +{
- assert (length == SALSA20_IV_SIZE);
And this initializes indices 6-9. I'd remove the length argument, if only one value is allowed anyway.
+void +salsa20_crypt(struct salsa20_ctx *ctx,
unsigned length,
uint8_t *c,
const uint8_t *m)
+{
- uint8_t output[64];
- unsigned i;
- if (!length) return;
- for (;;) {
- salsa20_wordtobyte(output,ctx->input);
That's applying a oneway function to the input,
- ctx->input[8] = PLUSONE(ctx->input[8]);
- if (!ctx->input[8]) {
ctx->input[9] = PLUSONE(ctx->input[9]);
/* stopping at 2^70 length per nonce is user's responsibility */
- }
followed by an increment of the iv, where the second half appearantly is used as a counter.
- if (length <= 64) {
for (i = 0;i < length;++i) c[i] = m[i] ^ output[i];
return;
- }
- for (i = 0;i < 64;++i) c[i] = m[i] ^ output[i];
Followed by xoring the generated key stream to get the ciphertext.
Do I understand it correctly that there's nothing really non-linear anywhere in this cipher? Only a mix of + and ^ which both are linear, but over different rings. So the security aginst a known plaintext attack would rest on the carry propagation, together with the large state space of 512 bits (448 bits of expanded key material, and 64 bits counter).
Regards, /Niels