Nicolas Mora nicolas@babelouest.org writes:
I still have one uresolved comment about byte swapping but the rest are resolved.
Thanks. I'll do this round of comments on email, since it might be of interest to other contributors.
* About the byteswapping comment, the code
// A = MSB(64, B) ^ t where t = (n*j)+i A64 = READ_UINT64(B.b); A64 ^= (n*j)+(i+1); WRITE_UINT64(A.b, A64);
could be replaced by something like
#if WORDS_BIGENDIAN A.u64 = B.u64 ^ (n*j)+(i+1); #elif HAVE_BUILTIN_BSWAP64 A.u64 = B.u64 ^ __builtin_bswap64((n*j)+(i+1)); #else ... READ_UINT64 / WRITE_UINT64 or some other workaround ... #endif
Preferably encapsulated into a single macro, so it doesn't have to be duplicated in both the wrap and the unwrap function. There's another example of using __builtin_bswap64 in ctr.c.
* Intialization: If you don't intend to use the initial values, omit initialization in declarations like
union nettle_block16 I = {0}, B = {0}; union nettle_block8 A = {0};
That helps tools like valgrind detect accidental use of uninitialized data. (And then I'm not even sure exactly how initializers are interpreted for a union type).
* Some or all memcpys in the main loop can be replaced by uint64_t operations, e.g.,
I.u64 = A.u64;
instead of
memcpy(I.b, A.b, 8);
(memcpy is needed when either left or right hand side is an unaligned byte buffer). If it turns out that you never use .b on some variable, you can drop the use of the union type for that variable and use uint64_t directly.
Therefore I removed 'uint8_t R[64]' to use TMP_GMP_DECL(R, uint8_t); instead.
Unfortunately, that doesn't work: This code should go into libnettle (not libhogweed), and then it can't depend on GMP. You could do plain malloc + free, but according to the README file, Nettle doesn't do memory allocation, so that's not ideal.
I think it should be doable to reuse the output buffer as temporary storage (R = ciphertext for wrap, R = cleartext for unwrap). In-place operation (ciphertext == cleartext) should be supported (but no partial overlap), so it's important to test that case.
Using the output area directly has the drawback that it isn't aligned, so you'll need to keep some memcpys in the main loop. One could consider using an aligned pointer into output buffer and separate handling of first and/or last block, but if that's a lot of extra complexty, I wouldn't do it unless either (i) it gives a significant performance improvement, or (ii) it turns out to actually be reasonably nice and clean.
* And one more nit: Indentation. It's fine to use TAB characters, but they must be assumed to be traditional TAB to 8 positions: changing the appearance of TAB to anything else in one's editor is wrong, because it makes the code look weird for everyone else (e.g., in gitlab's ui). And the visual appearance should follow GNU standards, braces on their own lines, indent steps of two spaces, which means usually SPC characters, with TAB only for large indentation.
Regards, /Niels