I also have a few comments on the code.
Dmitry Eremin-Solenikov dbaryshkov@gmail.com writes:
+void +cfb_encrypt(const void *ctx, nettle_cipher_func *f,
size_t block_size, uint8_t *iv,
size_t length, uint8_t *dst,
const uint8_t *src)
+{
- if (src != dst)
- {
uint8_t *p;
for (p = iv; length >= block_size; p = dst, dst += block_size, src += block_size, length -= block_size)
- {
f(ctx, block_size, dst, p);
memxor(dst, src, block_size);
- }
if (p != iv)
- memcpy(iv, p, block_size);
if (length)
- {
TMP_DECL(buffer, uint8_t, NETTLE_MAX_CIPHER_BLOCK_SIZE);
TMP_ALLOC(buffer, block_size);
f(ctx, block_size, buffer, iv);
memxor3(dst, buffer, src, length);
memcpy(iv, dst, length);
- }
- }
- else
- {
TMP_DECL(buffer, uint8_t, NETTLE_MAX_CIPHER_BLOCK_SIZE);
TMP_ALLOC(buffer, block_size);
uint8_t *p;
for (p = iv; length >= block_size; p = dst, dst += block_size, length -= block_size)
- {
f(ctx, block_size, buffer, p);
memxor(dst, buffer, block_size);
- }
if (p != iv)
- memcpy(iv, p, block_size);
if (length)
- {
f(ctx, block_size, buffer, iv);
memxor(dst, buffer, length);
memcpy(iv, dst, length);
- }
- }
+}
It may be nice to reduce the code duplication for the two cases; only the block loop really needs to be different. I think it's fine to move allocation of buffer up front even if it's unnecessary in the case src != dst and there's no left-over; it's only an alloca so it's pretty check. Also unconditionally using memxor3 rather than memxor for the left-over is a small overhead. Do you think those operations are performance critical?
Also for cfb_decrypt it would be nice to take common parts out of the src != dst condition, but not quite obvious how to do it.
+/* Don't allocate any more space than this on the stack */ +#define CFB_BUFFER_LIMIT 512
+void +cfb_decrypt(const void *ctx, nettle_cipher_func *f,
size_t block_size, uint8_t *iv,
size_t length, uint8_t *dst,
const uint8_t *src)
+{
- if (src != dst)
- {
size_t left = length % block_size;
Avoiding % would be nice, division by non-constants is expensive (might apply to the old cbc code too). But perhaps tricky, since for the final part we need to call f with a length which is rounded down to an integral number of blocks.
while (length > 0)
- {
uint32_t part = length > buffer_size ? buffer_size : length;
Should be size_t, not uint32_t.
+void +cfb8_encrypt(const void *ctx, nettle_cipher_func *f,
size_t block_size, uint8_t *iv,
size_t length, uint8_t *dst,
const uint8_t *src)
+{
As I wrote in my other mail, I don't think we need a separate function for this. My understanding is that the main use case of CFB8 is when we get one octet at a time to encrypt and transmit, i.e., tis function called with length == 1. If that's right, there's little use to optimize it's performance for larger length.
Regards, /Niels