Hello Niels,
Niels Möller nisse@lysator.liu.se writes:
Daiki Ueno ueno@gnu.org writes:
- One could perhaps use index == 0 instead of index == block_size for the case that there is no buffered data. But the current convention does make your "if (length <= left)" nice and simple.
I agree that the current convention is a bit awkward, so in the attached patch I changed to use index == 0 as the indicator where buffering is needed. That actually makes the code simpler as we can defer buffering until when the data is read. One drawback though is that it causes additional memcpy in a corner case where the _shake_output is used to retrieve data smaller than the block size.
I wonder if that will still be simpler if one also moves the sha3_permute calls?
I have merged your previous version to a branch add-sha3_256_shake_output, and ci looks green. So perhaps best to merge that to master, and iterate from there?
Sorry for the delay, and thank you for merging it to master. I've come up with the attached 3 patches on top of it, which basically do:
- Apply my changes in the previous post to count index from zero, not the end of the block
- Rename sha3_256_shake_output to sha3_shake256_output and add sha3_shake256_init/update as well, as you suggested in the previous conversation. That would help us implement SHAKE128 without exposing SHA3-128 digest functions and I find it easier to read when used in the ML-KEM implementation.
- Generalize _shake_output function independent of the underlying SHA-3 algorithm.
- It looks a bit backwards to me that each iteration *first* copies data to the digest, and *then* calls sha3_permute. In case no more data is to be output, that sha3_permute call is wasted. It would be more natural to me to not call sha3_permute until we know the output is needed. But to fix that and still keep things nice for the first output block, I think one would need to reorganize _nettle_sha3_pad to not imply a call to sha3_permute (via sha3_absorb). So that's better done in a separate change.
Right, I can do that after the current patch is settled.
I've done a bit of hacking locally. What I did was to take out the xoring parts of sha3_absorb into it's own function sha3_xor_block, and let sha3_pad_shake use that, without any call to sha3_permute. And then call sha3_permute as output is needed.
Cool!
- I'm still tempted to use ctx->index = ~index rather than ctx->index = index | INDEX_HIGH_BIT. But maybe that would just be too obfuscated.
I'm actually not sure how this works. For example, if unsigned int is 32-bit and index is 3, wouldn't ~index turn to 0xfffffffc, while index | INDEX_HIGH_BIT is 0x80000003?
It would be a different representation, with the very minor advantage that the INDEX_HIGH_BIT value isn't needed (in source code, or handled at runtime). Like
index = ctx->index;
if (index < sizeof(ctx->block)) { ... first call to shake_output, pad and initialize... } else index = ~index;
assert (index <= sizeof(ctx->block));
... output processing ...
ctx->index = ~index;
I see. I rewote it using this pattern.
In next step, to also support shake128, we should generalize your code using an internal function _sha3_shake_output taking block and block size as arguments.
Yes.
I've tried that in my local hack, I think it's rather straight-forward. (I might be able to post corresponding patch later). What's unclear is how much to share between _shake and shake_output. One could define _shake as _shake_output + _init. The drawback I see is that (i) we would allow _shake_output followed by _shake, which isn't proper api usage, and (ii) _shake needs a lot less logic since it should always start by padding, and it doesn't need to buffer any data, so it seems a bit wrong to have it call shake_output that does this unneeded extra work.
In the attached patch, _shake is now re-implemented using _shake_output + _init, and also marked as deprecated in favor of the new incremental interface.
Regards,