Re: additional API for SHAKE streaming read

24 Mar 2024

      Hello Niels,
Niels Möller nisse@lysator.liu.se writes:
...
Daiki Ueno ueno@gnu.org writes:
...
...

One could perhaps use index == 0 instead of index == block_size for
the case that there is no buffered data. But the current convention
does make your "if (length <= left)" nice and simple.

I agree that the current convention is a bit awkward, so in the attached
patch I changed to use index == 0 as the indicator where buffering is
needed.  That actually makes the code simpler as we can defer buffering
until when the data is read.  One drawback though is that it causes
additional memcpy in a corner case where the _shake_output is used to
retrieve data smaller than the block size.
I wonder if that will still be simpler if one also moves the
sha3_permute calls?
I have merged your previous version to a branch
add-sha3_256_shake_output, and ci looks green. So perhaps best to merge
that to master, and iterate from there?
Sorry for the delay, and thank you for merging it to master.  I've come
up with the attached 3 patches on top of it, which basically do:
- Apply my changes in the previous post to count index from zero, not
  the end of the block
- Rename sha3_256_shake_output to sha3_shake256_output and add
  sha3_shake256_init/update as well, as you suggested in the previous
  conversation.  That would help us implement SHAKE128 without exposing
  SHA3-128 digest functions and I find it easier to read when used in
  the ML-KEM implementation.
- Generalize _shake_output function independent of the underlying SHA-3
  algorithm.
...
...
...

It looks a bit backwards to me that each iteration *first* copies data
to the digest, and *then* calls sha3_permute. In case no more data is
to be output, that sha3_permute call is wasted. It would be more
natural to me to not call sha3_permute until we know the output is
needed. But to fix that and still keep things nice for the first
output block, I think one would need to reorganize _nettle_sha3_pad to
not imply a call to sha3_permute (via sha3_absorb). So that's better
done in a separate change.

Right, I can do that after the current patch is settled.
I've done a bit of hacking locally. What I did was to take out the
xoring parts of sha3_absorb into it's own function sha3_xor_block, and
let sha3_pad_shake use that, without any call to sha3_permute. And then
call sha3_permute as output is needed.
Cool!
...
...
...

I'm still tempted to use ctx->index = ~index rather than ctx->index =
index | INDEX_HIGH_BIT. But maybe that would just be too obfuscated.

I'm actually not sure how this works.  For example, if unsigned int is
32-bit and index is 3, wouldn't ~index turn to 0xfffffffc, while index |
INDEX_HIGH_BIT is 0x80000003?
It would be a different representation, with the very minor advantage
that the INDEX_HIGH_BIT value isn't needed (in source code, or handled
at runtime). Like
index = ctx->index;
if (index < sizeof(ctx->block)) 
    { ... first call to shake_output, pad and initialize...  }
  else
    index = ~index;
assert (index <= sizeof(ctx->block));
... output processing ...
ctx->index = ~index;
I see.  I rewote it using this pattern.
...
...
...
In next step, to also support shake128, we should generalize your code
using an internal function _sha3_shake_output taking block and block
size as arguments.
Yes.
I've tried that in my local hack, I think it's rather straight-forward.
(I might be able to post corresponding patch later). What's unclear is
how much to share between _shake and shake_output. One could define
_shake as _shake_output + _init. The drawback I see is that (i) we would
allow _shake_output followed by _shake, which isn't proper api usage,
and (ii) _shake needs a lot less logic since it should always start by
padding, and it doesn't need to buffer any data, so it seems a bit wrong
to have it call shake_output that does this unneeded extra work.
In the attached patch, _shake is now re-implemented using _shake_output
+ _init, and also marked as deprecated in favor of the new incremental
interface.
Regards,
-- 
Daiki Ueno

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: additional API for SHAKE streaming read