additional API for SHAKE streaming read

List overview All Threads
Download

newer

older

ppc64 micro optimization

Issue tracker

Daiki Ueno

25 Feb 2024 25 Feb '24

8:37 a.m.

Hello,

When I'm trying to implement ML-KEM (Kyber), I realized that the current API for SHAKE (sha3_256_shake) is a bit too limited: while ML-KEM uses SHAKE128 as a source of pseudorandom samples[1], the the current API requires the total number of bytes are determined prior to the call, and after the call the hash context is reset.

Here I propose adding a couple of helper functions to support such streaming use-case: sha3_256_shake_pad to process the final block, and sha3_256_shake_read to read the digest from the current state without resetting the context. With those functions, one could write a streaming read interface as attached.

What do you think? The actual code changes can be found at: https://git.lysator.liu.se/nettle/nettle/-/merge_requests/61

I also have a SHAKE128 implementation with analogous API, which I will post later.

Regards,

Footnotes: [1] https://bwesterb.github.io/draft-schwabe-cfrg-kyber/draft-cfrg-schwabe-kyber...

-- Daiki Ueno #include "sha3.h" #include <stdio.h> struct shake_reader { struct sha3_256_ctx ctx; uint8_t buf[SHA3_256_BLOCK_SIZE]; size_t offset; }; static void shake_reader_init (struct shake_reader *reader) { sha3_256_init (&reader->ctx); reader->offset = sizeof(reader->buf); } static void shake_reader_read (struct shake_reader *reader, size_t length, uint8_t *digest) { while (length > 0) { while (reader->offset < sizeof(reader->buf)) { *digest++ = reader->buf[reader->offset++]; length--; if (!length) return; } if (reader->offset == sizeof(reader->buf)) { sha3_256_shake_read (&reader->ctx, sizeof(reader->buf), reader->buf); sha3_permute (&reader->ctx.state); reader->offset = 0; } } } int main (void) { struct shake_reader reader; uint8_t buf[3]; size_t n = 0; shake_reader_init (&reader); sha3_256_update (&reader.ctx, 0, NULL); sha3_256_shake_pad (&reader.ctx); while (n < 256) { shake_reader_read (&reader, sizeof(buf), buf); int d1 = buf[0] + 256 * (buf[1] % 16); int d2 = (buf[1] >> 4) + 16 * buf[2]; if (d1 < 3329) { printf ("%d\n", d1); n++; } if (n == 256) break; if (d2 < 3329) { printf ("%d\n", d2); n++; } } return 0; }

Show replies by date

Niels Möller

4 Mar 4 Mar

6:57 p.m.

Daiki Ueno ueno@gnu.org writes:

...

When I'm trying to implement ML-KEM (Kyber), I realized that the current API for SHAKE (sha3_256_shake) is a bit too limited: while ML-KEM uses SHAKE128 as a source of pseudorandom samples[1], the the current API requires the total number of bytes are determined prior to the call, and after the call the hash context is reset.

I vaguely recall discussing that when shake256 was added, and we concluded it was good enough as a start, and could be extended later.

I think it would be nice if one could support the streaming case with the existing struct sha3_256_ctx, and little extra wrapping. Question is what the interface should be. I see a few variants:

1. void /* Essentially the same as _sha3_pad_shake */ sha3_256_shake_start (struct sha3_256_ctx *ctx);

void /* Unbuffered, length must be a multiple of SHA3_256_BLOCK_SIZE */ sha3_256_shake_output (struct sha3_256_ctx *ctx size_t length, uint8_t *dst);

void /* Last call, length can be arbitrary, context reinitialized */ sha3_256_shake_end (struct sha3_256_ctx *ctx size_t length, uint8_t *dst);

Requiring all calls but the last to be full blocks is consistent with nettle's funtions for block ciphers. But since we anyway have a buffer available (to support arbitrary sizes for streaming the input), we could perhaps just as well reuse that buffer.

2. void /* Essentially the same as _sha3_pad_shake */ sha3_256_shake_start (struct sha3_256_ctx *ctx);

void /* Arbitrary length, no need to signal end of data */ sha3_256_shake_output (struct sha3_256_ctx *ctx size_t length, uint8_t *dst);

void /* Explicit init call needed to start a new input message */ sha3_256_init (struct sha3_256_ctx *ctx);

In this case, sha3_256_shake_output would use ctx->index and ctx->buffer for partial blocks.

With some hacking (say, using the unused high bit of ctx->index to signal that shake is in output mode), then we could have just

3. void /* Arbitrary length, no need to signal start or end of output */ sha3_256_shake_output (struct sha3_256_ctx *ctx size_t length, uint8_t *dst);

void /* Explicit init call needed to start a new input message */ sha3_256_init (struct sha3_256_ctx *ctx);

As always, naming is also a crucial question. Is _shake_output a good name? Or _shake_read, or _shake_generate? From the terminology in the spec (https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf), I think "_shake_output" is reasonable.

When deciding on naming and conventions, we should strive to define somthing that can be reused for later hash functions with variable output size (called extendable-output functions, "XOF", in the spec).

So what do you think makes most sense?

To be clear, the hack I'm referring to for option (3) would be something like

void /* Arbitrary length, no need to signal start or end of output */ sha3_256_shake_output (struct sha3_256_ctx *ctx size_t length, uint8_t *dst) { if (!(ctx->index >> 31)) /* 32-bit unsigned int, for simplicity of example */ { _sha3_pad_shake (&ctx->state, SHA3_256_BLOCK_SIZE, ctx->block, ctx->index); /* Not sure what representation is most suitable for index, but high bit must be set. */ ctx->index = ~0; } /* If leftovers in buffer (determined from index), copy to output */ /* While we still need more blocks, permute and copy one block to output */ /* If we need a partial block at the end, generate one into buffer, copy prefix of it to the output, and set index accordingly */ }

Regards, /Niels

-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.

Daiki Ueno

10 Mar 10 Mar

12:52 a.m.

Hello Niels,

Niels Möller nisse@lysator.liu.se writes:

...

Daiki Ueno ueno@gnu.org writes:

...
When I'm trying to implement ML-KEM (Kyber), I realized that the current API for SHAKE (sha3_256_shake) is a bit too limited: while ML-KEM uses SHAKE128 as a source of pseudorandom samples[1], the the current API requires the total number of bytes are determined prior to the call, and after the call the hash context is reset.

I vaguely recall discussing that when shake256 was added, and we concluded it was good enough as a start, and could be extended later.

I think it would be nice if one could support the streaming case with the existing struct sha3_256_ctx, and little extra wrapping. Question is what the interface should be. I see a few variants:

void /* Essentially the same as _sha3_pad_shake */ sha3_256_shake_start (struct sha3_256_ctx *ctx);

void /* Unbuffered, length must be a multiple of SHA3_256_BLOCK_SIZE */ sha3_256_shake_output (struct sha3_256_ctx *ctx size_t length, uint8_t *dst);

void /* Last call, length can be arbitrary, context reinitialized */ sha3_256_shake_end (struct sha3_256_ctx *ctx size_t length, uint8_t *dst);

Requiring all calls but the last to be full blocks is consistent with nettle's funtions for block ciphers. But since we anyway have a buffer available (to support arbitrary sizes for streaming the input), we could perhaps just as well reuse that buffer.

void /* Essentially the same as _sha3_pad_shake */ sha3_256_shake_start (struct sha3_256_ctx *ctx);

void /* Arbitrary length, no need to signal end of data */ sha3_256_shake_output (struct sha3_256_ctx *ctx size_t length, uint8_t *dst);

void /* Explicit init call needed to start a new input message */ sha3_256_init (struct sha3_256_ctx *ctx);

In this case, sha3_256_shake_output would use ctx->index and ctx->buffer for partial blocks.

With some hacking (say, using the unused high bit of ctx->index to signal that shake is in output mode), then we could have just

void /* Arbitrary length, no need to signal start or end of output */ sha3_256_shake_output (struct sha3_256_ctx *ctx size_t length, uint8_t *dst);

void /* Explicit init call needed to start a new input message */ sha3_256_init (struct sha3_256_ctx *ctx);

As always, naming is also a crucial question. Is _shake_output a good name? Or _shake_read, or _shake_generate? From the terminology in the spec (https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf), I think "_shake_output" is reasonable.

When deciding on naming and conventions, we should strive to define somthing that can be reused for later hash functions with variable output size (called extendable-output functions, "XOF", in the spec).

So what do you think makes most sense?

Thank you. The option (3) sounds like a great idea as it only need one more function to be added for streaming. I tried to implement it as the attached patch.

Regards,

-- Daiki Ueno

Niels Möller

12:27 p.m.

Daiki Ueno ueno@gnu.org writes:

...

Thank you. The option (3) sounds like a great idea as it only need one more function to be added for streaming. I tried to implement it as the attached patch.

Thanks. Interface and tests looks very reasonable to me. Comments on the implementatino below.

Regards, /Niels

...

+void +sha3_256_shake_output(struct sha3_256_ctx *ctx,
      size_t length,
      uint8_t *digest)
+{

unsigned offset;

unsigned mask = UINT_MAX >> 1;

I think I'd name the local variable "index" rather than "offset", to match the state variable. And I think it would make sense with a define for the high bit, something like

#define INDEX_HIGH_BIT (~((UINT_MAX) >> 1))

(one could also use something like ~0U instead of UINT_MAX, but UINT_MAX may be more readable).

...

/* We use the leftmost bit as a flag to indicate SHAKE is initialized. */

if (ctx->index & ~mask)

offset = ctx->index & mask;

The value of offset here is in the range 0 < offset <= SHA3_256_BLOCK_SIZE, right? One could use a representation where

offset = ~ctx->index;

instead of bitwise operations. One would still need the condition if (ctx->index & INDEX_HIGH_BIT), but that would typically be compiled to the same as if ((signed int) ctx->index < 0).

I think it would also make sense with an

assert (ctx->index < SHA3_256_BLOCK_SIZE);

in the start of sha3_256_update, which will trigger if the update function is called after the output function, with no init in between.

...

else
{

 _sha3_pad_shake (&ctx->state, SHA3_256_BLOCK_SIZE, ctx->block, ctx->index);

 /* Point at the end of block to trigger fill in of the buffer.  */

```
 offset = sizeof (ctx->block);
```

I think this block deserves a comment that this is the first call to sha3_256_shake_output. For the block size, I think it would be nice to consitently use one of SHA3_256_BLOCK_SIZE and sizeof (ctx->block).

...

}
for (;;)
{

 /* Write remaining data from the buffer.  */

```
 if (offset < sizeof (ctx->block))
```
{
```
 unsigned remaining;
```

 remaining = MIN(length, sizeof (ctx->block) - offset);

 memcpy (digest, &ctx->block[offset], remaining);

```
 digest += remaining;
```
```
 offset += remaining;
```

I think handling of the leftover can be moved before the loop, and simplified as

unsigned left = sizeof(ctx->block) - offset; if (length <= left) { memcpy (digest, ctx->block + offset, length); ctx->index = (offset + length) | INDEX_HIGH_BIT; return; } memcpy (digest, ctx->block + offset, left); digest += left; length -= left;

followed by a loop

for (; length >= SHA3_256_BLOCK_SIZE; length -= SHA3_256_BLOCK_SIZE, digest += SHA3_256_BLOCK_SIZE) { ... output a full block ... }

if (length > 0) { ... do final partial block ... ctx->index = length | INDEX_HIGH_BIT; } else ctx->index = SHA3_256_BLOCK_SIZE | INDEX_HIGH_BIT;

-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.

Daiki Ueno

11 Mar 11 Mar

12:22 a.m.

Hello Niels,

Thank you for the suggestions, all makes sense to me.

Niels Möller nisse@lysator.liu.se writes:

...

...
+void +sha3_256_shake_output(struct sha3_256_ctx *ctx,
      size_t length,
      uint8_t *digest)
+{

unsigned offset;

unsigned mask = UINT_MAX >> 1;
I think I'd name the local variable "index" rather than "offset", to match the state variable. And I think it would make sense with a define for the high bit, something like

#define INDEX_HIGH_BIT (~((UINT_MAX) >> 1))

(one could also use something like ~0U instead of UINT_MAX, but UINT_MAX may be more readable).

Renamed and introduced the macro.

...

...

/* We use the leftmost bit as a flag to indicate SHAKE is initialized. */

if (ctx->index & ~mask)

offset = ctx->index & mask;

The value of offset here is in the range 0 < offset <= SHA3_256_BLOCK_SIZE, right? One could use a representation where

offset = ~ctx->index;

instead of bitwise operations. One would still need the condition if (ctx->index & INDEX_HIGH_BIT), but that would typically be compiled to the same as if ((signed int) ctx->index < 0).

I think it would also make sense with an

assert (ctx->index < SHA3_256_BLOCK_SIZE);

in the start of sha3_256_update, which will trigger if the update function is called after the output function, with no init in between.

...
else

{

_sha3_pad_shake (&ctx->state, SHA3_256_BLOCK_SIZE, ctx->block,

ctx->index);
 /* Point at the end of block to trigger fill in of the buffer.  */
 offset = sizeof (ctx->block);
I think this block deserves a comment that this is the first call to sha3_256_shake_output. For the block size, I think it would be nice to consitently use one of SHA3_256_BLOCK_SIZE and sizeof (ctx->block).

I chose the latter to not embed algorithm specific constant, as the same code also needs to support SHAKE128 (to be added).

...

...
}

for (;;)

{
 /* Write remaining data from the buffer.  */
 if (offset < sizeof (ctx->block))
{
 unsigned remaining;
 remaining = MIN(length, sizeof (ctx->block) - offset);
 memcpy (digest, &ctx->block[offset], remaining);
 digest += remaining;
 offset += remaining;
I think handling of the leftover can be moved before the loop, and simplified as

unsigned left = sizeof(ctx->block) - offset; if (length <= left) { memcpy (digest, ctx->block + offset, length); ctx->index = (offset + length) | INDEX_HIGH_BIT; return; } memcpy (digest, ctx->block + offset, left); digest += left; length -= left;

followed by a loop

for (; length >= SHA3_256_BLOCK_SIZE; length -= SHA3_256_BLOCK_SIZE, digest += SHA3_256_BLOCK_SIZE) { ... output a full block ... }

if (length > 0) { ... do final partial block ... ctx->index = length | INDEX_HIGH_BIT; } else ctx->index = SHA3_256_BLOCK_SIZE | INDEX_HIGH_BIT;

Yes, this makes the code a lot simpler. I'm attaching an updated patch.

Regards,

-- Daiki Ueno

Niels Möller

7:52 p.m.

Daiki Ueno ueno@gnu.org writes:

...

Yes, this makes the code a lot simpler. I'm attaching an updated patch.

Thanks, looks good to me. Some details I'm thinking about that might be improvements:

* One could perhaps use index == 0 instead of index == block_size for the case that there is no buffered data. But the current convention does make your "if (length <= left)" nice and simple.

* It looks a bit backwards to me that each iteration *first* copies data to the digest, and *then* calls sha3_permute. In case no more data is to be output, that sha3_permute call is wasted. It would be more natural to me to not call sha3_permute until we know the output is needed. But to fix that and still keep things nice for the first output block, I think one would need to reorganize _nettle_sha3_pad to not imply a call to sha3_permute (via sha3_absorb). So that's better done in a separate change.

* I'm still tempted to use ctx->index = ~index rather than ctx->index = index | INDEX_HIGH_BIT. But maybe that would just be too obfuscated.

Anything about that you agree with, or that you think should be done now?

In next step, to also support shake128, we should generalize your code using an internal function _sha3_shake_output taking block and block size as arguments.

I'm also not sure about proper naming for shake128. If I read the Instances table at https://en.wikipedia.org/wiki/SHA-3 right, there's no standard regular hash function corresponding to shake128. We could still name it sha3_128_shake, but that might be confusing (there's no corresponding sha3_128_digest, would there be any use for that?). The alternative could be to use names sha3_shakeN_init, sha3_shakeN_update, sha3_shakeN_digest, sha3_shakeN_output (with some of the shake256 functions, as well as the context struct, being aliases to corresponding sha3_256 names). But aliases also have a cost in potential confusion.

...

if (length > 0)
{

 /* Fill in the buffer for next call.  */

 _nettle_write_le64 (sizeof (ctx->block), ctx->block, ctx->state.a);

```
 sha3_permute (&ctx->state);
```
```
 memcpy (digest, ctx->block, length);
```
```
 ctx->index = length | INDEX_HIGH_BIT;
```
}
else
ctx->index = sizeof (ctx->block) | INDEX_HIGH_BIT;

If I read your code right, we actually always have length > 0 at this place. So either delete the if conditional, or change the condition of the loop above from (length > sizeof (ctx->block)) to (length >= sizeof (ctx->block)). The latter option would avoid a memcpy in the case that the requested digest ends with a full block.

Regards, /Niels

-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.

Daiki Ueno

14 Mar 14 Mar

10:30 a.m.

Niels Möller nisse@lysator.liu.se writes:

...

Daiki Ueno ueno@gnu.org writes:

...
Yes, this makes the code a lot simpler. I'm attaching an updated patch.

Thanks, looks good to me. Some details I'm thinking about that might be improvements:

One could perhaps use index == 0 instead of index == block_size for the case that there is no buffered data. But the current convention does make your "if (length <= left)" nice and simple.

I agree that the current convention is a bit awkward, so in the attached patch I changed to use index == 0 as the indicator where buffering is needed. That actually makes the code simpler as we can defer buffering until when the data is read. One drawback though is that it causes additional memcpy in a corner case where the _shake_output is used to retrieve data smaller than the block size.

...

It looks a bit backwards to me that each iteration *first* copies data to the digest, and *then* calls sha3_permute. In case no more data is to be output, that sha3_permute call is wasted. It would be more natural to me to not call sha3_permute until we know the output is needed. But to fix that and still keep things nice for the first output block, I think one would need to reorganize _nettle_sha3_pad to not imply a call to sha3_permute (via sha3_absorb). So that's better done in a separate change.

Right, I can do that after the current patch is settled.

...

I'm still tempted to use ctx->index = ~index rather than ctx->index = index | INDEX_HIGH_BIT. But maybe that would just be too obfuscated.

I'm actually not sure how this works. For example, if unsigned int is 32-bit and index is 3, wouldn't ~index turn to 0xfffffffc, while index | INDEX_HIGH_BIT is 0x80000003?

...

In next step, to also support shake128, we should generalize your code using an internal function _sha3_shake_output taking block and block size as arguments.

Yes.

...

I'm also not sure about proper naming for shake128. If I read the Instances table at https://en.wikipedia.org/wiki/SHA-3 right, there's no standard regular hash function corresponding to shake128. We could still name it sha3_128_shake, but that might be confusing (there's no corresponding sha3_128_digest, would there be any use for that?). The alternative could be to use names sha3_shakeN_init, sha3_shakeN_update, sha3_shakeN_digest, sha3_shakeN_output (with some of the shake256 functions, as well as the context struct, being aliases to corresponding sha3_256 names). But aliases also have a cost in potential confusion.

I agree; we probably shouldn't expose sha3_128_digest et. al., from the API.

...

...
if (length > 0)

{
 /* Fill in the buffer for next call.  */
 _nettle_write_le64 (sizeof (ctx->block), ctx->block, ctx->state.a);
 sha3_permute (&ctx->state);
 memcpy (digest, ctx->block, length);
 ctx->index = length | INDEX_HIGH_BIT;
}

else

ctx->index = sizeof (ctx->block) | INDEX_HIGH_BIT;
+}
If I read your code right, we actually always have length > 0 at this place. So either delete the if conditional, or change the condition of the loop above from (length > sizeof (ctx->block)) to (length >= sizeof (ctx->block)). The latter option would avoid a memcpy in the case that the requested digest ends with a full block.

Indeed, fixed.

Regards,

-- Daiki Ueno

Niels Möller

11:01 a.m.

Daiki Ueno ueno@gnu.org writes:

...

...

One could perhaps use index == 0 instead of index == block_size for the case that there is no buffered data. But the current convention does make your "if (length <= left)" nice and simple.

I agree that the current convention is a bit awkward, so in the attached patch I changed to use index == 0 as the indicator where buffering is needed. That actually makes the code simpler as we can defer buffering until when the data is read. One drawback though is that it causes additional memcpy in a corner case where the _shake_output is used to retrieve data smaller than the block size.

I wonder if that will still be simpler if one also moves the sha3_permute calls?

I have merged your previous version to a branch add-sha3_256_shake_output, and ci looks green. So perhaps best to merge that to master, and iterate from there?

...

...

It looks a bit backwards to me that each iteration *first* copies data to the digest, and *then* calls sha3_permute. In case no more data is to be output, that sha3_permute call is wasted. It would be more natural to me to not call sha3_permute until we know the output is needed. But to fix that and still keep things nice for the first output block, I think one would need to reorganize _nettle_sha3_pad to not imply a call to sha3_permute (via sha3_absorb). So that's better done in a separate change.

Right, I can do that after the current patch is settled.

I've done a bit of hacking locally. What I did was to take out the xoring parts of sha3_absorb into it's own function sha3_xor_block, and let sha3_pad_shake use that, without any call to sha3_permute. And then call sha3_permute as output is needed.

...

...

I'm still tempted to use ctx->index = ~index rather than ctx->index = index | INDEX_HIGH_BIT. But maybe that would just be too obfuscated.

I'm actually not sure how this works. For example, if unsigned int is 32-bit and index is 3, wouldn't ~index turn to 0xfffffffc, while index | INDEX_HIGH_BIT is 0x80000003?

It would be a different representation, with the very minor advantage that the INDEX_HIGH_BIT value isn't needed (in source code, or handled at runtime). Like

index = ctx->index;

if (index < sizeof(ctx->block)) { ... first call to shake_output, pad and initialize... } else index = ~index;

assert (index <= sizeof(ctx->block));

... output processing ...

ctx->index = ~index;

...

...
In next step, to also support shake128, we should generalize your code using an internal function _sha3_shake_output taking block and block size as arguments.

Yes.

I've tried that in my local hack, I think it's rather straight-forward. (I might be able to post corresponding patch later). What's unclear is how much to share between _shake and shake_output. One could define _shake as _shake_output + _init. The drawback I see is that (i) we would allow _shake_output followed by _shake, which isn't proper api usage, and (ii) _shake needs a lot less logic since it should always start by padding, and it doesn't need to buffer any data, so it seems a bit wrong to have it call shake_output that does this unneeded extra work.

/Regards, Niels

-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.

Daiki Ueno

24 Mar 24 Mar

1:10 a.m.

Hello Niels,

Niels Möller nisse@lysator.liu.se writes:

...

Daiki Ueno ueno@gnu.org writes:

...
...

One could perhaps use index == 0 instead of index == block_size for the case that there is no buffered data. But the current convention does make your "if (length <= left)" nice and simple.

I agree that the current convention is a bit awkward, so in the attached patch I changed to use index == 0 as the indicator where buffering is needed. That actually makes the code simpler as we can defer buffering until when the data is read. One drawback though is that it causes additional memcpy in a corner case where the _shake_output is used to retrieve data smaller than the block size.

I wonder if that will still be simpler if one also moves the sha3_permute calls?

I have merged your previous version to a branch add-sha3_256_shake_output, and ci looks green. So perhaps best to merge that to master, and iterate from there?

Sorry for the delay, and thank you for merging it to master. I've come up with the attached 3 patches on top of it, which basically do:

- Apply my changes in the previous post to count index from zero, not the end of the block

- Rename sha3_256_shake_output to sha3_shake256_output and add sha3_shake256_init/update as well, as you suggested in the previous conversation. That would help us implement SHAKE128 without exposing SHA3-128 digest functions and I find it easier to read when used in the ML-KEM implementation.

- Generalize _shake_output function independent of the underlying SHA-3 algorithm.

...

...
...

It looks a bit backwards to me that each iteration *first* copies data to the digest, and *then* calls sha3_permute. In case no more data is to be output, that sha3_permute call is wasted. It would be more natural to me to not call sha3_permute until we know the output is needed. But to fix that and still keep things nice for the first output block, I think one would need to reorganize _nettle_sha3_pad to not imply a call to sha3_permute (via sha3_absorb). So that's better done in a separate change.

Right, I can do that after the current patch is settled.

I've done a bit of hacking locally. What I did was to take out the xoring parts of sha3_absorb into it's own function sha3_xor_block, and let sha3_pad_shake use that, without any call to sha3_permute. And then call sha3_permute as output is needed.

Cool!

...

...
...

I'm still tempted to use ctx->index = ~index rather than ctx->index = index | INDEX_HIGH_BIT. But maybe that would just be too obfuscated.

I'm actually not sure how this works. For example, if unsigned int is 32-bit and index is 3, wouldn't ~index turn to 0xfffffffc, while index | INDEX_HIGH_BIT is 0x80000003?

It would be a different representation, with the very minor advantage that the INDEX_HIGH_BIT value isn't needed (in source code, or handled at runtime). Like

index = ctx->index;

if (index < sizeof(ctx->block)) { ... first call to shake_output, pad and initialize... } else index = ~index;

assert (index <= sizeof(ctx->block));

... output processing ...

ctx->index = ~index;

I see. I rewote it using this pattern.

...

...
...
In next step, to also support shake128, we should generalize your code using an internal function _sha3_shake_output taking block and block size as arguments.

Yes.

I've tried that in my local hack, I think it's rather straight-forward. (I might be able to post corresponding patch later). What's unclear is how much to share between _shake and shake_output. One could define _shake as _shake_output + _init. The drawback I see is that (i) we would allow _shake_output followed by _shake, which isn't proper api usage, and (ii) _shake needs a lot less logic since it should always start by padding, and it doesn't need to buffer any data, so it seems a bit wrong to have it call shake_output that does this unneeded extra work.

In the attached patch, _shake is now re-implemented using _shake_output + _init, and also marked as deprecated in favor of the new incremental interface.

Regards,

-- Daiki Ueno

Niels Möller

10 a.m.

Daiki Ueno ueno@gnu.org writes:

...

Sorry for the delay, and thank you for merging it to master. I've come up with the attached 3 patches on top of it, which basically do:

Thanks for moving this forward. I haven't had time to share my recent patches either. I have a few concerns, though.

...

Apply my changes in the previous post to count index from zero, not the end of the block

I'm not yet convinced this is a net win, it looks like you need a "% sizeof (ctx->block)" to make that work, and I'd like to avoid divisions, in particular, since when general generalizing this to also support shake128, the divisor will no longer be constant.

...

Rename sha3_256_shake_output to sha3_shake256_output and add sha3_shake256_init/update as well, as you suggested in the previous conversation. That would help us implement SHAKE128 without exposing SHA3-128 digest functions and I find it easier to read when used in the ML-KEM implementation.

I'm fine with adding new sha3_shake256_* names, but I think we should keep old name (which you added for Nettle-3.6). And I think we can use the same context struct, possibly with convenience aliases like

#define sha3_shake256_ctx sha3_256_ctx #define sha3_shake256_init sha3_256_init

I agree we shouldn't define sha3_128_digest now (as far as I'm aware, there's no authoritative spec for that), but I think we should design the api so that it fits if added later.

I'm a bit confused by the choice of shake128 for ML-KEM, and I would expect that if there are applications where shake128 is a reasonable security tradeoff, then there likely are reasonable applications of sha3_128 too. I don't understand the fine details of sha3 security analysis, but I'd guess that for applications where second preimage (in contrast to arbitrary collisions) is the relevant attack, sha3_128 should be as secure as sha3_shake128 with a larger output size.

...

Generalize _shake_output function independent of the underlying SHA-3 algorithm.

Certainly needed.

I don't think the all-in-one shake function should be deprecated, it seems like a nice utility. What I'm not sur about about is if it should be implemented as _output + _init (very cheap implementation) or its own function (with less runtime overhead than _output).

I'll try to clean up and post or commit some of my changes, I'm sorry that will cause some conflicts.

Regards, /Niels

-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.

Niels Möller

1:24 p.m.

Niels Möller nisse@lysator.liu.se writes:

...

I'll try to clean up and post or commit some of my changes, I'm sorry that will cause some conflicts.

I've pushed my changes to a branch sha3-shake-updates, does that look reasonable to you? If so, I think the next steps are

1. Decide what should be renamed sha3_shake256_* 2. Implement shake128. 3. Update docs.

Regards, /Niels

-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.

Daiki Ueno

28 Mar 28 Mar

11:50 a.m.

Niels Möller nisse@lysator.liu.se writes:

...

Niels Möller nisse@lysator.liu.se writes:

...
I'll try to clean up and post or commit some of my changes, I'm sorry that will cause some conflicts.

I've pushed my changes to a branch sha3-shake-updates, does that look reasonable to you? If so, I think the next steps are

Yes, that looks good to me, except _nettle_sha3_shake has a copy-and-paste error where SHA3_256_BLOCK_SIZE is hard-coded.

...

Decide what should be renamed sha3_shake256_*

I guess we can live with the existing interface. For SHAKE128, we could only provide sha3_128_init, sha3_128_update, and sha3_128_shake{,_output}, without sha3_128_digest.

...

Implement shake128.

I've extracted it from the ML-KEM merge request and put it here: https://git.lysator.liu.se/nettle/nettle/-/merge_requests/63

Not sending via email as it includes a huge test vector.

...

Update docs.

I can do that once we settle the interface.

Regards,

-- Daiki Ueno

Niels Möller

7:25 p.m.

Daiki Ueno ueno@gnu.org writes:

...

Yes, that looks good to me, except _nettle_sha3_shake has a copy-and-paste error where SHA3_256_BLOCK_SIZE is hard-coded.

Thanks, good catch.

...

...

Decide what should be renamed sha3_shake256_*

I guess we can live with the existing interface. For SHAKE128, we could only provide sha3_128_init, sha3_128_update, and sha3_128_shake{,_output}, without sha3_128_digest.

Sounds good to me.

...

...

Implement shake128.

I've extracted it from the ML-KEM merge request and put it here: https://git.lysator.liu.se/nettle/nettle/-/merge_requests/63

Not sending via email as it includes a huge test vector.

Thanks, merged to the sha3-shake-updates branch. Sorry if you didn't intend me to do that right away (I noticed some minor problems after merge, which I've fixed). I'd like to merge to master after ci runs have completed.

...

...

Update docs.

I can do that once we settle the interface.

Excellent. To me, interface in sha3.h now looks good.

Regards, /Niels

-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.

Daiki Ueno

11 Apr 11 Apr

9:59 a.m.

Sorry for the late response.

Niels Möller nisse@lysator.liu.se writes:

...

...
...

Implement shake128.

I've extracted it from the ML-KEM merge request and put it here: https://git.lysator.liu.se/nettle/nettle/-/merge_requests/63

Not sending via email as it includes a huge test vector.

Thanks, merged to the sha3-shake-updates branch. Sorry if you didn't intend me to do that right away (I noticed some minor problems after merge, which I've fixed). I'd like to merge to master after ci runs have completed.

Thank you for merging it (with the fixes); I wasn't actually aware of the problems.

...

...
...

Update docs.

I can do that once we settle the interface.

Excellent. To me, interface in sha3.h now looks good.

I'm attaching a patch to update the documentation.

Regards,

-- Daiki Ueno

Niels Möller

13 Apr 13 Apr

7:52 p.m.

Daiki Ueno ueno@gnu.org writes:

...

I'm attaching a patch to update the documentation.

Thanks.

...

-@subsubsection @acronym{SHAKE-256} +@subsubsection @acronym{SHAKE-128} @cindex SHAKE

I think heading should be just "shake".

...

-In addition to those SHA-3 hash functions, Nettle also provides a SHA-3 -extendable-output function (XOF), SHAKE-256. Unlike SHA-3 hash functions, -SHAKE can produce an output digest of any desired length. +In addition to those SHA-3 hash functions, Nettle also provides a +SHA-3 extendable-output function (XOF) called SHAKE. Unlike hash +functions, SHAKE can produce an output digest of any desired +length. There are two variants, SHAKE-128 and SHAKE-256, with +different security strengths in terms of collision or preimage +resistance.

+SHAKE-128 internally uses a SHA-3 hash function with 128-bit security +strength against second preimage attacks. The hash function is not +usable alone with Nettle, only for the use with SHAKE-128.

I think it would be good to write in the intro that shake-256 corresponds to sha3-256, while shake-128 uses sha3 with parameters corresponding to 128-bit security, for which there's no corresponding plain hash function defined.

It might also make sense to explain the difference between _shake and _shake_output functions here, and make the description under each function a bit shorter.

Regards, /Niels

-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.

Daiki Ueno

14 Apr 14 Apr

2:36 a.m.

Hello Niels,

Thank you for the feedback.

Niels Möller nisse@lysator.liu.se writes:

...

...
-@subsubsection @acronym{SHAKE-256} +@subsubsection @acronym{SHAKE-128} @cindex SHAKE

I think heading should be just "shake".

Done.

...

...
-In addition to those SHA-3 hash functions, Nettle also provides a SHA-3 -extendable-output function (XOF), SHAKE-256. Unlike SHA-3 hash functions, -SHAKE can produce an output digest of any desired length. +In addition to those SHA-3 hash functions, Nettle also provides a +SHA-3 extendable-output function (XOF) called SHAKE. Unlike hash +functions, SHAKE can produce an output digest of any desired +length. There are two variants, SHAKE-128 and SHAKE-256, with +different security strengths in terms of collision or preimage +resistance.

+SHAKE-128 internally uses a SHA-3 hash function with 128-bit security +strength against second preimage attacks. The hash function is not +usable alone with Nettle, only for the use with SHAKE-128.

I think it would be good to write in the intro that shake-256 corresponds to sha3-256, while shake-128 uses sha3 with parameters corresponding to 128-bit security, for which there's no corresponding plain hash function defined.

It might also make sense to explain the difference between _shake and _shake_output functions here, and make the description under each function a bit shorter.

Yes, I've consolidated the description and put it at the introduction.

Regards,

-- Daiki Ueno

Niels Möller

9:07 a.m.

Daiki Ueno ueno@gnu.org writes:

...

Yes, I've consolidated the description and put it at the introduction.

Thanks, merged now! /Niels

-- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.

481

Age (days ago)

530

Last active (days ago)

nettle-bugs@lists.lysator.liu.se

16 comments

2 participants

tags (0)

participants (2)

Daiki Ueno
Niels Möller