Hi,
I'm having a new look at sntrup761, I have rebased the branch based on Simon's work, and pushed as branch "sntrup761" in the Nettle repository. And I've reread https://ntruprime.cr.yp.to/nist/ntruprime-20201007.pdf (is that still the main spec?).
I've also added valgrind-based tests for side-channels. It appears key generation may have leaks (when I mark the output from the randomness generator as secret). Maybe this is just rejection of certain samples, which should not be a problem (for this sampling strategy, it's expected to leak the number of tries needed). Encapsulation appears to not have branches or memory accesses depending on the randomness input. Decapsulation appears to have no branches or memory accesses depending on the secret key, which is the most important property.
I don't yet quite understand the implementation. Some issues:
* Not entirely sure where the sorting comes from (I saw no mention of it in the spec). I imagine it's part of generating random values of the appropriate types.
* The encode/decode steps appear to follow the spec closely, but to me it's a bit weird to use the M arrays filled with constant values.
* Coding style is a bit odd, e.g., the long long type is used for values that always appear to be small constants, and short lowercase names like "p" are used for preprocessor constants.
I think it should be doable to get it into good shape.
Regards, /Niels
Niels Möller nisse@lysator.liu.se writes:
Hi,
I'm having a new look at sntrup761, I have rebased the branch based on Simon's work, and pushed as branch "sntrup761" in the Nettle repository.
Yay! I had forgotten about that. IIRC it was based on OpenSSH extraction from supercop, but I think it should be updated against latest upstream -- https://libntruprime.cr.yp.to/download.html -- although I won't be able to work on it for the next few weeks, so if you happen to have cycles, upgrading it would be great.
And I've reread https://ntruprime.cr.yp.to/nist/ntruprime-20201007.pdf (is that still the main spec?).
Yes.
I've also added valgrind-based tests for side-channels. It appears key generation may have leaks (when I mark the output from the randomness generator as secret).
I think this was fixed in latest upstream, or I confuse it with something that sounded similar.
- Not entirely sure where the sorting comes from (I saw no mention of it in the spec). I imagine it's part of generating random values of the appropriate types.
Sorting happens during key generation, as part of the (Hash)Shorts conversion (see section 3.3, on lprime, which somewhat confusingly is re-used by sntrup too). I suspect https://sorting.cr.yp.to/ eventually finds its way here too; there is a very recent page with speed comparisons: https://sorting.cr.yp.to/speed.html
/Simon
Simon Josefsson simon@josefsson.org writes:
Yay! I had forgotten about that. IIRC it was based on OpenSSH extraction from supercop, but I think it should be updated against latest upstream -- https://libntruprime.cr.yp.to/download.html --
That's a lot of code... libntruprime-20241021.tar.gz is about the same size as a nettle release, and contains some 1800 files. Are you familiar with the organization? It seems non-trivial to map to the functions in sntrup761.c in the Nettle branch and figure out what should be updated.
Regards, /Niels
I've been hacking a bit on the sntrup code in recent days. See branch https://git.lysator.liu.se/nettle/nettle/-/tree/sntrup761
I have a couple of questions, both on api details and on the algorithm.
1. According to the spec, the secret key includes a copy of the public key. Should we stick to this for nettle's api, or would it make more sense to have the decrypt function taking a separate pubkey argument?
2. Would it be useful to have an api where public and private keys can be decoded from byte strings to an internal representation, to not have to redo the decoding on each operation? Or stick to only bytestring input and output?
3. For private key decoding, it may happen that the private key is invalid. The private key includes lots of mod 3 coefficients, where each coefficient is represented as two bits with valid values 00, 01, 10. What if the caller passes 11? Hitting some undefined behaviour or an assertion failure for this isn't that nice. One could return an error (which would be somewhat natural if one has a private key decode function as above), or silently replace 11 values with 00? Neither the spec nor the current code includes any error handling as far as I've found.
4. I'm confused by the extra steps taken to get "implicit rejection". Decryption may fail if fed a random (or maliciously chosen) ciphertext. It makes intuitive sense to me to hide all details of where during decryption something looked wrong. But why go to extra effort to not return any error at all? When the system is attacked, why does it matter if the attacker gets an explicit error at key exchange time, or an error (typically a MAC or AEAD decrypt failure) at first use of the resulting session key? I understand the argument is likely subtle, but I'd appreciate any layman explanation for this design.
Regards, /Niels
Hi Niels,
On Apr 8, 2026 at 5:00:44 AM, Niels Möller nisse@lysator.liu.se wrote:
I've been hacking a bit on the sntrup code in recent days. See branch https://git.lysator.liu.se/nettle/nettle/-/tree/sntrup761
I have a couple of questions, both on api details and on the algorithm.
- According to the spec, the secret key includes a copy of the public
key. Should we stick to this for nettle's api, or would it make more sense to have the decrypt function taking a separate pubkey argument?
When generating a new keypair, I think you’ll want the API to return both the private key and public key. Both of these can be encoded as byte strings. You’ll want both keys to be returned here, as typically the owner of the keypair needs to share the public key in some way with whoever wants to send a secret.
When decrypting, you don’t need the public key. The decrypt call takes ciphertext and a private key and returns the original shared secret, assuming it was encrypted using the corresponding public_key.
The public key is used in the encrypt phase. It takes a shared secret and a public key and returns ciphertext which can be decrypted using the private key.
2. Would it be useful to have an api where public and private keys can be
decoded from byte strings to an internal representation, to not have to redo the decoding on each operation? Or stick to only bytestring input and output?
I would stick to byte strings here. It’d be messy to be converting back and forth between the two representations, and you will need the byte string version to be able to send it over the network.
3. For private key decoding, it may happen that the private key is
invalid. The private key includes lots of mod 3 coefficients, where each coefficient is represented as two bits with valid values 00, 01, 10. What if the caller passes 11? Hitting some undefined behaviour or an assertion failure for this isn't that nice. One could return an error (which would be somewhat natural if one has a private key decode function as above), or silently replace 11 values with 00? Neither the spec nor the current code includes any error handling as far as I've found.
I haven’t looked at the internals, but if a caller passes in an invalid key (public or private), I would recommend returning an error and not trying to translate the key bytes into a “valid” key. I’m not sure it is even possible for a caller with an invalid public key and no access to the private key to figure out what the public key was supposed to be.
4. I'm confused by the extra steps taken to get "implicit rejection".
Decryption may fail if fed a random (or maliciously chosen) ciphertext. It makes intuitive sense to me to hide all details of where during decryption something looked wrong. But why go to extra effort to not return any error at all? When the system is attacked, why does it matter if the attacker gets an explicit error at key exchange time, or an error (typically a MAC or AEAD decrypt failure) at first use of the resulting session key? I understand the argument is likely subtle, but I'd appreciate any layman explanation for this design.
In the case of a bad key, the decrypt call probably wouldn’t be able to return any plaintext, so I think it pretty much has to return an error in this case to indicate that the plaintext buffer is not valid. We wouldn’t want the caller to try and access uninitialized memory.
To summarize, I think the API should look something like:
keypair() -> (private_key, public_key)
encaps(shared_secret, public_key) -> ciphertext
decaps(ciphertext, private_key) -> shared_secret
Buffers will need to be allocated to an appropriate size depending on the algorithm, and in C I would expect a mix of “in” and “out” parameters passed as arguments, probably with the return value of all the functions being an error enum indicating any error which occurred.
Here’s an example C API, modeled after liboqs:
#define sntrup761_length_public_key 1158
#define sntrup761_length_private_key 1763
#define sntrup761_length_ciphertext 1039
#define sntrup761_length_shared_secret 32
STATUS sntrup761_keypair(uint8_t *public_key, uint8_t *private_key);
STATUS sntrup761_encaps(uint8_t *ciphertext, uint8_t *shared_secret, const uint8_t *public_key);
STATUS sntrup761_decaps(uint8_t *shared_secret, const uint8_t *ciphertext, const uint8_t *private_key);
With the exception of the function names and buffer sizes, this exact same API should be usable for the various flavors of ML-KEM as well. To see an example of this in actual use, you can take a look at the PQ key exchange code from AsyncSSH at https://github.com/ronf/asyncssh/blob/develop/asyncssh/crypto/pq.py. -- Ron Frederick ronf@timeheart.net
Ron Frederick ronf@timeheart.net writes:
When generating a new keypair, I think you’ll want the API to return both the private key and public key.
Agreed, and that's how key generation in Nettle works generally (with some exceptions, e.g., for ed25519 where private key is an arbitrary random octet string without any structure, there's only a function to compute corresponding public key).
In this case, what's a bit weird is that the private key includes a literal copy of the public key. And it's not so nice to force applications to allocate space for it twice.
When decrypting, you don’t need the public key. The decrypt call takes ciphertext and a private key and returns the original shared secret, assuming it was encrypted using the corresponding public_key.
Some other nettle functions using the private key take the corresponding public key as an additional argument. E.g.,
void ed25519_sha512_sign (const uint8_t *pub, const uint8_t *priv, size_t length, const uint8_t *msg, uint8_t *signature);
I think that makes sense too, when the private key operation needs the public key to do its work.
- Would it be useful to have an api where public and private keys can be
decoded from byte strings to an internal representation, to not have to redo the decoding on each operation? Or stick to only bytestring input and output?
I would stick to byte strings here. It’d be messy to be converting back and forth between the two representations, and you will need the byte string version to be able to send it over the network.
I agree that makes sense, at least for the first version.
- For private key decoding, it may happen that the private key is
invalid. The private key includes lots of mod 3 coefficients, where each coefficient is represented as two bits with valid values 00, 01, 10. What if the caller passes 11? Hitting some undefined behaviour or an assertion failure for this isn't that nice. One could return an error (which would be somewhat natural if one has a private key decode function as above), or silently replace 11 values with 00? Neither the spec nor the current code includes any error handling as far as I've found.
I haven’t looked at the internals, but if a caller passes in an invalid key (public or private), I would recommend returning an error and not trying to translate the key bytes into a “valid” key.
By the spec, the decoding functions applied to public keys and ciphertexts silently ignore errors. Concretely, when they decode a couple of bits that are expected to represent a number r in the range 0 <= r < m, but in fact r >= m, they silently replace r by r mod m.
I'm not sure this is great, but this part of the spec is pretty clear.
But no mention of how to deal with analogous errors when decoding a private key. Replacing bits 11 by 00 (reduce mod 3) would be somewhat consistent.
- I'm confused by the extra steps taken to get "implicit rejection".
Decryption may fail if fed a random (or maliciously chosen) ciphertext. It makes intuitive sense to me to hide all details of where during decryption something looked wrong. But why go to extra effort to not return any error at all? When the system is attacked, why does it matter if the attacker gets an explicit error at key exchange time, or an error (typically a MAC or AEAD decrypt failure) at first use of the resulting session key? I understand the argument is likely subtle, but I'd appreciate any layman explanation for this design.
In the case of a bad key, the decrypt call probably wouldn’t be able to return any plaintext, so I think it pretty much has to return an error in this case to indicate that the plaintext buffer is not valid. We wouldn’t want the caller to try and access uninitialized memory.
There's no returned error. The definition of the private key includes an extra random value "rho", and in case decryption fails, the function returns a session key that is essentially the hash of rho and the input ciphertext. rho is not used for anything else. It's the value and purpose of this complication that has me a bit puzzled.
To summarize, I think the API should look something like:
keypair() -> (private_key, public_key)
encaps(shared_secret, public_key) -> ciphertext
This is close to what the spec calls sntrup "core", but the advertised sntrup KEM doesn't take a shared secret as input, it generates a random input to the encapsulation procedure, and outputs a short session key that is the hash of that random input and other values. I think that's how other KEMs work too (and what makes the KEM interface different from traditional public key encryption)?
#define sntrup761_length_public_key 1158
#define sntrup761_length_private_key 1763
Private key size is another interesting detail.
As I read the spec, an sntrup761 private key consists of, in order:
* encodings of two polynomials with -1,0,1 coefficients (191 bytes each)
* a copy of the public key, 1158 bytes (encoding a polynomial with larger coefficients)
* the rho value mentioned above, another 191 bytes.
For a total of 1731 = 1158 + 3*191 bytes.
But then the current code appends another 32 bytes, which are just a hash of the public key. The only purpose seems to be to avoid a hash operation when the key is used, which seems like a rather questionable optimization to me. So we have a key size that is 1731 bytes according to spec, but 1763 in existing implementations?
Regards, /Niels
On 09/04/2026 18:23, Niels Möller wrote:
Ron Frederick ronf@timeheart.net writes:
When generating a new keypair, I think you’ll want the API to return both the private key and public key.
Agreed, and that's how key generation in Nettle works generally (with some exceptions, e.g., for ed25519 where private key is an arbitrary random octet string without any structure, there's only a function to compute corresponding public key).
In this case, what's a bit weird is that the private key includes a literal copy of the public key. And it's not so nice to force applications to allocate space for it twice.
Perhaps on generation, do not require the public key to be allocated, and set it to reference the copy inside the private key on return.
Or is there a security issue with the private key being discoverable from that public key pointer?
/2c AYJ
nettle-bugs@lists.lysator.liu.se