nettle-bugs December 2023

nettle-bugs@lists.lysator.liu.se

5 participants
7 discussions

Add RSA-OAEP encryption/decryption to Nettle
by Nicolas Mora 09 Mar '24


Hello,

I've made a new Merge Request in the nettle gitlab repo to provide RSA-OAEP encryption and decryption: https://git.lysator.liu.se/nettle/nettle/-/merge_requests/20

It adds 2 new functions:

  int pkcs1_oaep_encrypt (size_t key_size,
                          void *random_ctx, nettle_random_func *random,
                          size_t hlen,
                          size_t label_length, const uint8_t *label,
                          size_t message_length, const uint8_t *message,
                          mpz_t m);

  int pkcs1_oaep_decrypt (size_t key_size, const mpz_t m,
                          size_t hlen,
                          size_t label_length, const uint8_t *label,
                          size_t *length, uint8_t *message);

The parameter hlen is the output length of the SHA function used for masking data:
- SHA1_DIGEST_SIZE
- SHA256_DIGEST_SIZE
- SHA384_DIGEST_SIZE
- SHA512_DIGEST_SIZE

Is it possible to get feedback for this MR and eventually push it to the master branch?

Thanks in advance
/Nicolas
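The choice of hlen above also bounds how much data one RSA-OAEP operation can carry: RFC 8017 (section 7.1.1) requires the message length to be at most k - 2*hLen - 2 octets, where k is the modulus length in octets. The helper below is a minimal sketch of that check. Its name is hypothetical and not part of the merge request, and it assumes key_size is given in octets, as in Nettle's other pkcs1 functions.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Nettle or the merge request.
   Computes the largest message (in octets) that RSA-OAEP can carry
   for a modulus of key_size octets and a hash with hlen octets of
   output, following RFC 8017: mLen <= k - 2*hLen - 2.
   Returns 1 and stores the limit in *max_length on success, or
   returns 0 if the key is too small for the chosen hash. */
int
oaep_max_message_length (size_t key_size, size_t hlen, size_t *max_length)
{
  if (key_size < 2 * hlen + 2)
    return 0;

  *max_length = key_size - 2 * hlen - 2;
  return 1;
}
```

For example, a 2048-bit key (key_size 256) with SHA-256 (hlen 32) leaves room for 256 - 64 - 2 = 190 octets, while a 1024-bit key (key_size 128) is too small to be used with SHA-512 (hlen 64) at all.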

4 21

Re: ppc64: v2, AES/GCM Performance improvement with stitched implementation
by Danny Tsen 15 Jan '24


Hi Niels,

Here is version 2 of the AES/GCM stitched patch. The stitched code is all in assembly, and m4 macros are used. The overall performance improved by about 110% and 120% for encrypt and decrypt respectively. Please see the attached patch and aes benchmark.

Thanks.
-Danny

> On Nov 22, 2023, at 2:27 AM, Niels Möller <nisse(a)lysator.liu.se> wrote:
>
> Danny Tsen <dtsen(a)us.ibm.com> writes:
>
>> Interleaving at the instructions level may be a good option but due to
>> PPC instruction pipeline this may need to have sufficient
>> registers/vectors. Use same vectors to change contents in successive
>> instructions may require more cycles. In that case, more
>> vectors/scalar will get involved and all vectors assignment may have
>> to change. That’s the reason I avoided in this case.
>
> To investigate the potential, I would suggest some experiments with
> software pipelining.
>
> Write a loop to do 4 blocks of ctr-aes128 at a time, fully unrolling the
> round loop. I think that should be 44 instructions of aes mangling, plus
> instructions to setup the counter input, and do the final xor and
> endianness things with the message. Arrange so that it loads the AES
> state in a set of registers we can call A, operating in-place on these
> registers. But at the end, arrange the XORing so that the final
> cryptotext is located in a different set of registers, B.
>
> Then, write the instructions to do ghash using the B registers as input,
> I think that should be about 20-25 instructions. Interleave those as
> well as possible with the AES instructions (say, two aes instructions,
> one ghash instruction, etc).
>
> Software pipelining means that each iteration of the loop does aes-ctr
> on four blocks, + ghash on the output for the four *previous* blocks (so
> one needs extra code outside of the loop to deal with first and last 4
> blocks). Decrypt processing should be simpler.
>
> Then you can benchmark that loop in isolation. It doesn't need to be the
> complete function, the handling of first and last blocks can be omitted,
> and it doesn't even have to be completely correct, as long as it's the
> right instruction mix and the right data dependencies. The benchmark
> should give a good idea for the potential speedup, if any, from
> instruction-level interleaving.
>
> I would hope 4-way is doable with available vector registers (and this
> inner loop should be less than 100 instructions, so not too
> unmanageable). Going up to 8-way (like the current AES code) would also
> be interesting, but as you say, you might have a shortage of registers.
> If you have to copy state between registers and memory in each iteration
> of an 8-way loop (which it looks like you also have to do in your
> current patch), that overhead cost may outweigh the gains you have from
> more independence in the AES rounds.
>
> Regards,
> /Niels
>
> --
> Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
> Internet email is subject to wholesale government surveillance.
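The loop structure Niels describes (encrypt the current four blocks into one register set while hashing the previous four from another, with extra code before and after the loop) can be sketched in portable C. This is only an illustration of the control flow. toy_encrypt and toy_hash are hypothetical stand-ins with no cryptographic value, not AES-CTR or GHASH, and the arrays a and b merely play the role of the A and B register sets.

```c
#include <stddef.h>
#include <stdint.h>

#define GROUP 4 /* blocks handled per loop iteration */

/* Toy stand-in for AES-CTR on one block. NOT a real cipher. */
static uint32_t
toy_encrypt (uint32_t counter, uint32_t plaintext)
{
  return plaintext ^ (counter * 2654435761u);
}

/* Toy stand-in for one ghash update. NOT real GHASH. */
static uint32_t
toy_hash (uint32_t state, uint32_t ciphertext)
{
  return (state ^ ciphertext) * 16777619u;
}

/* Reference version: each block is encrypted and hashed right away. */
uint32_t
process_sequential (size_t groups, const uint32_t *src, uint32_t *dst)
{
  uint32_t state = 0;
  size_t i;

  for (i = 0; i < groups * GROUP; i++)
    {
      dst[i] = toy_encrypt ((uint32_t) i, src[i]);
      state = toy_hash (state, dst[i]);
    }
  return state;
}

/* Software-pipelined version: iteration g encrypts group g into a[]
   while hashing group g-1 held in b[]. The two operations inside the
   inner loop have no data dependency on each other, which is what
   lets an assembly version interleave their instructions. */
uint32_t
process_pipelined (size_t groups, const uint32_t *src, uint32_t *dst)
{
  uint32_t a[GROUP], b[GROUP];
  uint32_t state = 0;
  size_t g, j;

  if (groups == 0)
    return state;

  /* Prologue: encrypt the first group; there is nothing to hash yet. */
  for (j = 0; j < GROUP; j++)
    b[j] = dst[j] = toy_encrypt ((uint32_t) j, src[j]);

  /* Steady state. */
  for (g = 1; g < groups; g++)
    {
      for (j = 0; j < GROUP; j++)
        {
          a[j] = toy_encrypt ((uint32_t) (g * GROUP + j), src[g * GROUP + j]);
          state = toy_hash (state, b[j]);
        }
      /* The fresh ciphertext becomes the hash input of the next round. */
      for (j = 0; j < GROUP; j++)
        b[j] = dst[g * GROUP + j] = a[j];
    }

  /* Epilogue: hash the last group. */
  for (j = 0; j < GROUP; j++)
    state = toy_hash (state, b[j]);

  return state;
}
```

Both functions produce the same ciphertext and the same final hash state; only the scheduling differs. As the quoted message notes, a benchmark of just the steady-state loop is enough to estimate the potential gain before writing the prologue and epilogue in assembly.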

2 6

Re: Deleting obsolete assembly files?
by Niels Möller 08 Dec '23


nisse(a)lysator.liu.se (Niels Möller) writes:

> Simon Josefsson <simon(a)josefsson.org> writes:
>
>> Also, remember that Niels' proposal is not about removing these
>> algorithms, just dropping the assembler variant. So they will continue
>> to work fine on these platforms, but will take advantage of more code
>> scrutiny. I think that is a reasonable trade-off.
>
> And the only architectures that currently have any md5 assembly are x86
> and x86_64. On my x86_64 laptop, I see a rather modest performance gain
> of about 6% over the C version. I don't expect anyone to be willing to
> work on improved md5 performance, on x86_64 or on additional platforms.

Getting back to this thread. I've pushed a change to delete md5 assembly on branch delete-md5-asm, for testing. I don't think carrying md5 assembly code is worth the complexity.

The arcfour assembly was deleted in the 3.9 release.

Deletion candidates remaining:
- 32-bit x86 (aes (non-aesni), sha1, camellia).
- 32-bit sparc.
- 32-bit ARM prior to ARMv6.

Possibly also 64-bit sparc (currently, the only sparc64 assembly is for aes, written in 2007 based on the sparc32 code, so it is unclear how relevant it is for current sparc processors).

Regards,
/Niels

--
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.

1 0

Re: [PATCH] Add DRBG-CTR-AES256.
by Niels Möller 06 Dec '23


Simon Josefsson <simon(a)josefsson.org> writes:

> Please release 3.9 before looking at this! :-)
>
> This adds DRBG-CTR-AES256, what do you think?

I've merged this onto a branch add-drbg-ctr-aes256. I've made some additional changes: use union nettle_block16 where that made sense, rename Key -> key, fixed a typo in testsuite/Makefile, and extracted the output logic to its own helper function. It could be optimized to call aes256_encrypt with more than one block at a time, when possible, but that is probably not worth the extra complexity.

Please have a look.

For your sntrup761 patch that depends on this, will you be doing any more work on that in the near future? In the meantime, I've reworked the testing for side-channel silence, so it should be rather straightforward to add such tests for sntrup761.

Regards,
/Niels

--
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.

1 0

Patch to detect CPU capabilities on Apple Silicon
by Tim Kosse 05 Dec '23


Hi,

after building Nettle natively for Apple devices running Apple Silicon, I noticed a drastic performance difference between the native build and an x86_64 build emulated via Apple's Rosetta. The latter, despite emulation, was over 10 times faster in some algorithms, e.g. AES-128-GCM.

I found out that get_arm64_features did not detect the CPU capabilities on Apple devices at all. The attached patch fixes the issue for me; with it in place, the CPU features are correctly detected.

AES-128-GCM benchmark results:

Native (pre-patch): 200MB/s
Emulated: 3.2GB/s
Native (patched): 5.2GB/s

Regards,
Tim Kosse
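The attached patch is not reproduced in the archive, so the following is only a sketch of one way such detection could be done, not Tim's actual change. It assumes the hw.optional.arm.FEAT_* sysctl keys that Apple documents for recent macOS releases. The struct and function names are hypothetical, and on non-Apple systems the sketch simply reports no features.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__APPLE__)
# include <sys/types.h>
# include <sys/sysctl.h>
#endif

/* Returns 1 if the named sysctl exists and is nonzero, 0 otherwise.
   On non-Apple systems this always returns 0. */
static int
sysctl_flag (const char *name)
{
#if defined(__APPLE__)
  int64_t value = 0;
  size_t length = sizeof (value);

  if (sysctlbyname (name, &value, &length, NULL, 0) != 0)
    return 0;
  return value != 0;
#else
  (void) name;
  return 0;
#endif
}

/* Hypothetical feature record; Nettle's own structure may differ. */
struct arm64_features_sketch
{
  int have_aes;
  int have_pmull;
  int have_sha1;
  int have_sha2;
};

/* Fills in *features from the (assumed) macOS sysctl keys. */
void
detect_arm64_features_sketch (struct arm64_features_sketch *features)
{
  features->have_aes = sysctl_flag ("hw.optional.arm.FEAT_AES");
  features->have_pmull = sysctl_flag ("hw.optional.arm.FEAT_PMULL");
  features->have_sha1 = sysctl_flag ("hw.optional.arm.FEAT_SHA1");
  features->have_sha2 = sysctl_flag ("hw.optional.arm.FEAT_SHA256");
}
```

Missing AES and PMULL detection would be consistent with the reported numbers: without those flags, AES-128-GCM falls back to the generic code path, which plausibly explains the gap between 200MB/s and 5.2GB/s.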

2 2

Re: Mailing list archive is not working
by Niels Möller 05 Dec '23


Justus Winter <justus(a)sequoia-pgp.org> writes:

> https://lists.lysator.liu.se/mailman/hyperkitty/list/nettle-bugs@lists.lysa…
>
> shows zero mails this year. Not sure where to raise that, so I'm
> raising this here.

I've asked the mail admins. It turned out that the integration between mailman and hyperkitty was overlooked when the system was upgraded to mailman3 one and a half years ago. And it seems you are the first(!) user reporting that it's broken. Thanks for reaching out.

As of today, archives are finally receiving new mail, but unfortunately it seems traffic since the upgrade until now isn't archived anywhere near the list server.

Regards,
/Niels

--
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.

1 0

Re: How to update OpenSSL benchmark glue?
by Amos Jeffries 05 Dec '23


On 4/12/23 09:05, Niels Möller wrote:
> Simo Sorce writes:
>
>> Ah you do not need to pass any property for the default provider so you
>> can pass "" or even NULL.
>
> Thanks, I now have the RSA code updated (on branch update-openssl-bench,
> if anyone wants to see the details). Initialization is now
>
>   ctx->pkey_ctx = EVP_PKEY_CTX_new_from_name (NULL, "RSA", "");
>   if (!ctx->pkey_ctx)
>     die ("OpenSSL EVP_PKEY_CTX_new_from_name (\"RSA\") failed.\n");

FWIW, in Squid with OpenSSLv3 we use this:

  EVP_PKEY_CTX_new_id(EVP_PKEY_RSA, NULL)

>   if (EVP_PKEY_keygen_init (ctx->pkey_ctx) <= 0)
>     die ("OpenSSL EVP_PKEY_keygen_init failed.\n");
>   if (EVP_PKEY_CTX_set_rsa_keygen_bits(ctx->pkey_ctx, size) <= 0)
>     die ("OpenSSL EVP_PKEY_CTX_set_rsa_keygen_bits failed.\n");
>   BIGNUM *e = BN_new();
>   BN_set_word(e, 65537);
>   EVP_PKEY_CTX_set1_rsa_keygen_pubexp (ctx->pkey_ctx, e);
>   EVP_PKEY_keygen (ctx->pkey_ctx, &ctx->key);
>
> However, when I run this under valgrind (to check the corresponding
> cleanup code doesn't leak memory), I get an error:
>
> ==3016684== Conditional jump or move depends on uninitialised value(s)
> ==3016684==    at 0x4B0B824: EVP_PKEY_generate (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
> ==3016684==    by 0x10F30A: bench_openssl_rsa_init (hogweed-benchmark.c:721)
> ==3016684==    by 0x10D7AE: bench_alg (hogweed-benchmark.c:153)
> ==3016684==    by 0x10D7AE: main (hogweed-benchmark.c:972)
> ==3016684==
>
> I wonder if that's my code missing some initialization, or if that's an
> openssl problem?

How was the "ctx" variable created and initialized?

The new EVP_PKEY logic has a lot of "ctx_is_legacy" checks based on the ctx itself. So that matters now where it did not before.

> It's also unclear to me when the e bignum above can be
> deallocated, does EVP_PKEY_CTX_set1_rsa_keygen_pubexp imply a full copy
> into the context?

A quick reading of the source code indicates that yes, the context uses BN_dup() one way or another.

>
> Next is updating the ecdsa benchmarks, since, e.g.,
> EC_KEY_new_by_curve_name, generates deprecation warnings.
>
> Regards,
> /Niels
>

HTH
Amos

3 2
