Thank you for your work.
On POWER9 I got the following benchmark results:

./configure:
  chacha encrypt 308.58
  chacha decrypt 325.87

./configure --enable-power-altivec, "master branch":
  chacha encrypt 342.15
  chacha decrypt 356.24

./configure --enable-power-altivec, "ppc-chacha-2core":
  chacha encrypt 648.97
  chacha decrypt 648.00
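To put the figures above in relative terms, the ratios can be worked out with a few lines of Python. This is only a sketch over the numbers quoted in this message. The dictionary keys and the `speedup` helper are illustrative names, and the figures are taken in whatever unit nettle-benchmark printed.

```python
# Benchmark figures copied from the results above: (encrypt, decrypt).
results = {
    "default": (308.58, 325.87),
    "altivec master": (342.15, 356.24),
    "altivec ppc-chacha-2core": (648.97, 648.00),
}


def speedup(new, old):
    """Ratio of two throughput figures (higher means faster)."""
    return new / old


base_enc, base_dec = results["altivec master"]
two_enc, two_dec = results["altivec ppc-chacha-2core"]

# Speedup of the 2-core branch relative to master, both with altivec enabled.
print(f"encrypt: {speedup(two_enc, base_enc):.2f}x")
print(f"decrypt: {speedup(two_dec, base_dec):.2f}x")
```

Both ratios come out a little under 2, which is about what one would hope for from processing two blocks at a time.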
It's gotten better with every further optimization on the core, great work.
regards, Mamone
On Mon, Nov 23, 2020 at 6:50 PM Niels Möller nisse@lysator.liu.se wrote:
Niels Möller nisse@lysator.liu.se writes:
It could likely be sped up further by processing 2, 3 or 4 blocks in parallel.
I've given 2 blocks in parallel a try, but not quite working yet. My work-in-progress code below.
I've got it into working shape now, at least for little-endian. See
https://git.lysator.liu.se/nettle/nettle/-/blob/ppc-chacha-2core/powerpc64/p...
Next steps:
Fix it to work also for big-endian,
Wire it up for fat builds.
Try out if 4-way gives additional speedup.
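For readers unfamiliar with the idea, processing several blocks in parallel can be illustrated in plain Python. This is a minimal sketch, not the code in the ppc-chacha-2core branch: every state word is held as a small list of "lanes" (one lane per block), so each add, xor and rotate acts on both blocks at once, the way a vector instruction would. It assumes the original ChaCha layout with a 64-bit block counter in words 12 and 13, and all function names are made up for the example.

```python
MASK = 0xFFFFFFFF

# Column rounds followed by diagonal rounds: one ChaCha "double round".
ROUND_INDICES = [
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),
]


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def quarter_round(s, a, b, c, d):
    """Scalar ChaCha quarter round, updating the list s in place."""
    s[a] = (s[a] + s[b]) & MASK; s[d] = rotl(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK; s[b] = rotl(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK; s[d] = rotl(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK; s[b] = rotl(s[b] ^ s[c], 7)


def chacha_block(state):
    """Reference: one block, 20 rounds, then add the input state."""
    x = list(state)
    for _ in range(10):
        for idx in ROUND_INDICES:
            quarter_round(x, *idx)
    return [(p + q) & MASK for p, q in zip(x, state)]


def vadd(u, v):
    """Lane-wise 32-bit addition."""
    return [(p + q) & MASK for p, q in zip(u, v)]


def vxor_rotl(u, v, n):
    """Lane-wise xor followed by a left rotate."""
    return [rotl(p ^ q, n) for p, q in zip(u, v)]


def quarter_round_lanes(s, a, b, c, d):
    """Same quarter round, but each s[i] is a list of lanes."""
    s[a] = vadd(s[a], s[b]); s[d] = vxor_rotl(s[d], s[a], 16)
    s[c] = vadd(s[c], s[d]); s[b] = vxor_rotl(s[b], s[c], 12)
    s[a] = vadd(s[a], s[b]); s[d] = vxor_rotl(s[d], s[a], 8)
    s[c] = vadd(s[c], s[d]); s[b] = vxor_rotl(s[b], s[c], 7)


def next_counter_state(state):
    """Copy of state with the 64-bit counter (words 12, 13) incremented."""
    ctr = (state[12] | (state[13] << 32)) + 1
    nxt = list(state)
    nxt[12] = ctr & MASK
    nxt[13] = (ctr >> 32) & MASK
    return nxt


def chacha_2blocks(state):
    """Compute the blocks for counter n and n+1 in lockstep."""
    second = next_counter_state(state)
    init = [[p, q] for p, q in zip(state, second)]
    x = [list(w) for w in init]
    for _ in range(10):
        for idx in ROUND_INDICES:
            quarter_round_lanes(x, *idx)
    out = [vadd(w, w0) for w, w0 in zip(x, init)]
    return [w[0] for w in out], [w[1] for w in out]
```

A 3-way or 4-way variant would only change the number of lanes per word. Whether that pays off on real hardware depends on register pressure and instruction scheduling, which this sketch does not model.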
Benchmarking is appreciated. Compare the master branch to the ppc-chacha-2core branch, configured with --enable-power-altivec, and run ./examples/nettle-benchmark chacha.
Regards, /Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.