I created merge requests that improve ChaCha20 performance on the arm64 and s390x architectures, following the approach used in the PowerPC implementation:
https://git.lysator.liu.se/nettle/nettle/-/merge_requests/37
https://git.lysator.liu.se/nettle/nettle/-/merge_requests/40
The patches give an 80.85% speedup on arm64 and a 284.79% speedup on s390x.
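For reference, the core of ChaCha20 is the quarter round defined in RFC 8439, section 2.1. SIMD implementations of ChaCha20 generally get their speedup by running this add/xor/rotate sequence on several state words, or several blocks, at once. The merge requests themselves are not reproduced here; below is only a minimal scalar C sketch of the quarter round, with illustrative function names that are not nettle's API. The values in the check are the RFC 8439 section 2.1.1 quarter-round test vector.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (valid for 0 < n < 32). */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* ChaCha20 quarter round (RFC 8439, section 2.1), scalar version.
   A vectorized implementation applies the same operations to whole
   rows of the 4x4 state per instruction. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32(*d, 16);
  *c += *d; *b ^= *c; *b = rotl32(*b, 12);
  *a += *b; *d ^= *a; *d = rotl32(*d, 8);
  *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```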
It would be nice if the arm64 patch could be tested in big-endian mode, since I don't have access to any big-endian system for testing.
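Big-endian testing matters here because ChaCha20 (RFC 8439) serializes its key, nonce, counter and keystream as little-endian 32-bit words. Portable C that assembles words byte by byte is correct on either byte order, whereas assembly that loads data directly into vector registers may need an explicit byte swap on a big-endian host. The helpers below are a hypothetical illustration of the portable approach, not code from the patch or from nettle.

```c
#include <stdint.h>

/* Read a 32-bit word stored in little-endian byte order. Building the
   value from individual bytes gives the same result on little- and
   big-endian hosts, so no byte swap is needed. */
static uint32_t load_le32(const uint8_t *p)
{
  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit word in little-endian byte order. */
static void store_le32(uint8_t *p, uint32_t x)
{
  p[0] = (uint8_t)x;
  p[1] = (uint8_t)(x >> 8);
  p[2] = (uint8_t)(x >> 16);
  p[3] = (uint8_t)(x >> 24);
}

/* Runtime probe: returns 1 on a little-endian host, 0 on big-endian. */
static int host_is_little_endian(void)
{
  const uint32_t probe = 1;
  return *(const uint8_t *)&probe == 1;
}
```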
regards, Mamone
Maamoun TK maamoun.tk@googlemail.com writes:
> I created merge requests that have improvements of Chacha20 for arm64 and s390x architectures by following the approach used in powerpc implementation. https://git.lysator.liu.se/nettle/nettle/-/merge_requests/37 https://git.lysator.liu.se/nettle/nettle/-/merge_requests/40 The patches have 80.85% speedup for arm64 arch and 284.79% speedup for s390x arch.
Nice, I've had a quick first look.
> It would be nice if the arm64 patch will be tested on big-endian mode since I don't have access to any big-endian variant for testing.
I've merged the arm64 code to a branch, for CI testing.
For the ARM code, which instructions are provided by the asimd extension? Basic SIMD is always available, if I've understood correctly.
Regards, /Niels
On Wed, Jan 19, 2022 at 8:48 PM Niels Möller nisse@lysator.liu.se wrote:
> For the ARM code, which instructions are provided by the asimd extension? Basic simd is always available, if I've understood correctly.
As far as I understand, SIMD is called Advanced SIMD on AArch64, and it is standard on this architecture. GCC enables SIMD by default, but it can be disabled with the nosimd feature modifier (e.g. -march=armv8-a+nosimd), as described at https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html, which is why I added a specific configure option for it.
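As a rough illustration of the GCC behaviour mentioned above: compilers following the Arm C Language Extensions (ACLE) define the __ARM_NEON macro when Advanced SIMD is enabled, so a build with -march=armv8-a+nosimd leaves it undefined. The sketch below assumes GCC or Clang, and its function name is hypothetical; it only shows how C code could detect the setting at compile time.

```c
/* Compile-time check for Advanced SIMD on AArch64.
   Returns 1 if the compiler targets aarch64 with Advanced SIMD enabled
   (__ARM_NEON defined), 0 otherwise, including on other architectures. */
static int built_with_advanced_simd(void)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
  return 1;
#else
  return 0;
#endif
}
```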
regards, Mamone
> Regards, /Niels
> -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance.
Maamoun TK maamoun.tk@googlemail.com writes:
> As far as I understand, SIMD is called Advanced SIMD on AArch64 and it's standard for this architecture. simd is enabled by default in GCC but it can be disabled with nosimd option as I can see in here https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html which is why I made a specific config option for it.
If it's present on all known aarch64 systems (with the HWCAP_ASIMD flag always set), I think we can keep things simpler and use the code unconditionally: no extra subdir, no fat-build function pointers, and no configure flag.
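For comparison, if runtime detection were kept (as in a fat build), the usual approach on Linux is to test the HWCAP_ASIMD bit of the auxiliary vector via getauxval(AT_HWCAP). A minimal sketch, assuming Linux with glibc on aarch64; the function name is hypothetical, and on any other target it simply returns 0. Using the code unconditionally, as proposed in this message, makes such a check unnecessary.

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_ASIMD */
#endif

/* Runtime check of the kind a fat build would use to choose between
   implementations. Returns 1 if the kernel reports Advanced SIMD,
   0 otherwise (always 0 on non-aarch64 or non-Linux targets). */
static int cpu_has_asimd(void)
{
#if defined(__aarch64__) && defined(__linux__)
  return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#else
  return 0;
#endif
}
```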
I've pushed the merge button for the s390x merge request.
Regards, /Niels
On Thu, Jan 20, 2022 at 10:32 PM Niels Möller nisse@lysator.liu.se wrote:
> If it's present on all known aarch64 systems (and HWCAP_ASIMD flag always set), I think we can keep things simpler and use the code unconditionally, with no extra subdir, no fat build function pointers or configure flag.
OK, I'll commit the changes as plain (vanilla) assembly files, without the extra subdir.
> I've pushed the merge button for the s390x merge request.
Nice! I've run various tests on each core function, so merging the changes should be fine.
On another topic, I'm experimenting with your poly1305 optimization tips, and I'll get back to you once I have something.
regards, Mamone
On Thu, Jan 20, 2022 at 11:08 PM Maamoun TK maamoun.tk@googlemail.com wrote:
> Ok, I'll commit the changes with vanilla assembly files.
Done! The MR has been updated: https://git.lysator.liu.se/nettle/nettle/-/merge_requests/37
regards, Mamone
nettle-bugs@lists.lysator.liu.se