I created merge requests that improve ChaCha20 performance on the arm64 and s390x architectures, following the approach used in the powerpc implementation.
https://git.lysator.liu.se/nettle/nettle/-/merge_requests/37
https://git.lysator.liu.se/nettle/nettle/-/merge_requests/40
The patches give a speedup of 80.85% on arm64 and 284.79% on s390x.
It would be nice if someone could test the arm64 patch in big-endian mode, since I don't have access to a big-endian system for testing.
regards,
Mamone