On Thu, Sep 24, 2020 at 3:46 PM Niels Möller nisse@lysator.liu.se wrote:
> I'm trying to learn a bit of ppc assembly. Below is an implementation of _chacha_core. Seems to work, when tested on gcc112.fsffrance.org (just put the file in the powerpc64 directory and reconfigure). This machine is little-endian, I haven't yet tested on big-endian.
> Unfortunately I don't get any accurate benchmark numbers on that machine, but I think speedup may be on the order of 50%...
Yeah, getting accurate benchmark results is difficult on the compile farm. First, you need to move the machine into performance mode, but you can't because you're not an admin. (A script like https://github.com/weidai11/cryptopp/blob/master/TestScripts/governor.sh will do if you are an admin.)
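Even without admin rights you can usually at least see which mode a machine is in before trusting a number. A minimal sketch, assuming a Linux host that exposes the standard cpufreq sysfs files; the function name read_governor and its parameters are made up for illustration and are not taken from the script linked above:

```python
from pathlib import Path


def read_governor(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the current cpufreq governor for one CPU, or None if unavailable.

    Assumes the Linux cpufreq sysfs layout. Reading the file normally does
    not need root; changing the governor does.
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq" / "scaling_governor"
    try:
        return path.read_text().strip()
    except OSError:
        # No cpufreq support, no such CPU, or not a Linux system.
        return None


if __name__ == "__main__":
    governor = read_governor()
    if governor is None:
        print("governor not available on this system")
    else:
        print(f"cpu0 governor: {governor}")
```

If this prints something other than "performance" (for example "powersave" or "ondemand"), treat the benchmark numbers with extra suspicion.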
Second, the ISA seems to produce random-looking benchmark results. I've never been able to identify good access patterns that produce consistent results. Part of this problem may be powersave mode. Part of it may be mistakes on my part.
Third, to develop somewhat consistent benchmark statistics, repeat the benchmark several times and discard the outliers. I discard both low and high outliers. (The low outliers may be valid, but I discard them anyway.)
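The repeat-and-discard step can be sketched roughly as below. This is only an illustration: it treats the fastest and slowest `trim` runs as the outliers (a trimmed mean), which is one simple choice and not necessarily the rule I use. The names trimmed_mean and benchmark, and the defaults for repeats and trim, are assumptions.

```python
import statistics
import time


def trimmed_mean(samples, trim=1):
    """Mean of samples after dropping the `trim` lowest and `trim` highest."""
    if len(samples) <= 2 * trim:
        raise ValueError("need more than 2*trim samples")
    kept = sorted(samples)[trim:len(samples) - trim]
    return statistics.mean(kept)


def benchmark(func, repeats=10, trim=2):
    """Time func() `repeats` times and return the trimmed mean in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return trimmed_mean(samples, trim)


if __name__ == "__main__":
    # Hypothetical workload standing in for the code under test.
    elapsed = benchmark(lambda: sum(range(100_000)))
    print(f"trimmed mean: {elapsed:.6f} s")
```

Dropping a fixed number of runs from each end keeps the procedure symmetric, which matches discarding the low outliers even though they may be valid.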
Also see "GCC135/Power9 performance?", https://lists.tetaneutral.net/pipermail/cfarm-users/2020-April/000556.html. Andy Polyakov joins the conversation and provides his insights.
Jeff