On Sat, Jun 20, 2020 at 11:54 AM Niels Möller <nisse@lysator.liu.se> wrote:
Have you measured speedup when going from 4 to 8 blocks? We shouldn't add larger loops than needed.
The 8x loop is about 1.15x faster than the 4x loop. If you think that's not worth it, I can add only the 4x loop to make the code simpler.
Do you measure a speedup from this? Karatsuba usually pays off only for a bit larger sizes (but I guess overhead is a little less here than for standard multiplication).
Actually, I chose the Karatsuba algorithm not only for performance but also to reduce the number of registers used. That said, I believe that in my case Karatsuba performs better than, or similar to, classical multiplication.
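To make the trade-off concrete, here is a minimal portable C sketch (not the actual PowerPC assembly from the patch) of a 128x128-bit carry-less multiply done both ways. The types `u128` and `u256`, and the functions `clmul64`, `clmul128_classical` and `clmul128_karatsuba`, are hypothetical names introduced only for illustration; `clmul64` is a slow bit-by-bit stand-in for a hardware carry-less multiply instruction. The classical form needs four 64x64 products, Karatsuba needs three plus a few extra XORs.

```c
#include <stdint.h>

/* 128-bit value as two 64-bit halves (hypothetical helper type). */
typedef struct { uint64_t lo, hi; } u128;

/* 256-bit product as four 64-bit words, least significant first. */
typedef struct { uint64_t w[4]; } u256;

/* Bit-by-bit 64x64 -> 128 carry-less multiply over GF(2).
   Stand-in for a hardware polynomial multiply instruction. */
static u128
clmul64(uint64_t a, uint64_t b)
{
  u128 r = { 0, 0 };
  for (unsigned i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        r.lo ^= a << i;
        if (i > 0)
          r.hi ^= a >> (64 - i);
      }
  return r;
}

/* Classical multiplication: four 64x64 products. */
static u256
clmul128_classical(u128 a, u128 b)
{
  u128 ll = clmul64(a.lo, b.lo);
  u128 lh = clmul64(a.lo, b.hi);
  u128 hl = clmul64(a.hi, b.lo);
  u128 hh = clmul64(a.hi, b.hi);
  u256 r;
  r.w[0] = ll.lo;
  r.w[1] = ll.hi ^ lh.lo ^ hl.lo;
  r.w[2] = hh.lo ^ lh.hi ^ hl.hi;
  r.w[3] = hh.hi;
  return r;
}

/* Karatsuba: three 64x64 products. Over GF(2) the middle term is
   (a.lo ^ a.hi) * (b.lo ^ b.hi) ^ ll ^ hh, since addition is XOR. */
static u256
clmul128_karatsuba(u128 a, u128 b)
{
  u128 ll = clmul64(a.lo, b.lo);
  u128 hh = clmul64(a.hi, b.hi);
  u128 mid = clmul64(a.lo ^ a.hi, b.lo ^ b.hi);
  u256 r;
  mid.lo ^= ll.lo ^ hh.lo;
  mid.hi ^= ll.hi ^ hh.hi;
  r.w[0] = ll.lo;
  r.w[1] = ll.hi ^ mid.lo;
  r.w[2] = hh.lo ^ mid.hi;
  r.w[3] = hh.hi;
  return r;
}
```

Both functions return the same unreduced 256-bit product; the reduction modulo the GCM polynomial is left out. Whether the saved multiply also translates into fewer live registers depends on how the assembly is scheduled, so this only illustrates the arithmetic.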
- Since the functionality of gcm_set_key() is replaced with gcm_init_key() for PowerPC64LE, two warnings will pop up: [‘gcm_gf_shift’ defined but not used] and [‘gcm_gf_add’ defined but not used]
When I applied the patch to the latest upstream, these warnings did not appear. Some changes have been made to gcm.c since then; I will look into it.
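For reference, a small C sketch of how this kind of warning arises and one common way to silence it. It does not reproduce the real gcm.c; the macro `HAVE_NATIVE_KEY_SETUP_SKETCH` and the functions `gf_add_sketch` and `set_key_sketch` are hypothetical names, not Nettle identifiers. A static helper that is only called from the generic C path becomes "defined but not used" once a platform routine takes over, unless the helper is guarded by the same preprocessor condition as its caller.

```c
#include <stdint.h>

/* Hypothetical switch: 1 when a platform-specific routine replaces the
   generic C key setup. Not an actual Nettle configuration macro. */
#ifndef HAVE_NATIVE_KEY_SETUP_SKETCH
#define HAVE_NATIVE_KEY_SETUP_SKETCH 1
#endif

#if !HAVE_NATIVE_KEY_SETUP_SKETCH
/* Helper needed only by the generic path. Without this guard, a
   compiler with -Wunused-function would report it as defined but not
   used whenever the native path is selected. */
static uint64_t
gf_add_sketch(uint64_t x, uint64_t y)
{
  return x ^ y;
}
#endif

/* Returns 1 if the native path was compiled in, 0 for the generic one. */
static int
set_key_sketch(uint64_t *table, uint64_t key)
{
#if HAVE_NATIVE_KEY_SETUP_SKETCH
  table[0] = key;  /* placeholder for a call into an assembly routine */
  return 1;
#else
  table[0] = gf_add_sketch(key, 0);
  return 0;
#endif
}
```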
To test PPC code, I wonder if it's easy to add a PPC build to .gitlab-ci, in the same way as arm and mips tests. These are based on Debian packaged cross compilers and qemu-user. I'm also not that familiar with the variants within the Power and PowerPC family of processors.
I will see what I can do.