David Edelsohn <dje.gcc@gmail.com> writes:
I responded that Power9 (or at least Power8) would be preferred. If Niels wants the implementation to impact production deployments and increase the use of Nettle for cryptography on Power systems, I recommend that he target a more recent level of the ISA. He can target Power7, and Power4, and pure Altivec as well.
The basic chacha code I added some months ago uses altivec instructions, and the instructions lxvw4x and stxvw4x (with vsr registers) for load and store, to make it easier to work with data that is only 32-bit aligned.
I put that code under the powerpc64/p7/ directory, under the belief that the code should work fine for all Power7 and later (with the caveat that I don't know to which degree altivec is an optional feature).
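To illustrate why loads such as lxvw4x help with data that is only 32-bit aligned, here is a minimal C model, not the actual Nettle assembly. It contrasts a classic AltiVec lvx-style load, which ignores the low four bits of the address, with an lxvw4x-style load, which uses the address as given. The names vec4, load_truncating and load_exact are hypothetical and exist only for this sketch; endianness details are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register holding four words. */
struct vec4 { uint32_t w[4]; };

/* Models an lvx-style load: the low four bits of the address are
   ignored, so the load always starts at a 16-byte boundary.
   byte_offset plays the role of the effective address; mem is
   assumed to start at a 16-byte boundary in this model. */
struct vec4
load_truncating (const uint32_t *mem, size_t byte_offset)
{
  struct vec4 v;
  size_t aligned = byte_offset & ~(size_t) 15;
  memcpy (v.w, (const unsigned char *) mem + aligned, sizeof (v.w));
  return v;
}

/* Models an lxvw4x-style load: the address is used as given, so
   four consecutive words can be fetched from a 4-byte-aligned
   position. */
struct vec4
load_exact (const uint32_t *mem, size_t byte_offset)
{
  struct vec4 v;
  memcpy (v.w, (const unsigned char *) mem + byte_offset, sizeof (v.w));
  return v;
}
```

With a buffer holding the words 0 through 7, a load at byte offset 4 yields words 0..3 in the truncating model and words 1..4 in the exact model. With lvx alone, the second result would need extra permute work.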
It may also be relevant to note that with the current configure script, no power assembly is used unconditionally by default. It has to be enabled either explicitly with configure arguments, or based on runtime checks, if configured with --enable-fat.
That means that the name of the powerpc64/p7/ directory doesn't matter much technically (I would be fine to rename it to, e.g., altivec/). But I got the impression from the list discussion that p7/ was reasonable.
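As a rough sketch of what selection based on runtime checks can look like (the general pattern behind a --enable-fat style build, not Nettle's actual code), the C fragment below picks an implementation through a function pointer. It assumes Linux on powerpc64 for the real check, using getauxval(AT_HWCAP) and the PPC_FEATURE_HAS_ALTIVEC bit; on any other platform it simply reports no altivec. The names xor_words_c, xor_words_altivec_stub, have_altivec and select_xor_words are hypothetical, and the "altivec" routine is a C stand-in for what would be an assembly function.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__powerpc64__)
# include <sys/auxv.h>
# ifndef PPC_FEATURE_HAS_ALTIVEC
#  define PPC_FEATURE_HAS_ALTIVEC 0x10000000
# endif
#endif

typedef void xor_words_func (uint32_t *dst, const uint32_t *src, size_t n);

/* Portable C implementation, always available. */
void
xor_words_c (uint32_t *dst, const uint32_t *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* Stand-in for an assembly routine; in this sketch it just
   calls the C version so the example runs everywhere. */
void
xor_words_altivec_stub (uint32_t *dst, const uint32_t *src, size_t n)
{
  xor_words_c (dst, src, n);
}

/* Runtime check: nonzero if the kernel reports altivec support.
   Assumption: only Linux/powerpc64 is probed; elsewhere return 0. */
int
have_altivec (void)
{
#if defined(__linux__) && defined(__powerpc64__)
  return (getauxval (AT_HWCAP) & PPC_FEATURE_HAS_ALTIVEC) != 0;
#else
  return 0;
#endif
}

/* Choose the implementation once, based on the detected feature. */
xor_words_func *
select_xor_words (int altivec)
{
  return altivec ? xor_words_altivec_stub : xor_words_c;
}
```

A caller would do something like select_xor_words (have_altivec ()) once at startup and then call through the returned pointer. With this kind of dispatch, the directory name of the assembly files has no effect on which code runs.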
And my intention is that improved chacha code should target the same processor flavors as the existing more basic implementation. So I need to replace the use of vextractuw, a Power9 (ISA 3.0) instruction (which isn't used in the most performance-critical part of the function).
Regards, /Niels