Maamoun TK <maamoun.tk@googlemail.com> writes:
> Sure. According to Power ISA 2.07: The lvsl and lvsr instructions can
> be used to create the permute control vector to be used by a subsequent
> vperm instruction.
>
> The lvsl and lvsr instructions use the 'sh' value to fill the vector
> register. If 'sh' is 0, the vector register is populated as follows:
>
>   0x000102030405060708090A0B0C0D0E0F
>
> This can be done using the following instructions:
>
>   li r9, 0
>   lvsl LE_MASK, r9, r9
>
> Now we xor each byte with 3 using these instructions:
>
>   vspltisb LE_TEMP, 0x03
>   vxor LE_MASK, LE_MASK, LE_TEMP
>
> The value of the vector register is now:
>
>   0x03020100070605040B0A09080F0E0D0C
Thanks. I've added some comments about this.
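As a rough illustration of the mask construction quoted above, here is a plain C model of it. This is only a sketch, not the assembly itself: the function names make_le_mask and permute16 are made up for this example, and the permute is simplified to the case where both vperm source operands are the same register, with bytes indexed in the order the hex constants above are written.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "lvsl" with sh = 0 (bytes 0, 1, ..., 15) followed by
   "vxor" with a splat of 3: mask[i] = i ^ 3. */
static void
make_le_mask (uint8_t mask[16])
{
  for (size_t i = 0; i < 16; i++)
    mask[i] = (uint8_t) (i ^ 3);
}

/* Simplified model of "vperm" with the same register as both
   sources: each result byte is selected by the low four bits of
   the corresponding mask byte. */
static void
permute16 (uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
  for (size_t i = 0; i < 16; i++)
    dst[i] = src[mask[i] & 0x0f];
}
```

With this mask, permute16 reverses the byte order within each group of four bytes, which is the per-word byte swap the mask is meant to provide.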
I've also extended the fat setup to check for altivec, using the logic
  hwcap = getauxval(AT_HWCAP);
  ...
  /* We also need VSX instructions, mainly for load and store. */
  features->have_altivec
    = ((hwcap & (PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_VSX))
       == (PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_VSX));
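The test itself can be isolated in a small helper, sketched below. The name hwcap_has_altivec_and_vsx is hypothetical and is not the code in the fat setup. The fallback #defines are an assumption: they use the values I believe the Linux powerpc <asm/cputable.h> header gives these bits, and are there only so the sketch compiles on other systems.

```c
#include <stdbool.h>

/* Assumed fallback values (see the note above); on powerpc
   GNU/Linux the system headers normally provide these. */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000
#endif
#ifndef PPC_FEATURE_HAS_VSX
#define PPC_FEATURE_HAS_VSX 0x00000080
#endif

/* Returns true only if both the AltiVec and the VSX bits are set.
   Comparing against the full mask, rather than testing for a
   non-zero result, rejects CPUs that have only one of the two. */
static bool
hwcap_has_altivec_and_vsx (unsigned long hwcap)
{
  const unsigned long required
    = PPC_FEATURE_HAS_ALTIVEC | PPC_FEATURE_HAS_VSX;
  return (hwcap & required) == required;
}
```

On GNU/Linux the argument would be the result of getauxval(AT_HWCAP), declared in <sys/auxv.h>.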
For now, this is GNU/Linux only. Patches to get detection working on FreeBSD and AIX as well are welcome (I think the needed fixes will be close to trivial, but I have no easy way to test, and I don't want to commit untested code).
For non-fat builds, the new code is disabled by default, with a configure option --enable-power-altivec.
And I've merged the changes to the master branch. I have some work-in-progress code to do 2 or 4 chacha blocks in parallel, but I'm not sure when I will get that into working shape.
Regards,
/Niels