Sure. According to Power ISA 2.07: The lvsl and lvsr instructions can be used to create the permute control vector to be used by a subsequent vperm instruction.
So the lvsl and lvsr instructions use the 'sh' value (the low four bits of the effective address) to fill the vector register. If 'sh' is 0, lvsl populates the vector register as follows:

  0x000102030405060708090A0B0C0D0E0F

This can be done using the following instructions:

  li r9, 0
  lvsl LE_MASK, r9, r9

Now we xor each byte with 3 using these instructions:

  vspltisb LE_TEMP, 0x03
  vxor LE_MASK, LE_MASK, LE_TEMP

The value of the vector register is now:

  0x03020100070605040B0A09080F0E0D0C

When this mask is used in a vperm instruction, each word of the source vector is byte-reversed, so in big-endian mode every word of the result is stored in the destination buffer in little-endian order, and that is what LE_SWAP32 is meant to do.
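To make the register contents at each step easier to follow, here is a small C model of the sequence. It is only a sketch for illustration, not the assembly itself: the vec128 type and the model_* and le_swap32_mask names are made up for this example, and byte 0 of the array stands for the leftmost (most significant) byte of the vector register.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a 128-bit vector register; b[0] is the leftmost byte. */
typedef struct { uint8_t b[16]; } vec128;

/* lvsl: byte i of the result is sh + i, where sh is the low 4 bits of the EA. */
static vec128 model_lvsl(uint64_t ea) {
    vec128 v;
    unsigned sh = (unsigned)(ea & 0xF);
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)(sh + i);
    return v;
}

/* vspltisb: copy a small signed immediate into all 16 bytes. */
static vec128 model_vspltisb(int simm) {
    vec128 v;
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)(int8_t)simm;
    return v;
}

/* vxor: bytewise exclusive or. */
static vec128 model_vxor(vec128 x, vec128 y) {
    vec128 v;
    for (int i = 0; i < 16; i++)
        v.b[i] = x.b[i] ^ y.b[i];
    return v;
}

/* vperm: byte i of the result is byte (c[i] & 0x1F) of the 32-byte value a||b. */
static vec128 model_vperm(vec128 a, vec128 b, vec128 c) {
    vec128 v;
    for (int i = 0; i < 16; i++) {
        unsigned idx = c.b[i] & 0x1F;
        v.b[i] = (idx < 16) ? a.b[idx] : b.b[idx - 16];
    }
    return v;
}

/* Mirrors the four instructions: li, lvsl, vspltisb, vxor. */
static vec128 le_swap32_mask(void) {
    uint64_t r9 = 0;                        /* li r9, 0                      */
    vec128 mask = model_lvsl(r9 + r9);      /* 00 01 02 03 ... 0E 0F         */
    vec128 temp = model_vspltisb(0x03);     /* 03 03 03 03 ... 03 03         */
    return model_vxor(mask, temp);          /* 03 02 01 00 07 06 05 04 ...   */
}

/* Print the 16 bytes of a vector, leftmost byte first. */
static void print_vec(const char *name, vec128 v) {
    printf("%s = 0x", name);
    for (int i = 0; i < 16; i++)
        printf("%02X", v.b[i]);
    printf("\n");
}
```

Calling print_vec("LE_MASK", le_swap32_mask()) from a test program shows the mask, and model_vperm(x, x, le_swap32_mask()) returns x with the four bytes of each 32-bit word reversed. Xoring an index with 3 flips only its low two bits, so bytes 0, 1, 2, 3 within a word become 3, 2, 1, 0 while the word position stays the same.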
On Mon, Sep 28, 2020 at 8:32 PM Niels Möller nisse@lysator.liu.se wrote:
Maamoun TK maamoun.tk@googlemail.com writes:
The last patch follows the C implementation but I just figured out a decent way to do it.
Thanks! Applied, and pushed on the ppc-chacha-core branch for testing. (Had to apply it semi-manually, since the file to patch indents using TAB and those were replaced by spaces in the emailed patch.)
+IF_BE(`
+       li      r9, 0
+       lvsl    LE_MASK, r9, r9
+       vspltisb LE_TEMP, 0x03
+       vxor    LE_MASK, LE_MASK, LE_TEMP
+')
I think this deserves some comments, on what goes into the register in each step. Clever that the endian conversion corresponds to xoring the byte indices with 3.
Regards, /Niels
-- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance.