From POWER ISA 2.07 B:
- Load Vector Indexed X-form
  lvx VRT,RA,RB
  ...
  VRT ← MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16)
- Load VSX Vector Doubleword*2 Indexed XX1-form
  lxvd2x XT,RA,RB
  ...
  VSR[XT]{0:63} ← MEM(EA,8)
  VSR[XT]{64:127} ← MEM(EA+8,8)
lxvd2x doesn't clear the least significant 4 bits of the effective address. That alone isn't enough to prove that lxvd2x loads unaligned data properly, so I tested the instruction on unaligned data and it did succeed.
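To make the difference in the two RTL descriptions concrete, here is a minimal sketch in C that models only the address arithmetic quoted above. The function names are made up for illustration, and the model says nothing about how real hardware behaves (alignment interrupts, performance, or byte order inside the register).

```c
#include <assert.h>
#include <stdint.h>

/* Model of the address lvx actually accesses: per the quoted RTL,
 * the low 4 bits of EA are forced to zero. */
static inline uint64_t lvx_effective_address(uint64_t ea)
{
    return ea & UINT64_C(0xFFFFFFFFFFFFFFF0);
}

/* Model of the address lxvd2x accesses: per the quoted RTL,
 * EA is used unchanged (first doubleword at EA, second at EA+8). */
static inline uint64_t lxvd2x_effective_address(uint64_t ea)
{
    return ea;
}
```

For an address ending in 1, the lvx model rounds down to the enclosing 16-byte block while the lxvd2x model keeps the address as given; for an address that is already 16-byte aligned the two agree.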
Andy Polyakov uses lxvd2x to load the user's buffers in his ghash implementation for ppc: https://github.com/dot-asm/cryptogams/blob/master/ppc/ghashp8-ppc.pl He uses lvx_u, which I guess stands for lvx_unaligned; the Perl script translates this instruction to lxvd2x before passing it to the assembler.
regards, Mamone
On Tue, Jun 30, 2020 at 11:31 PM Jeffrey Walton noloader@gmail.com wrote:
On Tue, Jun 30, 2020 at 7:55 AM Maamoun TK maamoun.tk@googlemail.com wrote:
I tested something similar: I tried to load data at address 0xXXXXXXX1 using lxvd2x and it loaded properly.
I think you should reconsider.
Consider, even Andy Polyakov uses lvx/lvsl when loading [potentially] unaligned buffers on POWER8: https://github.com/dot-asm/cryptogams/blob/master/ppc/aesp8-ppc.pl. Andy uses it for the user's key and data.
Jeff
On Tue, Jun 30, 2020 at 12:35 PM Jeffrey Walton noloader@gmail.com wrote:
On Tue, Jun 30, 2020 at 5:29 AM Jeffrey Walton noloader@gmail.com wrote:
On Tue, Jun 30, 2020 at 5:14 AM Maamoun TK <maamoun.tk@googlemail.com> wrote:
Patch implementation benchmark for GCM_AES (tested on POWER8):
little-endian:
- Encrypt: ~17.5x the speed of the nettle C implementation
- Decrypt: ~17.5x the speed of the nettle C implementation
- Update: ~30x the speed of the nettle C implementation
big-endian:
- Encrypt: ~18.5x the speed of the nettle C implementation
- Decrypt: ~18.5x the speed of the nettle C implementation
- Update: ~28.5x the speed of the nettle C implementation
...
One small comment for aes_encrypt and aes_decrypt... src and dst are usually user supplied buffers. Using lxvd2x to load a vector may produce incorrect results if the user is feeding a stream to an encryptor or decryptor that is not naturally aligned to that of an unsigned int. (On the other hand, Nettle controls the round keys array so lxvd2x should be fine.)
Instead of lxvd2x and friends for the user's buffers you should consider using lvx and doing the lvsl thing to fix the data in the registers.
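For readers unfamiliar with "the lvsl thing", here is a rough portable C model of the classic big-endian idiom: two aligned lvx loads, an lvsl to build a permute control vector, and a vperm to pick out the wanted 16 bytes. This is a sketch under stated assumptions, not code from the patch or from cryptogams. The function name is made up, the model follows big-endian semantics only (little-endian code typically needs a different control vector or operand order), and it always reads the second block at base+16, so the caller must make sure 32 bytes starting at the aligned base are readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: fetch 16 bytes starting at mem[ea] using only
 * 16-byte-aligned block reads plus a byte permute. */
static inline void model_unaligned_load(uint8_t out[16],
                                        const uint8_t *mem, size_t ea)
{
    uint8_t lo[16], hi[16], control[16];
    size_t base = ea & ~(size_t)0xF;  /* what lvx does to the address */
    size_t shift = ea & 0xF;          /* misalignment within the block */
    size_t i;

    memcpy(lo, mem + base, 16);       /* first lvx: block containing ea */
    memcpy(hi, mem + base + 16, 16);  /* second lvx: following block */

    /* lvsl: control vector {shift, shift+1, ..., shift+15} */
    for (i = 0; i < 16; i++)
        control[i] = (uint8_t)(shift + i);

    /* vperm: each control byte (low 5 bits) indexes into lo||hi */
    for (i = 0; i < 16; i++) {
        size_t idx = control[i] & 0x1F;
        out[i] = (idx < 16) ? lo[idx] : hi[idx - 16];
    }
}
```

With a buffer where mem[i] == i, loading at offset 3 yields bytes 3 through 18, which is what a plain unaligned 16-byte read would return.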
In fact, you might want to add a test case like this:
    uint8_t plain[19] = {0,1, ..., 17, 18};
    uint8_t cipher[16], recover[16];
Then send plain, plain+1, plain+2 and plain+3 into the encryptor and see if it round trips. lxvd2x will choke even on POWER9 because two of the tests will not even be naturally aligned for a byte.
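The shape of such a test could look roughly like the sketch below. Everything here is an assumption for illustration: block_fn, toy_encrypt, toy_decrypt and roundtrip_misaligned are made-up names, and the toy XOR transform is only a stand-in so the harness is self-contained. It is not nettle's aes_encrypt/aes_decrypt interface, which would have to be wrapped to fit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: transform one 16-byte block from src to dst. */
typedef void block_fn(uint8_t *dst, const uint8_t *src);

/* Toy stand-in for a block cipher (XOR with a constant); it is its own
 * inverse. Replace with wrappers around the real cipher under test. */
static inline void toy_encrypt(uint8_t *dst, const uint8_t *src)
{
    int i;
    for (i = 0; i < 16; i++)
        dst[i] = src[i] ^ 0xA5;
}

static inline void toy_decrypt(uint8_t *dst, const uint8_t *src)
{
    toy_encrypt(dst, src);
}

/* Feed plain, plain+1, plain+2 and plain+3 through enc and dec, and
 * return how many of the four offsets failed to round trip. */
static inline int roundtrip_misaligned(block_fn *enc, block_fn *dec)
{
    uint8_t plain[19], cipher[16], recover[16];
    int failures = 0;
    int i, off;

    for (i = 0; i < 19; i++)
        plain[i] = (uint8_t)i;

    for (off = 0; off < 4; off++) {
        enc(cipher, plain + off);
        dec(recover, cipher);
        if (memcmp(recover, plain + off, 16) != 0)
            failures++;
    }
    return failures;
}
```

A return value of 0 means all four source offsets round-tripped. Note that the harness only varies the alignment of the source buffer; varying the destination alignment would need a similar loop over offsets into a larger output buffer.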