Eric Richter <erichte@linux.ibm.com> writes:

> According to the ABI, the stack pointer is quadword aligned, so starting the stack storage at offset -8 may cause the return address to be stepped on. Adjusting to use -16 as the starting point, which also matches other POWER assembly code.

Thanks, applied!
I've noticed one more memory access issue when re-reading this code. The loading of the input data is done using

    define(`LOAD', `
            IF_BE(`lxvw4x   VSR(IV($1)), $2, INPUT')
            IF_LE(`
                    lxvd2x  VSR(IV($1)), $2, INPUT
                    vperm   IV($1), IV($1), IV($1), VT0
            ')
    ')
    [...]
            LOAD(0, TC0)
            LOAD(1, TC4)
            LOAD(2, TC8)
            LOAD(3, TC12)
    [...]

As I understand this, like for the state registers, we only use 32 bits of each of the vector registers representing the input block being expanded (it would be nice if we could find a more compact representation without complicating the input expansion logic, but that may be quite difficult).
So we read the 16 bytes at INPUT into register v16, using the first 4 of those bytes, then the 16 bytes at INPUT+4 into v17, using the first 4 bytes, etc.
So we do overlapping reads, and at the end we'll read 12 bytes beyond the end of the input buffer?
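
To make the arithmetic concrete, here is a small C sketch (not part of the assembly; the names load_overread and max_overread are made up for illustration). It assumes a 64-byte SHA-256 input block, one 16-byte vector load per 32-bit word, and a load offset of 4 bytes times the word index, as described above:

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_SIZE = 64, VECTOR_SIZE = 16, WORD_SIZE = 4 };

/* Bytes read past the end of the block by a VECTOR_SIZE-byte load that
   starts at the byte offset of input word word_index (hypothetical helper). */
static size_t load_overread(size_t word_index)
{
  assert(word_index < BLOCK_SIZE / WORD_SIZE);
  /* One past the last byte touched by this load. */
  size_t end = word_index * WORD_SIZE + VECTOR_SIZE;
  return end > BLOCK_SIZE ? end - BLOCK_SIZE : 0;
}

/* Largest overread over all 16 per-word loads of one block. */
static size_t max_overread(void)
{
  size_t worst = 0;
  for (size_t i = 0; i < BLOCK_SIZE / WORD_SIZE; i++) {
    size_t n = load_overread(i);
    if (n > worst)
      worst = n;
  }
  return worst;
}
```

Under these assumptions, the loads for words 13, 14 and 15 run 4, 8 and 12 bytes past the block, which is where the 12-byte figure comes from.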
I think it should be possible to replace this with something like

            LOAD(0, TC0)
            vsldoi  IV(1), IV(0), IV(0), 4
            vsldoi  IV(2), IV(0), IV(0), 8
            vsldoi  IV(3), IV(0), IV(0), 12
            LOAD(4, TC16)
    [...]

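
As a sanity check of the idea, here is a rough C model of what the vsldoi sequence would do. It is a sketch only: vec128, vsldoi_model, first_word and expand4 are illustrative names, and it assumes the register holds the input bytes in big-endian element order (the state I'd expect after the IF_LE vperm fixup). With both source operands equal, vsldoi acts as a byte rotate, so each of input words 1, 2 and 3 ends up in the first word position:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } vec128;

/* Model of "vsldoi vD, vA, vB, SH": concatenate vA || vB into 32 bytes
   and take the 16 bytes starting at byte SH (big-endian numbering). */
static vec128 vsldoi_model(vec128 a, vec128 b, unsigned sh)
{
  uint8_t cat[32];
  vec128 r;
  assert(sh < 16);
  memcpy(cat, a.b, 16);
  memcpy(cat + 16, b.b, 16);
  memcpy(r.b, cat + sh, 16);
  return r;
}

/* The 32-bit big-endian word held in the first four bytes of v. */
static uint32_t first_word(vec128 v)
{
  return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16)
       | ((uint32_t)v.b[2] << 8) | (uint32_t)v.b[3];
}

/* One 16-byte load feeding four vectors, mirroring LOAD(0, ...) followed
   by three vsldoi instructions with shifts 4, 8 and 12. */
static void expand4(const uint8_t *input, vec128 iv[4])
{
  memcpy(iv[0].b, input, 16);   /* the single 16-byte load */
  for (unsigned i = 1; i < 4; i++)
    iv[i] = vsldoi_model(iv[0], iv[0], 4 * i);
}
```

In this model only bytes 0..15 of the input are ever touched for the first four words, so four such groups per block would read exactly 64 bytes and nothing beyond.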
Do you agree? We could then eliminate some of the TC registers as well.
Regards,
/Niels