Hello Niels,
On Tue, Mar 03, 2020 at 06:57:25PM +0100, Niels Möller wrote:
The correctness in all cases is not that obvious to me now, but the idea is that we write aligned words, and read aligned words. But since input and output may have different alignment, src words are shifted around so that matching bytes are xored together. The ascii art a bit higher up in the file tries to illustrate that.
At this point in the code, r4 holds an aligned word read from the src area (possibly reading to a word edge beyond the end of the input bytes). The first bytes in this word (on LE, those are the least significant "low end" bytes of r4) have already been xored with the previous destination word and stored back. The "left-over" bytes referred to in the comment are the bytes in r4 that have not yet been processed, and those are the last bytes, and which in LE are the most significant bytes of r4, located at the "high end" of the register.
Thanks for the explanation. It confirms my understanding so far.
This is where after a lot of scratching of my thinking cap I got to the conclusion that in LE we're actually working with the least significant bytes of r4 at the low end of the register. My guess is that it's just a matter of interpretation what end of the register is "high" and most significant. I just want to make sure I have a correct understanding of what the code is doing while messing with it.
Rereading the ARM ARM[1] it says that register contents are little-endian, i.e. the lowest numbered bits being least significant. ([1] D6.3.2) Also, higher bits are left and lower ones are right. ([1] D6.5.3) Byte order is converted only on memory access ([1] A3.3). strb stores the lowest (rightmost) byte of r4 (bits 7..0) to memory ([1] A7.7.160).
This matches what the code is doing: on LE it saves the lowest byte to DST, increments the address by one (moving upward in memory), then shifts r4 right by 8 bits and saves the next byte. So it saves the least significant byte first, which on LE matches how ldr read the word from memory into the register. As validation, I'm inferring that it must be the least significant (rightmost) byte, because lsr discards what it shifts out of the register to the right, which would result in data loss if strb were to save the highest (leftmost) byte instead.
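To double-check my reading of the LE path, here is a rough C model of it. This is only a sketch of how I understand the strb/lsr sequence, not the actual assembly: the helper name store_leftover_le is made up, it leaves out the xor with the destination bytes, and it only shows the order in which bytes leave the register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the LE leftover loop: store the n lowest
   bytes of word to dst, least significant byte first. */
static void store_leftover_le(uint8_t *dst, uint32_t word, size_t n)
{
    assert(n <= 4);
    while (n-- > 0) {
        *dst++ = (uint8_t)(word & 0xff); /* like strb: bits 7..0 go to memory */
        word >>= 8;                      /* like lsr #8: stored byte is discarded */
    }
}
```

With word = 0x44332211 and n = 3, this writes 0x11, 0x22, 0x33 to ascending addresses, which is the same order in which an LE ldr would have read them from memory.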
On BE we now do essentially a rotate-left by 8 bits (by doing a rotate-right by 24 bits) to get the highest byte (bits 31..24) down to bits 7..0 of the register, while preserving the rest by rotating it into the upper part (no discard as with lsr). Storing the most significant byte first again matches how ldr loaded it from memory.
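The BE variant, as I understand it, can be modelled in C roughly like this. Again just a sketch under my own assumptions: ror32 and store_leftover_be are made-up names, the xor with the destination is left out, and only the rotate-then-strb byte order is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate right by r bits; only valid for r in 1..31. */
static uint32_t ror32(uint32_t x, unsigned r)
{
    return (x >> r) | (x << (32 - r));
}

/* Hypothetical model of the BE leftover loop: store the n highest
   bytes of word to dst, most significant byte first. */
static void store_leftover_be(uint8_t *dst, uint32_t word, size_t n)
{
    assert(n <= 4);
    while (n-- > 0) {
        word = ror32(word, 24);          /* bits 31..24 move down to bits 7..0 */
        *dst++ = (uint8_t)(word & 0xff); /* like strb: bits 7..0 go to memory */
    }
}
```

With word = 0x11223344 and n = 3, this writes 0x11, 0x22, 0x33 to ascending addresses. Nothing is shifted out of the register; the stored byte simply ends up in the upper bits after the next rotate.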
memxor3 seems to do everything the other way around because it works downward in memory.
Sorry if all this seems pedestrian but it isn't my daily fare. And sorry for quoting regulation by paragraph but I really want to make sure that I'm not misunderstanding all this miserably. :)
The C code does something similar, except that I think it avoids reading anything beyond end of input, since that is undefined behavior in C.
Ah, but the C code of memxor works downward in memory, doesn't it?
Long story short: Let me know what to do with those comments based on how much my thinking is off.
[1] ARMv7-M Architecture Reference Manual, https://static.docs.arm.com/ddi0403/e/DDI0403E_B_armv7m_arm.pdf (the only one I could find publicly available)