Michael Weiser <michael.weiser@gmx.de> writes:
This is where after a lot of scratching of my thinking cap I got to the conclusion that in LE we're actually working with the least significant bytes of r4 at the low end of the register.
My understanding of LE here is that the least significant CNT bits (first in memory) were processed earlier, and the remaining TNC bits we need to process are at the high end. So we first shift right by CNT bits to discard the bits already processed, and then do eight bits at a time, shifting right in the loop.
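To make that concrete, here is a rough C model of the little-endian case. It is only a sketch, not the assembly: the function name and its arguments are made up for illustration, and I'm assuming CNT + TNC = 32 and that CNT is a non-zero multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not the actual assembly. `word` is a 32-bit value
   loaded from little-endian memory whose low cnt bits (the bytes that
   come first in memory) were already processed. Stores at most n of the
   remaining bytes to dst and returns how many were stored.
   Assumption: cnt + tnc == 32. */
static size_t
store_leftover_le(uint8_t *dst, uint32_t word, unsigned cnt, size_t n)
{
  unsigned tnc;
  size_t stored = 0;

  assert(cnt % 8 == 0 && cnt > 0 && cnt < 32);
  tnc = 32 - cnt;

  /* Discard the bits already processed; the leftover bytes now sit at
     the low end, in memory order. */
  word >>= cnt;

  /* Emit eight bits at a time, shifting right in the loop. */
  while (tnc > 0 && stored < n)
    {
      dst[stored++] = (uint8_t) word;
      word >>= 8;
      tnc -= 8;
    }
  return stored;
}
```

For example, with word = 0x44332211 (bytes 11 22 33 44 in little-endian memory) and cnt = 8, the bytes stored are 22, 33, 44.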
My guess is that it's just a matter of interpretation what end of the register is "high" and most significant.
The way I think about it, a 32-bit register holds a binary integer. Each bit has a weight, from 1 to 2^31 (let's stick to unsigned interpretation). And then I try to stay with the widely used convention that bits are numbered 0-31, with bit k having weight 2^k, and with "right", "lower", "less significant", being synonymous, all meaning the bits with smaller weight, and "left", "higher", "more significant" meaning bits with higher weight.
Note this terminology is endian-independent. It's only when storing the integer in memory on a byte-addressed machine that it becomes relevant to ask which 8 bits get stored at which address, with "little-endian" meaning that the lower/right/less significant bits in the register get stored at lower addresses in memory.
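A small C illustration of the distinction, as a sketch with made-up helper names: selecting a byte by shifting depends only on bit weights and gives the same answer on any machine, while looking at the byte stored at the lowest address is where endianness shows up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register view: byte k consists of the bits with weights 2^(8k) up to
   2^(8k+7). The result is the same on every host. */
static uint8_t
byte_by_weight(uint32_t x, unsigned k)
{
  assert(k < 4);
  return (uint8_t) (x >> (8 * k));
}

/* Memory view: the byte stored at the lowest address of x. This is the
   least significant byte on a little-endian host and the most
   significant byte on a big-endian host. */
static uint8_t
first_byte_in_memory(uint32_t x)
{
  uint8_t b;
  memcpy(&b, &x, 1);
  return b;
}
```

So byte_by_weight(0x44332211, 0) is 0x11 everywhere, whereas first_byte_in_memory(0x44332211) is 0x11 on a little-endian host and 0x44 on a big-endian one.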
Maybe it's illuminating to compare with bit order. On a byte-addressed machine, we can't really talk about in which "order" individual bits of a byte are stored in memory. But we'd have to care about bit order, e.g., if transmitting a byte serially over a wire.
Long story short: Let me know what to do with those comments based on how much my thinking is off.
You could maybe just say "We have TNC/8 left-over bytes in r4 (high end if little endian, low end if big endian)". Or feel free to rephrase.
Also, I see the final
b .Lmemxor_bytes
is slightly suboptimal, in that it will reread individual bytes from the word at DST. It might be better to check if N > TNC/8, and if so read and xor one more source word.
ldr r4, [SRC]
eor r3, r4, S1ADJ TNC
And we can then have the byte-storing loop run until N == 0, without updating or checking TNC. But that's a separate improvement.
Regards,
/Niels