Hello Niels,
This is where after a lot of scratching of my thinking cap I got to the conclusion that in LE we're actually working with the least significant bytes of r4 at the low end of the register.
My understanding of LE here is that the least significant CNT bits (first in memory) were processed earlier, and the remaining TNC bits we need to process are at the high end. So we first shift right CNT bits to discard the bits already processed, and then do eight bits at a time, shifting right in the loop.
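To make the little-endian case concrete, here is a minimal C sketch of that sequence. It is not the nettle code: xor_leftover_le is a hypothetical helper, and I'm assuming CNT is a multiple of 8 below 32 and that TNC = 32 - CNT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, little-endian case only.
   word: 32-bit value as loaded from memory on an LE machine
   cnt:  number of low bits already processed (assumed multiple of 8, < 32)
   XORs the remaining (32 - cnt) / 8 bytes into dst, lowest address first. */
static void xor_leftover_le(uint8_t *dst, uint32_t word, unsigned cnt)
{
    assert(cnt % 8 == 0 && cnt < 32);
    unsigned tnc = 32 - cnt;   /* bits still to process, at the high end */

    word >>= cnt;              /* discard the bits already processed */
    for (size_t i = 0; i < tnc / 8; i++) {
        dst[i] ^= (uint8_t)(word & 0xff);  /* low byte is next in memory */
        word >>= 8;                        /* eight bits at a time */
    }
}
```

With word = 0x44332211 and cnt = 8, the byte 0x11 (first in memory) is dropped and 0x22, 0x33, 0x44 are XORed into dst[0], dst[1], dst[2] in that order.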
Okay, so the comment is referring to the situation in the register literally just before the next instruction. I was clearly looking too far afield for clues on what's going on.
My guess is that it's just a matter of interpretation which end of the register is "high" and most significant.
The way I think about it, a 32-bit register holds a binary integer. Each bit has a weight, from 1 to 2^31 (let's stick to unsigned interpretation). And then I try to stay with the widely used convention that bits are numbered 0-31, with bit k having weight 2^k, and with "right", "lower", "less significant", being synonymous, all meaning the bits with smaller weight, and "left", "higher", "more significant" meaning bits with higher weight.
Note this terminology is endian independent. It's only when storing the integer in memory on a byte-addressed machine that it becomes relevant to ask which 8 bits get stored at which address, with "little-endian" meaning that the lower/right/less significant bits in the register get stored at lower addresses in memory.
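A small C sketch of the distinction, in case it helps. The helper names (bit_weight, store_le32, store_be32) are made up for illustration. The stores are written out byte by byte, so the result does not depend on the endianness of the machine running it.

```c
#include <assert.h>
#include <stdint.h>

/* Weight of bit k in a 32-bit unsigned value: 2^k.
   This is a property of the integer, independent of endianness. */
static uint32_t bit_weight(unsigned k)
{
    assert(k < 32);
    return (uint32_t)1 << k;
}

/* Little-endian store: the low (less significant) 8 bits go to the
   lowest address. */
static void store_le32(uint8_t *p, uint32_t x)
{
    p[0] = (uint8_t)(x & 0xff);
    p[1] = (uint8_t)((x >> 8) & 0xff);
    p[2] = (uint8_t)((x >> 16) & 0xff);
    p[3] = (uint8_t)((x >> 24) & 0xff);
}

/* Big-endian store: the high (more significant) 8 bits go to the
   lowest address. */
static void store_be32(uint8_t *p, uint32_t x)
{
    p[0] = (uint8_t)((x >> 24) & 0xff);
    p[1] = (uint8_t)((x >> 16) & 0xff);
    p[2] = (uint8_t)((x >> 8) & 0xff);
    p[3] = (uint8_t)(x & 0xff);
}
```

For x = 0x11223344, the "low end" of the register is 0x44 either way. Only the address it lands at differs: offset 0 with store_le32, offset 3 with store_be32.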
Maybe it's illuminating to compare with bit order. On a byte-addressed machine, we can't really talk about in which "order" individual bits of a byte are stored in memory. But we'd have to care about bit order, e.g., if transmitting a byte serially over a wire.
Thanks for indulging me. :) I had a similar understanding, and a treatise to that effect all typed up, but then found ARM's view of the world in their ARM (Architecture Reference Manual) and felt stupid.
Long story short: Let me know what to do with those comments based on how much my thinking is off.
You could maybe just say "We have TNC/8 left-over bytes in r4 (high end if little endian, low end if big endian)". Or feel free to rephrase.
I've gone with:
diff --git a/arm/memxor.asm b/arm/memxor.asm
index 239a4034..e4619629 100644
--- a/arm/memxor.asm
+++ b/arm/memxor.asm
@@ -138,24 +138,25 @@ PROLOGUE(nettle_memxor)
 	adds	N, #8
 	beq	.Lmemxor_odd_done
-	C We have TNC/8 left-over bytes in r4, high end
+	C We have TNC/8 left-over bytes in r4, high end on LE and low end on
+	C BE, excess bits to be discarded by alignment adjustment at the other
 	S0ADJ	r4, CNT
+	C now byte-aligned at low end on LE and high end on BE
 	ldr	r3, [DST]
 	eor	r3, r4
Patch in followup mail for your consideration.