Nice to see you get a substantial speed-up too.
/ Jonas Walldén
Previous text:
2004-02-05 14:21: Subject: Nettle
On my laptop (Intel P4), I get an increase from 45 MB/s to 66 MB/s.
Does it matter whether si and sj are ints or uint8_t? I see no speed difference.
The inner loop gets compiled into the following (Intel, gcc-3.3, -O2):
.L28:
        incb    -13(%ebp)
        decl    %ebx
        movzbl  -13(%ebp), %edx
        movzbl  (%edx,%edi), %ecx
        addb    %cl, -14(%ebp)
        movzbl  -14(%ebp), %eax
        movzbl  (%eax,%edi), %eax
        movb    %al, (%edx,%edi)
        addb    %cl, %al
        movl    16(%ebp), %edx
        movzbl  %al, %eax
        movzbl  (%eax,%edi), %eax
        xorb    (%esi), %al
        incl    %esi
        movb    %al, (%edx)
        incl    %edx
        cmpl    $-1, %ebx
        movl    %edx, 16(%ebp)
        jne     .L28
It seems the compiler can't fit all variables into registers, hence the save and restore operations via %ebp.
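For readers following along, here is a minimal C sketch of the kind of inner loop being discussed. It is not Nettle's actual source: it assumes the standard RC4 (arcfour) update, which the assembly above appears to resemble, and the names rc4_sketch_ctx, rc4_sketch_set_key and rc4_sketch_crypt are hypothetical. It keeps si and sj as uint8_t so that the index si + sj wraps modulo 256 through a cast; with ints the same index would need an explicit & 0xff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context for illustration only; not Nettle's real struct. */
struct rc4_sketch_ctx {
    uint8_t S[256];
    uint8_t i;
    uint8_t j;
};

/* Standard RC4 key schedule (assumed; key_len must be non-zero). */
static void rc4_sketch_set_key(struct rc4_sketch_ctx *ctx,
                               const uint8_t *key, size_t key_len)
{
    unsigned k;
    uint8_t j = 0;

    for (k = 0; k < 256; k++)
        ctx->S[k] = (uint8_t)k;

    for (k = 0; k < 256; k++) {
        uint8_t t = ctx->S[k];
        j = (uint8_t)(j + t + key[k % key_len]);
        ctx->S[k] = ctx->S[j];
        ctx->S[j] = t;
    }
    ctx->i = 0;
    ctx->j = 0;
}

/* Inner loop: i, j, si, sj, src, dst, length and the table pointer are
 * all live at once, which is a lot for the few x86 registers available. */
static void rc4_sketch_crypt(struct rc4_sketch_ctx *ctx, size_t length,
                             uint8_t *dst, const uint8_t *src)
{
    uint8_t i = ctx->i;
    uint8_t j = ctx->j;

    while (length--) {
        uint8_t si, sj;

        i++;                          /* wraps at 256 because i is uint8_t */
        si = ctx->S[i];
        j = (uint8_t)(j + si);
        sj = ctx->S[j];
        ctx->S[i] = sj;               /* swap S[i] and S[j] */
        ctx->S[j] = si;
        *dst++ = *src++ ^ ctx->S[(uint8_t)(si + sj)];
    }
    ctx->i = i;
    ctx->j = j;
}
```

Since the cipher is symmetric, calling rc4_sketch_crypt again on the output with a freshly keyed context gives back the input, which makes a quick sanity check when experimenting with int versus uint8_t variants.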
I wonder if my Intel books will ever arrive.
/ Niels Möller (sharpening the red pen)