I've now pushed some configure tweaks to support ARM (more specifically, configure now recognizes armv7l*).
As a warm-up, I implemented memxor in ARM assembly. It seems to work, and it's a modest performance improvement over the C code: in the aligned case, both memxor and memxor3 run at 0.75 cycles per byte when I benchmark on a Cortex-A9. It could surely be improved further.
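For readers unfamiliar with the two functions, here is a minimal C sketch of what they compute, assuming the usual semantics: memxor does dst ^= src over n bytes, and memxor3 does dst = a ^ b, both returning dst. The names ref_memxor and ref_memxor3 are hypothetical, and this is not Nettle's C code or the new assembly. It only illustrates the "aligned case": when dst and src share the same word alignment, the bulk of the work can be done one machine word at a time instead of one byte at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference memxor: dst[i] ^= src[i] for 0 <= i < n. */
void *ref_memxor(void *dst_in, const void *src_in, size_t n)
{
  unsigned char *dst = dst_in;
  const unsigned char *src = src_in;
  size_t i = 0;
  const size_t w = sizeof(unsigned long);

  /* Aligned case: dst and src have the same offset within a word. */
  if ((uintptr_t) dst % w == (uintptr_t) src % w)
    {
      /* XOR single bytes until dst (and hence src) is word aligned. */
      for (; i < n && (uintptr_t) (dst + i) % w != 0; i++)
        dst[i] ^= src[i];

      /* Bulk loop: one word per iteration. memcpy avoids strict-aliasing
         problems; compilers typically lower it to plain word loads/stores. */
      for (; i + w <= n; i += w)
        {
          unsigned long d, s;
          memcpy(&d, dst + i, w);
          memcpy(&s, src + i, w);
          d ^= s;
          memcpy(dst + i, &d, w);
        }
    }

  /* Tail bytes, or the whole buffer in the unaligned case. */
  for (; i < n; i++)
    dst[i] ^= src[i];

  return dst_in;
}

/* Hypothetical reference memxor3: dst[i] = a[i] ^ b[i].
   This simple version assumes dst does not overlap a or b. */
void *ref_memxor3(void *dst_in, const void *a_in, const void *b_in, size_t n)
{
  unsigned char *dst = dst_in;
  const unsigned char *a = a_in;
  const unsigned char *b = b_in;

  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] ^ b[i];

  return dst_in;
}
```

An optimized implementation, in C or assembly, would also want a word-wise path for memxor3 and some shifting trick for the unaligned case; the sketch leaves those out to keep the semantics obvious.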
Regards, /Niels