nisse@lysator.liu.se (Niels Möller) writes:
It interleaves the processing of two blocks, which gives a speedup of 50% -- 100% on the ARM cores where I've tested it. Before merging, I need to fix fat builds to use the new code on processors that support it.
I've added the fat build support, which needed a bit of reorganization, and merged it to master. This will break support for big-endian ARM for now, since I'm not able to test that.
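
For anyone not familiar with the two ideas above, here is a rough sketch in plain C. It is not the actual assembly or the real fat-build code: the toy round function, the toy_* names and the have_fast_simd flag are all made up for illustration. It only shows the shape: a two-way variant that advances two independent blocks through the rounds together, and a function pointer that is set once at startup to the variant the processor supports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ROUNDS 8

/* Toy mixing round; a stand-in for a real cipher round (hypothetical). */
static uint32_t
toy_round(uint32_t x, uint32_t k)
{
  x += k;
  x ^= (x << 7) | (x >> 25);
  return x * 0x9e3779b1u;
}

/* Baseline: process one block at a time. */
static void
toy_crypt_1way(uint32_t *dst, const uint32_t *src, size_t nblocks, uint32_t key)
{
  assert(dst != NULL && src != NULL);
  for (size_t i = 0; i < nblocks; i++)
    {
      uint32_t x = src[i];
      for (unsigned r = 0; r < TOY_ROUNDS; r++)
        x = toy_round(x, key + r);
      dst[i] = x;
    }
}

/* Two-way: blocks a and b are independent, so their rounds can be
   interleaved and one dependency chain can overlap with the other. */
static void
toy_crypt_2way(uint32_t *dst, const uint32_t *src, size_t nblocks, uint32_t key)
{
  size_t i = 0;
  for (; i + 2 <= nblocks; i += 2)
    {
      uint32_t a = src[i], b = src[i + 1];
      for (unsigned r = 0; r < TOY_ROUNDS; r++)
        {
          a = toy_round(a, key + r);
          b = toy_round(b, key + r);
        }
      dst[i] = a;
      dst[i + 1] = b;
    }
  /* An odd trailing block falls back to the one-way code. */
  if (i < nblocks)
    toy_crypt_1way(dst + i, src + i, nblocks - i, key);
}

typedef void toy_crypt_func(uint32_t *, const uint32_t *, size_t, uint32_t);

/* Fat-build style dispatch: default to the baseline, switch at init. */
static toy_crypt_func *toy_crypt_impl = toy_crypt_1way;

/* have_fast_simd stands in for a real run-time CPU feature check. */
static void
toy_fat_init(int have_fast_simd)
{
  toy_crypt_impl = have_fast_simd ? toy_crypt_2way : toy_crypt_1way;
}

static void
toy_crypt(uint32_t *dst, const uint32_t *src, size_t nblocks, uint32_t key)
{
  toy_crypt_impl(dst, src, nblocks, key);
}
```

Both variants must give identical output for the same input; only the instruction scheduling differs. In compiled C the gain from writing it this way may be small, since the compiler is free to reorder anyway. The point of doing it by hand in assembly is to control that interleaving directly.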
Regards, /Niels