On Tue, Sep 13, 2011 at 10:14 AM, Nikos Mavrogiannopoulos <n.mavrogiannopoulos@gmail.com> wrote:
Corrected figures for nettle-benchmark. The problem with my previous figures seems to have been that running ./configure again doesn't really undo the previous settings. SSE2 is faster than both previous implementations (asm and C), and ASM performs better than C in the unaligned case. I cannot figure out why my own benchmark shows otherwise (our unaligned tests seem to be pretty much identical). I have included the call overhead that you subtract, but it seems to be the same in all three builds.
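On why C trails ASM only in the unaligned case: in a portable word-at-a-time memxor, the unaligned path costs more per word than the aligned one. Below is a minimal sketch of that approach, assuming a design that first XORs bytes until dst is word-aligned and then, if src is still misaligned, reads src only in aligned words and builds each output word from two neighbouring source words with shifts. The names sketch_memxor, load_word and store_word are hypothetical; this is not Nettle's memxor.c, whose structure may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t word_t;            /* one machine word */
#define WSIZE (sizeof(word_t))

/* Runtime endianness check; decides the shift direction below. */
static int is_little_endian(void)
{
  const word_t one = 1;
  unsigned char first;
  memcpy(&first, &one, 1);
  return first == 1;
}

/* Word access through memcpy: avoids strict-aliasing problems and
   compiles to a plain move at these (aligned) addresses. */
static word_t load_word(const unsigned char *p) { word_t w; memcpy(&w, p, WSIZE); return w; }
static void store_word(unsigned char *p, word_t w) { memcpy(p, &w, WSIZE); }

/* Hypothetical sketch: dst[i] ^= src[i] for 0 <= i < n; buffers must not overlap. */
void sketch_memxor(void *dst_v, const void *src_v, size_t n)
{
  unsigned char *dst = dst_v;
  const unsigned char *src = src_v;

  /* 1. Byte loop until dst is word-aligned. */
  while (n > 0 && (uintptr_t)dst % WSIZE != 0) {
    *dst++ ^= *src++;
    n--;
  }

  size_t k = (uintptr_t)src % WSIZE;   /* src misalignment once dst is aligned */
  if (k == 0) {
    /* 2a. Same alignment: one load, one XOR, one store per word. */
    for (; n >= WSIZE; n -= WSIZE, src += WSIZE, dst += WSIZE)
      store_word(dst, load_word(dst) ^ load_word(src));
  } else if (n + k >= 2 * WSIZE) {
    /* 2b. Different alignment: read src only in aligned words and merge
       each pair of neighbours with two shifts and an OR. */
    const unsigned lo = 8 * (unsigned)k, hi = 8 * (unsigned)(WSIZE - k);
    const int le = is_little_endian();
    const unsigned char *next = src + (WSIZE - k);   /* word-aligned */
    word_t prev = 0;
    /* Memory image of the aligned word holding src[0]; the k bytes that
       precede src are left zero instead of being read. */
    memcpy((unsigned char *)&prev + k, src, WSIZE - k);
    while (n + k >= 2 * WSIZE) {   /* the word at next lies inside src */
      word_t cur = load_word(next);
      word_t s = le ? (prev >> lo) | (cur << hi)
                    : (prev << lo) | (cur >> hi);
      store_word(dst, load_word(dst) ^ s);
      prev = cur;
      next += WSIZE; src += WSIZE; dst += WSIZE; n -= WSIZE;
    }
  }

  /* 3. Tail (and short misaligned inputs) byte by byte. */
  while (n > 0) {
    *dst++ ^= *src++;
    n--;
  }
}
```

The aligned path does one load, XOR and store per word; the misaligned path adds two shifts and an OR per word, which may account for part of the C build's drop from 0.24 to 0.36 cycles/byte in the figures below.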
* ASM implementation: benchmark call overhead: 0.001862 us 5.46 cycles
    Algorithm  mode        Mbyte/s  cycles/byte  cycles/block
    memxor     aligned    11980.56         0.23          1.87
    memxor     unaligned  11269.30         0.25          1.98
* C implementation: benchmark call overhead: 0.001875 us 5.49 cycles
    Algorithm  mode        Mbyte/s  cycles/byte  cycles/block
    memxor     aligned    11777.25         0.24          1.90
    memxor     unaligned   7794.15         0.36          2.87
* SSE2 implementation: benchmark call overhead: 0.001868 us 5.47 cycles
    Algorithm  mode        Mbyte/s  cycles/byte  cycles/block
    memxor     aligned    15961.09         0.18          1.40
    memxor     unaligned  15882.32         0.18          1.41
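The three result sets can be cross-checked against each other. A minimal sketch, assuming that nettle-benchmark's "Mbyte" means 2^20 bytes and that memxor is timed in 8-byte blocks (both inferred here: the cycles/block column is about 8 times the cycles/byte column). Each call-overhead line implies a clock rate of roughly 2.93 GHz (e.g. 5.46 cycles / 0.001862 us), and dividing that rate by the throughput reproduces the printed cycles/byte to within rounding. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* One memxor line from the figures above, plus the call overhead
   reported for the same build. */
struct bench_row {
  const char *impl, *mode;
  double overhead_us, overhead_cycles;   /* "benchmark call overhead" line */
  double mbyte_s, cycles_byte, cycles_block;
};

static const struct bench_row rows[] = {
  { "ASM",  "aligned",   0.001862, 5.46, 11980.56, 0.23, 1.87 },
  { "ASM",  "unaligned", 0.001862, 5.46, 11269.30, 0.25, 1.98 },
  { "C",    "aligned",   0.001875, 5.49, 11777.25, 0.24, 1.90 },
  { "C",    "unaligned", 0.001875, 5.49,  7794.15, 0.36, 2.87 },
  { "SSE2", "aligned",   0.001868, 5.47, 15961.09, 0.18, 1.40 },
  { "SSE2", "unaligned", 0.001868, 5.47, 15882.32, 0.18, 1.41 },
};
static const size_t n_rows = sizeof rows / sizeof rows[0];

/* Clock rate implied by the overhead line (cycles per second); only
   approximate, since both inputs are printed rounded. */
double implied_hz(const struct bench_row *r)
{
  return r->overhead_cycles / (r->overhead_us * 1e-6);
}

/* Assumption: "Mbyte" means 2^20 bytes. */
double derived_cycles_per_byte(const struct bench_row *r)
{
  return implied_hz(r) / (r->mbyte_s * 1048576.0);
}

/* Assumption: memxor is timed in 8-byte blocks. */
#define MEMXOR_BLOCK 8.0

static double absdiff(double a, double b) { return a > b ? a - b : b - a; }

/* 1 if the derived figures agree with the printed ones to within rounding. */
int row_consistent(const struct bench_row *r)
{
  double cpb = derived_cycles_per_byte(r);
  return absdiff(cpb, r->cycles_byte) < 0.01
      && absdiff(cpb * MEMXOR_BLOCK, r->cycles_block) < 0.02;
}
```

Since all three builds imply essentially the same clock rate, the Mbyte/s and cycles/byte columns carry the same information, so comparing either one between builds is enough.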
regards, Nikos