memxor

Nikos Mavrogiannopoulos n.mavrogiannopoulos at gmail.com
Mon Sep 12 13:22:45 CEST 2011


Hello,
I've run some tests with memxor on an x86-64 machine. My results are:
* C implementation (compiled with gcc 4.4):
        Xoring in chunks of 32768 bytes: done. 50.09 Gb in 5.00 secs: 10.02 Gb/sec
        Xoring (unaligned) in chunks of 32768 bytes: done. 39.90 Gb in 5.00 secs: 7.98 Gb/sec

* ASM implementation:
        Xoring in chunks of 32768 bytes: done. 38.32 Gb in 5.00 secs: 7.66 Gb/sec
        Xoring (unaligned) in chunks of 32768 bytes: done. 30.16 Gb in 5.00 secs: 6.03 Gb/sec

It seems that on x86-64 the ASM version is slower than the C one
(7.66 vs. 10.02 Gb/sec aligned, 6.03 vs. 7.98 Gb/sec unaligned).
Moreover, I noticed that the loop unrolling techniques used in the C
code have no visible performance benefit.
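
To make the unrolling remark concrete, here is a minimal sketch of a
word-at-a-time XOR loop with and without four-way unrolling. It is an
illustration only, not Nettle's memxor code: the function names are
hypothetical, and it assumes 64-bit words copied through memcpy so that
any alignment is legal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; not taken from the Nettle sources. */

/* XOR n bytes of src into dst, one 64-bit word per iteration.
   memcpy keeps the loads and stores valid for any alignment. */
static void xor_words_simple(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t a, b;
        memcpy(&a, dst + i, 8);
        memcpy(&b, src + i, 8);
        a ^= b;
        memcpy(dst + i, &a, 8);
    }
    /* Remaining bytes that do not fill a whole word. */
    for (; i < n; i++)
        dst[i] ^= src[i];
}

/* Same operation with the word loop unrolled four times. */
static void xor_words_unrolled(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        uint64_t a[4], b[4];
        memcpy(a, dst + i, 32);
        memcpy(b, src + i, 32);
        a[0] ^= b[0];
        a[1] ^= b[1];
        a[2] ^= b[2];
        a[3] ^= b[3];
        memcpy(dst + i, a, 32);
    }
    /* Remaining bytes that do not fill a 32-byte block. */
    for (; i < n; i++)
        dst[i] ^= src[i];
}
```

Both functions compute the same result. Whether the unrolled form is
faster depends on the compiler and the CPU, so it has to be measured
rather than assumed.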

However, an SSE2 version of memxor (attached) increases performance by
30% or more on the same CPU (13.98 vs. 10.02 Gb/sec for the C version
on aligned input).

* SSE2:
        Xoring in chunks of 32768 bytes: done. 69.94 Gb in 5.00 secs: 13.98 Gb/sec
        Xoring (unaligned) in chunks of 32768 bytes: done. 65.96 Gb in 5.00 secs: 13.19 Gb/sec
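
Since the attachment is not preserved in the archive, here is a minimal
sketch of what an SSE2-based XOR loop can look like. It is not the
attached memxor2.c: the function name is hypothetical, and the sketch
assumes unaligned 128-bit loads and stores throughout, with a byte-wise
fallback when the compiler does not define __SSE2__.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical name; a sketch, not the scrubbed memxor2.c attachment. */
static void memxor_sse2_sketch(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per iteration; loadu/storeu accept unaligned pointers. */
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(a, b));
    }
#endif
    /* Tail bytes (or the whole buffer when SSE2 is unavailable). */
    for (; i < n; i++)
        dst[i] ^= src[i];
}
```

A tuned version might use aligned loads (_mm_load_si128) once dst has
been brought to a 16-byte boundary. The sketch does not do that, and
whether it matters on a given CPU would have to be measured.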

regards,
Nikos
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memxor2.c
Type: text/x-csrc
Size: 3141 bytes
Desc: not available
URL: <http://lists.lysator.liu.se/pipermail/nettle-bugs/attachments/20110912/685b598b/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: speed.c
Type: text/x-csrc
Size: 4428 bytes
Desc: not available
URL: <http://lists.lysator.liu.se/pipermail/nettle-bugs/attachments/20110912/685b598b/attachment-0001.c>

