memxor

12 Sep 2011


Hello,
 I've run some tests with memxor on an x86-64 machine. My results are:
* C implementation (compiled with gcc 4.4):
    Xoring in chunks of 32768 bytes: done. 50.09 Gb in 5.00 secs: 10.02 Gb/sec
    Xoring (unaligned) in chunks of 32768 bytes: done. 39.90 Gb in 5.00 secs: 7.98 Gb/sec
* ASM implementation:
    Xoring in chunks of 32768 bytes: done. 38.32 Gb in 5.00 secs: 7.66 Gb/sec
    Xoring (unaligned) in chunks of 32768 bytes: done. 30.16 Gb in 5.00 secs: 6.03 Gb/sec
It seems that on x86-64 the ASM version is slower than the C one.
Moreover, I noticed that the loop unrolling techniques used in the C
code have no visible performance benefit.
However, an SSE2 version of memxor (attached) increases performance by
30% or more over the C version on the same CPU.
* SSE2:
    Xoring in chunks of 32768 bytes: done. 69.94 Gb in 5.00 secs: 13.98 Gb/sec
    Xoring (unaligned) in chunks of 32768 bytes: done. 65.96 Gb in 5.00 secs: 13.19 Gb/sec
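
For anyone reading without the attachment, the general idea of an SSE2
memxor can be sketched roughly as below. This is only an illustration
and not the attached code: the function name memxor_sse2_sketch is
made up, and the sketch assumes a compiler that provides <emmintrin.h>
(gcc does on x86-64). It uses unaligned loads and stores throughout,
which is the simplest choice; a tuned version would probably handle
alignment separately.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical name, for illustration only.
 * XORs n bytes from src into dst and returns dst. */
void *memxor_sse2_sketch(void *dst_in, const void *src_in, size_t n)
{
    unsigned char *dst = dst_in;
    const unsigned char *src = src_in;

    /* Main loop: 16 bytes per iteration. Unaligned loads and stores
     * are used so that any pair of pointers is accepted. */
    while (n >= 16) {
        __m128i d = _mm_loadu_si128((const __m128i *)dst);
        __m128i s = _mm_loadu_si128((const __m128i *)src);
        _mm_storeu_si128((__m128i *)dst, _mm_xor_si128(d, s));
        dst += 16;
        src += 16;
        n -= 16;
    }

    /* Tail: the remaining 0..15 bytes, one byte at a time. */
    while (n > 0) {
        *dst++ ^= *src++;
        n--;
    }
    return dst_in;
}
```

A call such as memxor_sse2_sketch(buf, key, len) behaves like a plain
byte-wise XOR loop over len bytes; only the inner loop width differs.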
regards,
Nikos
