Nettle

6 Feb 2004

Now I'm confused. I do have the 45 and 66 MB/s figures in my *shell*
buffer, but I can't reproduce the 66 MB/s figure. Perhaps that was
with the buggy version of the code? Anyway, x86 performance for the C
version doesn't matter that much anymore.
/ Niels Möller (vässar rödpennan)
Previous text:
...
2004-02-05 14:21:
Subject: Nettle

On my laptop (Intel P4), I get an increase from 45 MB/s to 66 MB/s.
Does it matter whether si and sj are ints or uint8_t? I see no speed
difference.
The inner loop compiles to the following (Intel, gcc-3.3, -O2):
.L28:
   incb	-13(%ebp)
   decl	%ebx
   movzbl	-13(%ebp), %edx
   movzbl	(%edx,%edi), %ecx
   addb	%cl, -14(%ebp)
   movzbl	-14(%ebp), %eax
   movzbl	(%eax,%edi), %eax
   movb	%al, (%edx,%edi)
   addb	%cl, %al
   movl	16(%ebp), %edx
   movzbl	%al, %eax
   movzbl	(%eax,%edi), %eax
   xorb	(%esi), %al
   incl	%esi
   movb	%al, (%edx)
   incl	%edx
   cmpl	$-1, %ebx
   movl	%edx, 16(%ebp)
   jne	.L28
It seems the compiler can't fit all variables into registers, hence the
loads and stores to the stack via %ebp.
I wonder if my Intel books will ever arrive.
/ Niels Möller (vässar rödpennan)
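
For readers following the listing, here is a minimal C sketch of a standard RC4 (arcfour) loop of the kind being discussed. It is not the Nettle source, which the post does not show: the struct rc4_ctx and the functions rc4_set_key and rc4_crypt are hypothetical names chosen for illustration. The temporaries si and sj are declared as unsigned here; per the message above, declaring them uint8_t made no measurable speed difference.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context layout, for illustration only (not Nettle's). */
struct rc4_ctx {
  uint8_t S[256];   /* permutation state */
  uint8_t i;        /* running index, wraps mod 256 */
  uint8_t j;        /* second index, wraps mod 256 */
};

/* Standard RC4 key schedule. Assumes length >= 1. */
void
rc4_set_key(struct rc4_ctx *ctx, size_t length, const uint8_t *key)
{
  unsigned i;
  uint8_t j = 0;

  for (i = 0; i < 256; i++)
    ctx->S[i] = (uint8_t) i;

  for (i = 0; i < 256; i++) {
    uint8_t t = ctx->S[i];
    j = (uint8_t) (j + t + key[i % length]);
    ctx->S[i] = ctx->S[j];   /* swap S[i] and S[j] */
    ctx->S[j] = t;
  }
  ctx->i = ctx->j = 0;
}

/* Encrypt or decrypt length bytes from src into dst. */
void
rc4_crypt(struct rc4_ctx *ctx, size_t length,
          uint8_t *dst, const uint8_t *src)
{
  /* Local copies of the indices, so the compiler may keep them in
     registers if it has enough of them. */
  uint8_t i = ctx->i;
  uint8_t j = ctx->j;

  while (length--) {
    unsigned si, sj;         /* could equally be uint8_t */

    i++;
    si = ctx->S[i];
    j = (uint8_t) (j + si);
    sj = ctx->S[j];
    ctx->S[i] = (uint8_t) sj;   /* swap S[i] and S[j] */
    ctx->S[j] = (uint8_t) si;
    *dst++ = *src++ ^ ctx->S[(si + sj) & 0xff];
  }
  ctx->i = i;
  ctx->j = j;
}
```

The loop needs i, j, si, sj, the state pointer, src, dst and the length live at the same time, which is more than the handful of general-purpose registers available on 32-bit x86; that is consistent with the stack traffic via %ebp seen in the listing.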
