Re: Fat library support

16 Jan 2015


On Fri, 2015-01-16 at 22:18 +0100, Niels Möller wrote:
...
> Nikos Mavrogiannopoulos <n.mavrogiannopoulos@gmail.com> writes:
...
>> A quick and dirty patch to enable SSE2 instructions for memxor() on
>> Intel CPUs is attached.
>> I tried to follow the logic in the fat.c file, but I may have missed
>> something. I've not added memxor3() because it is actually slower with
>> SSE2.
> Cool!
...
>> SSE2:
>>             memxor     aligned 26081.83
>>             memxor   unaligned 25893.69
>> No-SSE2:
>>             memxor     aligned 17806.94
>>             memxor   unaligned 16581.48
> How confident are you that the intel vs amd check is the right way
> to enable sse2? I guess we could add a check on the particular cpu model
> later, if needed. Which model(s) did you benchmark on?
The benchmarks (if this is the same as the older code I sent you a few years
ago) were done on an Intel i7, an i5 and a Xeon. All of them showed an
improvement. The benchmark above is from the i7.
As for it not improving on AMD, I have no more data than what I wrote
you last time (which was a few years ago). I have no idea whether newer AMD
processors behave better.
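
To make the "intel vs amd" check concrete, here is a minimal sketch of how such a test could look in C. It is not the code from the attached patch or from fat.c: the function names are hypothetical, and it assumes GCC or Clang with their <cpuid.h> helper. It reads the vendor string from CPUID leaf 0 and applies the policy discussed above, SSE2 memxor on Intel only. On x86_64 SSE2 is always present, so the check is about speed, not availability.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
# include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
# define HAVE_CPUID_H 1
#endif

/* Assemble the 12-character vendor string from the registers returned
   by CPUID leaf 0. The bytes are stored in the order EBX, EDX, ECX,
   lowest byte of each register first. */
static void
vendor_from_regs (uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
  const uint32_t regs[3] = { ebx, edx, ecx };
  int i, j;
  for (i = 0; i < 3; i++)
    for (j = 0; j < 4; j++)
      out[4 * i + j] = (char) ((regs[i] >> (8 * j)) & 0xff);
  out[12] = '\0';
}

/* Hypothetical policy from this thread: SSE2 memxor on Intel CPUs only. */
static int
vendor_wants_sse2_memxor (const char *vendor)
{
  assert (vendor != NULL);
  return strcmp (vendor, "GenuineIntel") == 0;
}

/* Query the running CPU. Returns 0 when CPUID cannot be used. */
static int
cpu_wants_sse2_memxor (void)
{
#ifdef HAVE_CPUID_H
  unsigned int eax, ebx, ecx, edx;
  char vendor[13];
  if (!__get_cpuid (0, &eax, &ebx, &ecx, &edx))
    return 0;
  vendor_from_regs (ebx, edx, ecx, vendor);
  return vendor_wants_sse2_memxor (vendor);
#else
  return 0;
#endif
}
```

A per-model check, if it turns out to be needed later, would go into vendor_wants_sse2_memxor() or next to it, leaving the CPUID plumbing unchanged.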
...
> It would be nice in a way if we could share code with x86_64/memxor.asm.
> E.g., by defining x86_64/fat/memxor-1.asm and x86_64/fat/memxor-2.asm
> which each include the same file with a different setting of USE_SSE2.
> But I haven't looked at that carefully, it might be better to have a
> unified x86_64/fat/memxor.asm with two entry points, like you do.
> I've also been considering m4 hacks to let a single fat .asm file
> include multiple other .asm files, or including the same file twice,
> without labels or m4 definitions colliding, but I'm not sure that's
> worth the effort. The foo-1.asm, foo-2.asm, ... scheme is a bit
> inelegant, but it is easy to understand.
I didn't like the duplication of code either. I'm not very skilled in
m4, but I thought that x86_64/ could include the fat variant and use the
non-SSE2 variant.
The code in fat.c is quite elaborate in the cases it handles. The more
functions are added, the more unmanageable the code will become. Would it
make sense to restrict that support to the systems where ifunc is
available? Then adding new optimized functions becomes very
simple.
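
To illustrate the difference, here is a rough sketch of both dispatch styles in C. It is only an illustration, not the code in fat.c: every sketch_* name is hypothetical, and the two implementations are plain C stand-ins for the assembly routines. It assumes GCC or Clang; the ifunc branch additionally assumes an ELF system with glibc, and the other branch shows the kind of function-pointer fallback that is needed where ifunc is missing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>   /* also pulls in the libc feature macros (__GLIBC__) */

typedef void *memxor_func (void *dst, const void *src, size_t n);

/* Stand-in for the plain routine: XOR one byte at a time. */
static void *
sketch_memxor_bytes (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t i;
  assert (n == 0 || (dst != NULL && src != NULL));
  for (i = 0; i < n; i++)
    d[i] ^= s[i];
  return dst;
}

/* Stand-in for the optimized routine: XOR 8 bytes at a time.
   memcpy keeps the unaligned accesses well defined. */
static void *
sketch_memxor_words (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  for (; n >= 8; n -= 8, d += 8, s += 8)
    {
      uint64_t a, b;
      memcpy (&a, d, 8);
      memcpy (&b, s, 8);
      a ^= b;
      memcpy (d, &a, 8);
    }
  sketch_memxor_bytes (d, s, n);   /* remaining 0..7 bytes */
  return dst;
}

/* Pick an implementation for the running CPU (Intel-only policy,
   as discussed in this thread). */
static memxor_func *
sketch_select_memxor (void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  __builtin_cpu_init ();   /* required before __builtin_cpu_is in a resolver */
  if (__builtin_cpu_is ("intel"))
    return sketch_memxor_words;
#endif
  return sketch_memxor_bytes;
}

#if defined(__GNUC__) && defined(__ELF__) && defined(__GLIBC__)
/* With ifunc, the dynamic linker calls the resolver once when it
   relocates the symbol; callers see an ordinary function. */
static memxor_func *
sketch_resolve_memxor (void)
{
  return sketch_select_memxor ();
}

void *sketch_memxor (void *dst, const void *src, size_t n)
  __attribute__ ((ifunc ("sketch_resolve_memxor")));
#else
/* Without ifunc: a function pointer that replaces itself on first use. */
static void *sketch_memxor_init (void *dst, const void *src, size_t n);
static memxor_func *sketch_memxor_ptr = sketch_memxor_init;

static void *
sketch_memxor_init (void *dst, const void *src, size_t n)
{
  sketch_memxor_ptr = sketch_select_memxor ();
  return sketch_memxor_ptr (dst, src, n);
}

void *
sketch_memxor (void *dst, const void *src, size_t n)
{
  return sketch_memxor_ptr (dst, src, n);
}
#endif
```

In the ifunc branch, adding another optimized function costs one resolver and one declaration. The fallback branch needs a pointer, an init stub and a wrapper per function, which is the part that grows.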
regards,
Nikos
