 
Nikos Mavrogiannopoulos <n.mavrogiannopoulos@gmail.com> writes:
A quick and dirty patch to enable SSE2 instructions for memxor() on Intel CPUs is attached. I tried to follow the logic in the fat.c file, but I may have missed something. I've not added memxor3() because it is actually slower with SSE2.
Cool!
SSE2: memxor aligned 26081.83 memxor unaligned 25893.69
No-SSE2: memxor aligned 17806.94 memxor unaligned 16581.48
How confident are you that the Intel vs AMD check is the right way to enable SSE2? I guess we could add a check on the particular CPU model later, if needed. Which model(s) did you benchmark on?
It would be nice, in a way, if we could share code with x86_64/memxor.asm, e.g., by defining x86_64/fat/memxor-1.asm and x86_64/fat/memxor-2.asm, which each include the same file with a different setting of USE_SSE2.
But I haven't looked at that carefully; it might be better to have a unified x86_64/fat/memxor.asm with two entry points, like you do.
I've also been considering m4 hacks to let a single fat .asm file include multiple other .asm files, or including the same file twice, without labels or m4 definitions colliding, but I'm not sure that's worth the effort. The foo-1.asm, foo-2.asm, ... scheme is a bit inelegant, but it is easy to understand.
- _nettle_cpuid (0, cpuid_data);
- if (memcmp(&cpuid_data[1], "Genu", 4) == 0 &&
      memcmp(&cpuid_data[3], "ineI", 4) == 0 &&
      memcmp(&cpuid_data[2], "ntel", 4) == 0) {
This could also be written as a single memcmp call, or 3 comparisons of integers.
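For illustration, a rough sketch of both alternatives. It assumes cpuid_data is an array of four uint32_t holding EAX, EBX, ECX, EDX in that order (the indices in the patch suggest this, but I have not checked it) and a little-endian x86 target. The function names is_intel_memcmp and is_intel_int are made up for this example. Note that the single memcmp has to compare against "GenuntelineI" rather than "GenuineIntel": the registers sit in EBX, ECX, EDX order in memory, while the vendor string is spread over EBX, EDX, ECX.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout (not confirmed by the patch): cpuid_data[0..3]
   hold EAX, EBX, ECX, EDX from CPUID leaf 0. */

/* Variant 1: a single memcmp over EBX, ECX, EDX, which are
   contiguous in the array. The reference bytes are therefore
   "Genu" (EBX), "ntel" (ECX), "ineI" (EDX). */
static int
is_intel_memcmp (const uint32_t *cpuid_data)
{
  return memcmp (&cpuid_data[1], "GenuntelineI", 12) == 0;
}

/* Variant 2: three integer comparisons. The constants are the
   little-endian encodings of the 4-byte chunks, which is fine
   for code that only ever runs on x86. */
static int
is_intel_int (const uint32_t *cpuid_data)
{
  return cpuid_data[1] == 0x756e6547    /* "Genu" */
    && cpuid_data[3] == 0x49656e69      /* "ineI" */
    && cpuid_data[2] == 0x6c65746e;     /* "ntel" */
}
```

The integer variant avoids the function call entirely, at the cost of magic constants that need a comment each.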
Regards,
/Niels