Re: [PATCH 2/5] Do the movd/movq workaround for the osx assembler, for sha3-permute

25 Mar 2013


Martin Storsjö <martin@martin.st> writes:
...
--- a/x86_64/sha3-permute.asm
+++ b/x86_64/sha3-permute.asm
BTW, this file really needs a rewrite. It runs much slower than the C
version on some (or all?) AMD processors. Probably because the movq/movd
moves between general registers and xmm registers have a large latency
penalty. One would either need to move data via memory (maybe with a
separate permute/rotate pass working with general registers and memory),
or squeeze (almost) all state into the xmm registers, a bit like the arm
neon sha3 code I wrote the other week.
Regards,
/Niels
-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
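
For readers unfamiliar with the "separate permute/rotate pass working with
general registers and memory" option mentioned in the message, here is a
minimal C sketch of what such a pass could look like. It is not the Nettle
code and not the rewrite proposed above; the names rotl64, rho_offsets and
rho_pi_pass are hypothetical, and it assumes the common layout where lane
(x, y) of the 5x5 Keccak state is stored at index x + 5*y. It applies the
standard rho rotations and pi permutation while reading and writing the
state through memory, so no movq/movd between register files is needed.

```c
#include <stdint.h>

/* Keccak rho rotation counts, indexed by x + 5*y (assumed lane layout). */
static const unsigned rho_offsets[25] = {
   0,  1, 62, 28, 27,
  36, 44,  6, 55, 20,
   3, 10, 43, 25, 39,
  41, 45, 15, 21,  8,
  18,  2, 61, 56, 14
};

/* Rotate left by n bits, 0 <= n < 64; the mask avoids a shift by 64. */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  return (x << n) | (x >> ((64 - n) & 63));
}

/* Hypothetical helper: combined rho (rotate) and pi (permute) steps.
   Lane (x, y) of src is rotated and stored at lane (y, 2x + 3y mod 5)
   of dst. src and dst must not overlap. */
void
rho_pi_pass (uint64_t dst[25], const uint64_t src[25])
{
  unsigned x, y;
  for (y = 0; y < 5; y++)
    for (x = 0; x < 5; x++)
      {
        unsigned from = x + 5 * y;
        unsigned to = y + 5 * ((2 * x + 3 * y) % 5);
        dst[to] = rotl64 (src[from], rho_offsets[from]);
      }
}
```

A compiler will typically turn each iteration into a load, a rotate in a
general register, and a store; whether that beats an all-xmm layout on a
given processor would have to be measured.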
