Re: Getting closer to nettle-2.6

2 Jan 2013


"NM" == Niels Möller nisse@lysator.liu.se writes:
NM>    my best guess is that it's the
NM>    moves of data between regular registers and xmm registers that
NM>    somehow stall.
IIRC, the advice I've seen is to always move data between the integer
registers and the xmm registers via the stack.
All of the relevant gcc- and llvm-produced code I've looked at (at least
over the last few months; I can't remember too far back) follows that
pattern.
Yes, the 47414_15h_sw_opt_guide.pdf, in §10.4, says:
,----< §10.4, p169 of 47414_15h_sw_opt_guide.pdf¹ >
| Optimization
| 
| When moving data from a GPR to an XMM register, use separate store and
| load instructions to move the data first from the source register to a
| temporary location in memory and then from memory into the destination
| register, taking the memory latency into account when scheduling both
| stages of the load-store sequence.
| 
| When moving data from an XMM register to a general-purpose register,
| use the VMOVD instruction.
| 
| Whenever possible, use loads and stores of the same data length. (See
| 6.3, “Store-to-Load Forwarding Restrictions” on page 98 for more
| information.)
`----
VMOVD, obviously, doesn’t apply to fam10 and earlier; I didn’t look
through my archive to find the sw_opt_guide for earlier processors, though.
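
For illustration only (this is not from the guide or from nettle), here is a
minimal C sketch of the two GPR-to-XMM paths discussed above. It assumes
x86-64 with SSE2 and GCC or Clang. The function names are made up, and the
empty asm statement is just a compiler barrier that keeps the temporary in
memory. Whether the store/load path is actually faster is something to
measure on the target CPU.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Direct GPR -> XMM move; typically compiles to MOVQ xmm, reg. */
static __m128i gpr_to_xmm_direct(uint64_t x)
{
    return _mm_cvtsi64_si128((long long)x);
}

/* GPR -> XMM via a stack temporary: a 64-bit store followed by a
 * 64-bit load, so both accesses use the same data length. */
static __m128i gpr_to_xmm_via_stack(uint64_t x)
{
    uint64_t tmp = x;
    /* GCC/Clang-specific barrier (an assumption of this sketch): forces
     * tmp to live in memory so the load below is not folded away. */
    __asm__ volatile("" : "+m"(tmp));
    return _mm_loadl_epi64((const __m128i *)&tmp);  /* upper 64 bits zeroed */
}

/* XMM -> GPR direct move of the low 64 bits; MOVQ reg, xmm, or the
 * VEX-encoded form when the compiler is told to use AVX. */
static uint64_t xmm_to_gpr(__m128i v)
{
    return (uint64_t)_mm_cvtsi128_si64(v);
}
```

Both GPR-to-XMM helpers return the same value, so they can be swapped
in a benchmark without touching the surrounding code.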
1] http://support.amd.com/us/Processor_TechDocs/47414_15h_sw_opt_guide.pdf
-JimC
-- 
James Cloos cloos@jhcloos.com         OpenPGP: 1024D/ED7DAEA6
