On 12/13/2012 10:31 AM, Niels Möller wrote:
Maybe it would be better to copy data back and forth to 64-bit registers, but I seem to vaguely recall that moves between regular registers and xmm registers are slow, is that so?
I don't think you can ever know, given the large number of architectures and designs. If you remember, in memxor, using the SSE2 registers gave a 10x boost on Intel processors but ran at 0.9x the original speed on the AMD ones.
I'd say just try it on some stock processors and see what's best on average (I can test code on an i7 if you need).
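For the comparison, something like the rough harness below might do. It is only a sketch: xor_bytes, xor_words and time_xor are names made up for illustration (they are not nettle functions), and the word-wise variant is plain portable C, standing in for whichever register strategy is actually being tried. clock() is coarse, so the buffer and the round count need to be large enough to get a stable number.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Common signature for the variants under test (hypothetical). */
typedef void xor_func(uint8_t *dst, const uint8_t *src, size_t n);

/* Baseline: xor one byte at a time. */
void
xor_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* Candidate: xor 64 bits at a time. memcpy is used for the loads
   and stores so that no alignment is assumed. */
void
xor_words(uint8_t *dst, const uint8_t *src, size_t n)
{
  size_t i;
  for (i = 0; i + 8 <= n; i += 8)
    {
      uint64_t a, b;
      memcpy(&a, dst + i, 8);
      memcpy(&b, src + i, 8);
      a ^= b;
      memcpy(dst + i, &a, 8);
    }
  /* Leftover bytes, if n is not a multiple of 8. */
  for (; i < n; i++)
    dst[i] ^= src[i];
}

/* CPU time in seconds spent on `rounds` calls of f. */
double
time_xor(xor_func *f, uint8_t *dst, const uint8_t *src,
         size_t n, unsigned rounds)
{
  clock_t start = clock();
  unsigned i;
  for (i = 0; i < rounds; i++)
    f(dst, src, n);
  return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_xor once per variant with the same buffers and round count, on each machine, gives numbers that can be compared directly. The absolute values mean little; what matters is the ratio between variants on a given processor.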
regards, Nikos