Hello Mamone,
On Sun, Jan 24, 2021 at 06:44:33PM +0200, Maamoun TK wrote:
representation. As for arm and aarch64, little-endian is the default. Do you think the routine could be changed to move the special endianness treatment using rev64 to BE mode, i.e. avoid it in the standard LE case? It's certainly beyond me, but it might give some additional speedup.
Or would it be irrelevant compared to the speedup already given by using pmull in the first place?
I don't know how it's going to affect the performance, but the margin is indeed irrelevant. TBH I liked the patch with the special endianness treatment, but it's up to you to decide!
As you might expect, I like the one where doubleword vectors are used throughout and stored in host endianness in TABLE, because to me it's the most intuitive. For DATA my rationale is that if we want to *treat* it as big-endian doublewords, we should load it as doublewords to make it clearer why and what we need to adjust afterwards. It also avoids the rev64s with BE. I've added some comments with the rationale, and a README with an excerpt of the last email as well. Attached are the current patches, the first being your original. What do you think?
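To illustrate the DATA rationale in scalar terms, here is a minimal C sketch (not taken from the patches; the helper names bswap64, host_is_little_endian and load_be64 are made up for illustration). It loads eight bytes as one doubleword in host order and byte-reverses only on LE, which is roughly what rev64 does per doubleword lane, so on BE no adjustment is needed:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the bytes of one 64-bit doubleword,
   the scalar analogue of rev64 applied to a single lane. */
static uint64_t bswap64(uint64_t x)
{
    /* Swap adjacent bytes, then adjacent 16-bit halves, then the 32-bit halves. */
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* Hypothetical helper: runtime check of host byte order. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Load 8 bytes as a doubleword in host order, then adjust so the value
   is the big-endian interpretation of the bytes. On a BE host the
   loaded value is already correct and no reversal happens. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return host_is_little_endian() ? bswap64(v) : v;
}
```

The assembly naturally decides this at build time rather than at run time; the runtime check here only keeps the sketch self-contained.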
As said, I'm up for looking into endianness-specific versions of the macros again. But what were supposed to be the LE versions of PMUL and friends have now become the BE-native versions, and we'd need to come up with variants of them that make the rev64s unnecessary. Any ideas?