On Sun, Oct 11, 2020 at 2:03 PM Niels Möller nisse@lysator.liu.se wrote:
Jeffrey Walton noloader@gmail.com writes:
I may be mistaken, but I believe 64-bit poly multiplies are available. Or they are available on Aarch64 with Crypto extensions.
I'm looking in the Arm Instruction Set Reference Guide, labeled version 1.0, 2018.
It includes a section on cryptographic instructions, but that's aes, sha1 and sha256, no carry-less multiplication.
But I may well be missing something, I'm not really familiar with Aarch64.
I'm not aware of poly multiplies on other ARM arches, like ARMv6 or ARMv7 with NEON.
I think the "p8" SIMD datatype and vmull.p8 have been part of the NEON instruction set for a long time, at least since I wrote my first ARM code back in 2013. It's just a bit annoying that one needs so many of them to do a wide multiply.
Oh, you're right. There is a vmull for NEON.
According to an early NEON programming guide from ARM (https://static.docs.arm.com/den0018/a/DEN0018A_neon_programmers_guide_en.pdf), the widest polynomial multiply you can perform produces a P16 result.
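To illustrate why a wide multiply needs so many of the narrow ones, here is a rough, portable C sketch. It is not NEON code and not taken from Nettle: clmul8 and clmul64 are hypothetical names, and clmul8 only models what a single lane of vmull.p8 computes (an 8x8 carry-less multiply with a 16-bit result). Assuming that model, a 64x64 carry-less multiply takes 64 such partial products.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: carry-less 8x8 -> 16 bit multiply.
   Models one lane of vmull.p8 in plain C; partial products
   are combined with XOR instead of addition. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++)
        if ((b >> i) & 1)
            r ^= (uint16_t)((uint16_t)a << i);
    return r;
}

/* Hypothetical helper: carry-less 64x64 -> 128 bit multiply
   assembled from 8 * 8 = 64 byte-wise products. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t h = 0, l = 0;

    assert(hi != 0 && lo != 0);

    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            uint64_t p = clmul8((uint8_t)(a >> (8 * i)),
                                (uint8_t)(b >> (8 * j)));
            int shift = 8 * (i + j);      /* 0, 8, ..., 112 */

            if (shift < 64) {
                l ^= p << shift;
                if (shift > 48)           /* only shift == 56 spills over */
                    h ^= p >> (64 - shift);
            } else {
                h ^= p << (shift - 64);
            }
        }
    }
    *hi = h;
    *lo = l;
}
```

For example, calling clmul64(3, 3, &hi, &lo) leaves hi == 0 and lo == 5, since (x + 1)^2 = x^2 + 1 over GF(2). A real NEON version would do eight of the byte products per vmull.p8 and could use Karatsuba to cut the count, but that is beyond this sketch.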
Jeff