ARM/NEON optimizations


Martin Storsjö

17 Apr 2013

1:29 p.m.

Hi,

Before the upcoming(?) release, it might be good to tweak the way the NEON optimizations are enabled. Currently, as far as I've seen, they're enabled as soon as you target ARMv7, even if not all ARMv7 CPUs have NEON.

I've seen that you've discussed this on the GMP list a few times as well. For static detection in configure, one way is to check whether the assembler can do the neon instructions without actually adding the ".fpu neon" line. If this succeeds, the compiler/assembler is configured with -mfpu=neon, and we can freely use neon instructions wherever.

Since there's AFAIK no runtime detection for anything else in nettle so far (nor any state or global variables), I guess you're not going to add it anytime soon - for that case, detecting it statically in configure based on the target configuration is probably good enough.

Anyway, for reference on the topic of runtime detection (which is necessary e.g. on Android ARMv7, which explicitly supports a number of ARMv7 devices without NEON), there's a number of ways of doing it. One way that was mentioned on the gmp mailing list was using getauxval from glibc. This obviously isn't available on non-glibc platforms such as Android (or other platforms as well).
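As a rough illustration of the getauxval route mentioned above, the sketch below tests the NEON bit in the AT_HWCAP word. The helper names are made up for this example. The bit value (1 << 12) is what 32-bit ARM Linux uses for HWCAP_NEON; it is defined locally here so the snippet also compiles on other hosts, where it simply reports no NEON.

```c
#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval() and AT_HWCAP, glibc 2.16 or later */
#endif

/* NEON bit in AT_HWCAP on 32-bit ARM Linux (HWCAP_NEON in <asm/hwcap.h>). */
#define EXAMPLE_HWCAP_NEON (1UL << 12)

/* Hypothetical helper: pure test on an already fetched hwcap word. */
int hwcap_has_neon(unsigned long hwcap)
{
  return (hwcap & EXAMPLE_HWCAP_NEON) != 0;
}

/* Hypothetical helper: 1 if the running CPU reports NEON, else 0. */
int cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__)
  return hwcap_has_neon(getauxval(AT_HWCAP));
#else
  return 0;   /* not 32-bit ARM Linux: assume no NEON */
#endif
}
```

Keeping the bit test separate from the getauxval call means the same check can be reused when the hwcap word comes from some other source.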

On general linux, one pretty straightforward way is to parse /proc/self/auxv (which is pretty well structured), the other is parsing /proc/cpuinfo (which requires a little bit more code to parse).
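To show why /proc/self/auxv counts as well structured, here is a minimal sketch of reading it. It assumes the file is a sequence of (type, value) pairs of native unsigned long words, ended by a type 0 entry, and that the hwcap entry has type 16 (AT_HWCAP). The function names are invented for illustration only.

```c
#include <stddef.h>
#include <stdio.h>

#define EXAMPLE_AT_NULL   0UL   /* end-of-vector marker */
#define EXAMPLE_AT_HWCAP 16UL   /* entry type carrying the hwcap bits */

/* Scan up to n (type, value) pairs; return 1 and store the value
   if an entry of the requested type is found, else return 0. */
int auxv_lookup(const unsigned long *pairs, size_t n,
                unsigned long type, unsigned long *value)
{
  size_t i;
  for (i = 0; i < n; i++)
    {
      unsigned long t = pairs[2*i];
      if (t == EXAMPLE_AT_NULL)
        break;
      if (t == type)
        {
          *value = pairs[2*i + 1];
          return 1;
        }
    }
  return 0;
}

/* Read the process's own vector. Returns 0 if the file cannot be
   opened (as reported for some Android processes) or has no hwcap entry. */
int auxv_read_hwcap(unsigned long *hwcap)
{
  unsigned long buf[2 * 64];
  size_t n;
  FILE *f = fopen("/proc/self/auxv", "rb");
  if (!f)
    return 0;
  n = fread(buf, 2 * sizeof(buf[0]), 64, f);   /* n = complete pairs read */
  fclose(f);
  return auxv_lookup(buf, n, EXAMPLE_AT_HWCAP, hwcap);
}
```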

The Android NDK comes with a small support library that detects cpu extensions - previously it used to parse /proc/self/auxv, but in recent Android versions this file isn't accessible within release mode processes, so now it parses /proc/cpuinfo. See http://git.libav.org/?p=libav.git;a=blob;f=libavutil/arm/cpu.c or https://android.googlesource.com/platform/ndk/+/a92ba07e8abf397e74285e90d080... for examples on how to parse these files.
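The /proc/cpuinfo route needs a bit of string handling. The following is only a sketch of the idea, not the code from either of the linked projects: it looks for a line starting with "Features" and checks whether a given name appears as a whole whitespace-separated word. Both function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if 'line' is a "Features" line listing 'name' as a whole word. */
int cpuinfo_line_has_feature(const char *line, const char *name)
{
  size_t len = strlen(name);
  const char *p;

  if (len == 0 || strncmp(line, "Features", 8) != 0)
    return 0;
  p = strchr(line, ':');
  if (!p)
    return 0;
  p++;
  while (*p)
    {
      size_t n;
      p += strspn(p, " \t\n");    /* skip separators */
      n = strcspn(p, " \t\n");    /* length of the next word */
      if (n == len && memcmp(p, name, len) == 0)
        return 1;
      p += n;
    }
  return 0;
}

/* Scan /proc/cpuinfo line by line; returns 0 if the file is unreadable.
   Assumes the Features line fits in the buffer. */
int cpuinfo_has_feature(const char *name)
{
  char line[1024];
  int found = 0;
  FILE *f = fopen("/proc/cpuinfo", "r");
  if (!f)
    return 0;
  while (!found && fgets(line, sizeof(line), f))
    found = cpuinfo_line_has_feature(line, name);
  fclose(f);
  return found;
}
```

Matching whole words avoids treating a longer feature name that merely contains "neon" as a hit.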

A third option is to allow the caller to set it, so the caller can use whatever runtime detection that is available on the app level and configure the library accordingly. But this both goes against the design of nettle, and additionally isn't too useful when nettle in many cases is used indirectly via a number of other libraries, and the calling app might not even know much about what lower level libraries are used.

// Martin


nisse＠lysator.liu.se

17 Apr

2:57 p.m.

Martin Storsjö martin@martin.st writes:

...

Before the upcoming(?) release, it might be good to tweak the way the NEON optimizations are enabled. Currently, as far as I've seen, they're enabled as soon as you target ARMv7, even if not all ARMv7 CPUs have NEON.

The name "armv7" on the directory is maybe not entirely correct either. I just noticed that config.guess classifies my pandaboard "armv7l-unknown-linux-gnueabihf". gmp uses a script on top of the standard config.guess to give more fine-grained classification, and it says "armcortexa9neon-unknown-linux-gnueabihf". Seems this is based on parsing of /proc/cpuinfo.

I'm not going to do anything sophisticated about this before the 2.7 release (which I'd like to get out within a few days, at most two weeks). I could add an --enable-neon/--disable-neon flag, with default based either on /proc/cpuinfo (and some fixed default for cross compilation), or on what the assembler accepts, as you suggest. Would that make sense?

To just detect neon, something like grep '^Features.*neon' /proc/cpuinfo seems simple enough. And then I guess we can follow gmp conventions and have some subdirectories arm, arm/v6, arm/neon, searched in configure-dependent order.

...

I've seen that you've discussed this on the GMP list a few times as well. For static detection in configure, one way is to check whether the assembler can do the neon instructions without actually adding the ".fpu neon" line.

Then one would essentially configure it by setting CC to something like CC='gcc -mfpu=neon'. Somewhat logical, and analogous to how ABI is configured, but not entirely user friendly.

...

Since there's AFAIK no runtime detection for anything else in nettle so far (nor any state or global variables), I guess you're not going to add it anytime soon

I'd definitely like to have run-time detection, but you're right that it's unlikely to happen soon. It could use atomic writes to some pointer variables, but there's no need for locking or any user-visible initialization functions.
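One possible shape for such lock-free dispatch, sketched with C11 atomics: the function pointer starts out pointing at a resolver, and the first call replaces it with the chosen routine. Everything here is hypothetical (the byte-summing routines merely stand in for real generic and NEON implementations, and cpu_has_fast_path stands in for an actual detection function); this is not how nettle does it.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef unsigned (*sum_fn)(const unsigned char *, size_t);

/* Stand-in for the portable C implementation. */
static unsigned sum_generic(const unsigned char *p, size_t n)
{
  unsigned s = 0;
  while (n-- > 0)
    s += *p++;
  return s;
}

/* Stand-in for an optimized routine; same result, hypothetical. */
static unsigned sum_fast(const unsigned char *p, size_t n)
{
  return sum_generic(p, n);
}

/* Hypothetical detection hook; a real one would query the CPU. */
static int cpu_has_fast_path(void)
{
  return 0;
}

static unsigned sum_resolve(const unsigned char *p, size_t n);

/* Starts at the resolver, so no explicit initialization call is needed. */
static _Atomic(sum_fn) sum_impl = sum_resolve;

static unsigned sum_resolve(const unsigned char *p, size_t n)
{
  sum_fn f = cpu_has_fast_path() ? sum_fast : sum_generic;
  /* Racing threads all store the same pointer, so no lock is needed. */
  atomic_store_explicit(&sum_impl, f, memory_order_relaxed);
  return f(p, n);
}

/* Public entry point: one atomic load plus an indirect call. */
unsigned sum_bytes(const unsigned char *p, size_t n)
{
  sum_fn f = atomic_load_explicit(&sum_impl, memory_order_relaxed);
  return f(p, n);
}
```

The cost is the extra level of indirection on every call.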

I should also look into IFUNC relocations. Nettle can't rely exclusively on IFUNC, since it's not portable, but when available it makes it possible to eliminate one level of indirection, and install a pointer to the right routine directly in the PLT entry.

...

Anyway, for reference on the topic of runtime detection (which is necessary e.g. on Android ARMv7, which explicitly supports a number of ARMv7 devices without NEON), there's a number of ways of doing it.

I've recently got some android devices to play with, but I'm not yet very familiar with android at all. I haven't tried to compile anything for android yet, and I haven't looked at google's sdk. I'd prefer a standard cross-compilation setup, like what I get with

apt-get install gcc-arm-linux-gnueabihf

for cross-compiling to an ARM gnu/linux system. Is that possible yet?

Regards, /Niels

-- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance.

Martin Storsjö

7:24 p.m.

On Wed, 17 Apr 2013, Niels Möller wrote:

...

Martin Storsjö martin@martin.st writes:

...
Before the upcoming(?) release, it might be good to tweak the way the NEON optimizations are enabled. Currently, as far as I've seen, they're enabled as soon as you target ARMv7, even if not all ARMv7 CPUs have NEON.

The name "armv7" on the directory is maybe not entirely correct either. I just noticed that config.guess classifies my pandaboard "armv7l-unknown-linux-gnueabihf". gmp uses a script on top of the standard config.guess to give more fine-grained classification, and it says "armcortexa9neon-unknown-linux-gnueabihf". Seems this is based on parsing of /proc/cpuinfo.

I'm not going to do anything sophisticated about this before the 2.7 release (which I'd like to get out within a few days, at most two weeks). I could add an --enable-neon/--disable-neon flag, with default based either on /proc/cpuinfo (and some fixed default for cross compilation), or on what the assembler accepts, as you suggest. Would that make sense?

That's certainly a good start. The test for what the assembler accepts at least should be the safest - then no neon instructions are produced unless this would be allowed by the C compiler in general. I'm not sure how well detecting from /proc/cpuinfo would work for something e.g. like debian (unless the packager overrides it with arm specific options), where the package might be built on a pandaboard but is intended for distribution on a wider range of devices. But as long as it can be disabled with --disable-neon, packagers/users can always get it working right with more or less effort. OTOH, if the /proc/cpuinfo approach works for GMP, doing something similar probably makes sense as well.

...

To just detect neon, something like grep '^Features.*neon' /proc/cpuinfo seems simple enough. And then I guess we can follow gmp conventions and have some subdirectories arm, arm/v6, arm/neon, searched in configure-dependent order.

Yes, that regexp probably is enough, and that directory structure seems sensible.

...

...
I've seen that you've discussed this on the GMP list a few times as well. For static detection in configure, one way is to check whether the assembler can do the neon instructions without actually adding the ".fpu neon" line.

Then one would essentially configure it by setting CC to something like CC='gcc -mfpu=neon'. Somewhat logical, and analogous to how ABI is configured, but not entirely user friendly.

Not entirely user friendly, no, but I've seen the same pattern elsewhere. And since you don't want to produce neon instructions (at least not without guarding them by proper runtime detection) unless the baseline ABI supports it, it's a pretty decent safeguard as well.

...

...
Since there's AFAIK no runtime detection for anything else in nettle so far (nor any state or global variables), I guess you're not going to add it anytime soon

I'd definitely like to have run-time detection, but you're right that it's unlikely to happen soon. It could use atomic writes to some pointer variables, but there's no need for locking or any user-visible initialization functions.

I should also look into IFUNC relocations. Nettle can't rely exclusively on IFUNC, since it's not portable, but when available it makes it possible to eliminate one level of indirection, and install a pointer to the right routine directly in the PLT entry.

...
Anyway, for reference on the topic of runtime detection (which is necessary e.g. on Android ARMv7, which explicitly supports a number of ARMv7 devices without NEON), there's a number of ways of doing it.

I've recently got some android devices to play with, but I'm not yet very familiar with android at all. I haven't tried to compile anything for android yet, and I haven't looked at google's sdk. I'd prefer a standard cross-compilation setup, like what I get with

apt-get install gcc-arm-linux-gnueabihf

for cross-compiling to an ARM gnu/linux system. Is that possible yet?

Yes, that's quite possible. If you unpack an android NDK, you've got a directory like android-ndk-r8e/toolchains/arm-linux-androideabi-4.7/prebuilt/darwin-x86_64/bin, where you find the normal GCC cross toolchain, with tools like arm-linux-androideabi-gcc. When using this directly, you need to add a parameter like --sysroot=android-ndk/platforms/android-3/arch-arm to the compiler, to find the right platform headers. Alternatively, you can run a script in ndk/build/tools/make-standalone-toolchain.sh, which copies out a toolchain and set of platform headers/libs and bundles them together so you don't need the sysroot parameter. All of this is documented within ndk/docs/STANDALONE-TOOLCHAIN.html.

For building static libraries, this works pretty much exactly as you'd do on any other platform with a normal gcc cross toolchain. If you've got a rooted device, you can build normal executables as well and run them on the device ("adb push mybinary /data; adb shell /data/mybinary"). If you haven't got a rooted device, you'd want to set up a normal android app project (either a "pure native" project, or a full java project which calls native code via JNI - in either case which then would link to the static library you've built externally).

For a shared library, there are a few small extra gotchas. The built shared object would need to be named libnettle.so without trailing version numbers. Additionally, due to deficiencies in the android app environment when it comes to shared library loading, you'd need to load them in reverse order from java; first a System.loadLibrary("nettle"), then System.loadLibrary("myapp"), so that all dependencies of the shared library libmyapp.so are loaded before trying to load it. For a number of layered libraries, this is a bit of an issue, but it's mostly the burden of the app developer who wants to release it and wants to keep the libraries linked separately for license reasons (combining LGPL code with proprietary code), otherwise linking it statically obviously is simpler.

// Martin

nisse＠lysator.liu.se

18 Apr

2:23 p.m.

Martin Storsjö martin@martin.st writes:

...

On Wed, 17 Apr 2013, Niels Möller wrote:

...
I'm not going to do anything sophisticated about this before the 2.7 release (which I'd like to get out within a few days, at most two weeks). I could add an --enable-neon/--disable-neon flag, with default based either on /proc/cpuinfo (and some fixed default for cross compilation), or on what the assembler accepts, as you suggest. Would that make sense?

That's certainly a good start.

Something along those lines checked in now. Testing appreciated.

For native compilation, configure checks /proc/cpuinfo, and for cross compilation, it checks what the assembler accepts.

Regards, /Niels


Martin Storsjö

3:34 p.m.

On Thu, 18 Apr 2013, Niels Möller wrote:

...

Martin Storsjö martin@martin.st writes:

...
On Wed, 17 Apr 2013, Niels Möller wrote:

...
I'm not going to do anything sophisticated about this before the 2.7 release (which I'd like to get out within a few days, at most two weeks). I could add an --enable-neon/--disable-neon flag, with default based either on /proc/cpuinfo (and some fixed default for cross compilation), or on what the assembler accepts, as you suggest. Would that make sense?

That's certainly a good start.

Something along those lines checked in now. Testing appreciated.

For native compilation, configure checks /proc/cpuinfo, and for cross compilation, it checks what the assembler accepts.

This seems to work as intended (after some very brief testing), thanks!

// Martin

Nikos Mavrogiannopoulos

21 Apr

12:19 p.m.

On Thu, Apr 18, 2013 at 4:23 PM, Niels Möller nisse@lysator.liu.se wrote:

...

Something along those lines checked in now. Testing appreciated. For native compilation, configure checks /proc/cpuinfo, and for cross compilation, it checks what the assembler accepts.

On which algorithms are the neon instructions used? I don't know if that makes much sense performance-wise, but if they could be auto-detected maybe I could include the neon code unconditionally on the arm architecture and enable them in gnutls if the instructions are there, the same way we use for padlock and aes-ni.

regards, Nikos

nisse＠lysator.liu.se

3:46 p.m.

Nikos Mavrogiannopoulos nmav@gnutls.org writes:

...

On which algorithms are the neon instructions used?

sha512, sha3, salsa20 and umac.

They make a big difference for sha512 and sha3, since they depend a lot on 64-bit operations. For umac, it would make some sense to also write some non-neon ARM assembly, using the umaal instruction as the main work horse.
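For readers unfamiliar with umaal, here is a plain C model of what the instruction computes: a 32x32-bit unsigned multiply with two 32-bit addends, producing a 64-bit result. The function name is made up, and this only illustrates the arithmetic; it says nothing about how a umac implementation would actually arrange its loop.

```c
#include <stdint.h>

/* C model of ARM UMAAL: a * b + lo + hi as a 64-bit value.
   The sum cannot overflow, since
   (2^32 - 1)^2 + 2 * (2^32 - 1) = 2^64 - 1. */
uint64_t umaal_model(uint32_t a, uint32_t b, uint32_t lo, uint32_t hi)
{
  return (uint64_t) a * b + lo + hi;
}
```

Because the result always fits in 64 bits, a multiply and two accumulations can be done in one instruction without any carry handling.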

Regards, /Niels


nettle-bugs@lists.lysator.liu.se

6 comments

3 participants

tags (0)

participants (3)

Martin Storsjö
Nikos Mavrogiannopoulos
nisse＠lysator.liu.se