Re: ARM/NEON optimizations

17 Apr 2013


      On Wed, 17 Apr 2013, Niels Möller wrote:
...
Martin Storsjö martin@martin.st writes:
...
Before the upcoming(?) release, it might be good to tweak the way the
NEON optimizations are enabled. Currently, as far as I've seen,
they're enabled as soon as you target ARMv7, even if not all ARMv7
CPUs have NEON.
The name "armv7" on the directory is maybe not entirely correct either.
I just noticed that config.guess classifies my pandaboard
"armv7l-unknown-linux-gnueabihf". gmp uses a script on top of the
standard config.guess to give a more fine-grained classification, and it
says "armcortexa9neon-unknown-linux-gnueabihf". Seems this is based on
parsing of /proc/cpuinfo.
I'm not going to do anything sophisticated about this before the 2.7
release (which I'd like to get out within a few days, at most two
weeks). I could add an --enable-neon/--disable-neon flag, with default
based either on /proc/cpuinfo (and some fixed default for cross
compilation), or on what the assembler accepts, as you suggest. Would
that make sense?
That's certainly a good start. The test for what the assembler accepts at 
least should be the safest - then no neon instructions are produced unless 
this would be allowed by the C compiler in general. I'm not sure how well 
detecting from /proc/cpuinfo would work for something like Debian 
(unless the packager overrides it with ARM-specific options), where the 
package might be built on a pandaboard but is intended for distribution on 
a wider range of devices. But as long as it can be disabled with 
--disable-neon, packagers/users can always get it working right with more 
or less effort. OTOH, if the /proc/cpuinfo approach works for GMP, doing 
something similar probably makes sense as well.
...
To just detect neon, something like grep '^Features.*neon' /proc/cpuinfo
seems simple enough. And then I guess we can follow gmp conventions and
have some subdirectories arm, arm/v6, arm/neon, searched in
configure-dependent order.
Yes, that regexp probably is enough, and that directory structure seems 
sensible.
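
As a rough sketch of the two points above (not what nettle's configure 
actually does): the grep test can be wrapped in a small shell function, and 
its result used to pick the directory search order. The names has_neon and 
asm_search_path, the yes/no/auto argument, and the assumption that arm/v6 
always applies to the target are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the cpuinfo-style file named in $1
# (default /proc/cpuinfo) lists "neon" on its Features line.
has_neon () {
  grep '^Features.*neon' "${1:-/proc/cpuinfo}" >/dev/null 2>&1
}

# Hypothetical helper: print the assembly directories to search, most
# specific first, using the arm, arm/v6, arm/neon layout from the mail.
# $1 is the user's choice (yes, no or auto); with auto, the cpuinfo-style
# file in $2 is probed. Assumes the target is at least ARMv6.
asm_search_path () {
  enable_neon=$1
  if [ "$enable_neon" = auto ]; then
    if has_neon "$2"; then enable_neon=yes; else enable_neon=no; fi
  fi
  if [ "$enable_neon" = yes ]; then
    echo "arm/neon arm/v6 arm"
  else
    echo "arm/v6 arm"
  fi
}
```

With this shape, an explicit --disable-neon would map to "no" and skip the 
probe entirely, which is the case a distribution packager building on a 
pandaboard would want.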
...
...
I've seen that you've discussed this on the GMP list a few times as
well. For static detection in configure, one way is to check whether
the assembler can do the neon instructions without actually adding the
".fpu neon" line.
Then one would essentially configure it by setting CC, to something like
CC='gcc -mfpu=neon'. Somehow logical, and analogous to how ABI is
configured, but not entirely user friendly.
Not entirely user friendly, no, but I've seen the same pattern elsewhere. 
And since you don't want to produce neon instructions (at least not 
without proper runtime detection) unless the baseline ABI supports them, 
it's a pretty decent safeguard as well.
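
A minimal sketch of such a probe, assuming a POSIX shell and mktemp; the 
function name cc_accepts_neon and the choice of test instruction are mine, 
not taken from any existing configure script. It assembles a single NEON 
instruction with no ".fpu neon" directive, so it only succeeds if the flags 
already in CC allow NEON.

```shell
#!/bin/sh
# Hypothetical configure-style probe: succeed if the compiler driver in
# $CC assembles a NEON instruction from a source file that carries no
# ".fpu neon" directive.
cc_accepts_neon () {
  dir=$(mktemp -d) || return 1
  # A single NEON instruction, deliberately without ".fpu neon".
  printf '\t.text\n\tvadd.i32 q0, q0, q1\n' > "$dir/conftest.s"
  # $CC is left unquoted on purpose so that flags inside it are split.
  ${CC:-cc} -c "$dir/conftest.s" -o "$dir/conftest.o" >/dev/null 2>&1
  status=$?
  rm -rf "$dir"
  return $status
}
```

With an ARM toolchain whose default FPU lacks NEON, I'd expect this to fail 
for a plain CC=gcc and to succeed once CC is set to 'gcc -mfpu=neon', but 
that depends on how the toolchain was configured.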
...
...
Since there's AFAIK no runtime detection for anything else in nettle
so far (nor any state or global variables), I guess you're not going
to add it anytime soon
I'd definitely like to have run-time detection, but you're right that
it's unlikely to happen soon. It could use atomic writes to some pointer
variables, but there's no need for locking or any user-visible
initialization functions.
I should also look into IFUNC relocations. Nettle can't rely exclusively
on IFUNC, since it's not portable, but when available it makes it
possible to eliminate one level of indirection, and install a pointer to
the right routine directly in the PLT entry.
...
Anyway, for reference on the topic of runtime detection (which is
necessary e.g. on Android ARMv7, which explicitly supports a number of
ARMv7 devices without NEON), there's a number of ways of doing it.
I've recently got some android devices to play with, but I'm not yet
very familiar with android at all. I haven't tried to compile anything
for android yet, and I haven't looked at google's sdk. I'd prefer a
standard cross-compilation setup, like what I get with
apt-get install gcc-arm-linux-gnueabihf
for cross-compiling to an ARM GNU/Linux system. Is that possible yet?
Yes, that's quite possible. If you unpack an android NDK, you've got a 
directory like 
android-ndk-r8e/toolchains/arm-linux-androideabi-4.7/prebuilt/darwin-x86_64/bin, 
where you find the normal GCC cross toolchain, with tools like 
arm-linux-androideabi-gcc. When using this directly, you need to add a 
parameter like --sysroot=android-ndk/platforms/android-3/arch-arm to the 
compiler, to find the right platform headers. Alternatively, you can run a 
script in ndk/build/tools/make-standalone-toolchain.sh, which copies out a 
toolchain and set of platform headers/libs and bundles them together so 
you don't need the sysroot parameter. All of this is documented within 
ndk/docs/STANDALONE-TOOLCHAIN.html.
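
To make the first variant concrete, here is a sketch that only assembles the 
compiler command from the paths mentioned above. ndk_cc is a made-up helper; 
where the NDK is unpacked and which prebuilt host directory it contains 
(darwin-x86_64 in my case) depend on your setup.

```shell
#!/bin/sh
# Hypothetical helper: print a CC value for the NDK unpacked in $1.
# $2 is the prebuilt host directory; darwin-x86_64 is just the default
# used here, other build hosts will have a different name.
ndk_cc () {
  ndk=$1
  host_tag=${2:-darwin-x86_64}
  bindir=$ndk/toolchains/arm-linux-androideabi-4.7/prebuilt/$host_tag/bin
  sysroot=$ndk/platforms/android-3/arch-arm
  printf '%s\n' "$bindir/arm-linux-androideabi-gcc --sysroot=$sysroot"
}
```

The result could then be handed to an autoconf-style configure script, e.g. 
./configure --host=arm-linux-androideabi CC="$(ndk_cc "$HOME/android-ndk-r8e")", 
assuming the NDK is unpacked in that location.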
For building static libraries, this works pretty much exactly as you'd do 
on any other platform with a normal gcc cross toolchain. If you've got a 
rooted device, you can build normal executables as well and run them on 
the device ("adb push mybinary /data; adb shell /data/mybinary"). If you 
haven't got a rooted device, you'd want to set up a normal android app 
project (either a "pure native" project, or a full java project which 
calls native code via JNI; in either case it would then link to the 
static library you've built externally).
For a shared library, there are a few small extra gotchas. The built shared 
object would need to be named libnettle.so without trailing version 
numbers. Additionally, due to deficiencies in the android app environment 
when it comes to shared library loading, you'd need to load them in 
reverse order from java; first a System.loadLibrary("nettle"), then 
System.loadLibrary("myapp"), so that all dependencies of the shared 
library libmyapp.so are loaded before trying to load it. For a number of 
layered libraries, this is a bit of an issue, but it's mostly the burden 
of the app developer who wants to release it and wants to keep the 
libraries linked separately for license reasons (combining LGPL code with 
proprietary code), otherwise linking it statically obviously is simpler.
// Martin
