On Wed, 17 Apr 2013, Niels Möller wrote:
Martin Storsjö <martin@martin.st> writes:
Before the upcoming(?) release, it might be good to tweak the way the NEON optimizations are enabled. Currently, as far as I've seen, they're enabled as soon as you target ARMv7, even though not all ARMv7 CPUs have NEON.
The name "armv7" for the directory may not be entirely correct either. I just noticed that config.guess classifies my pandaboard as "armv7l-unknown-linux-gnueabihf". GMP uses a script on top of the standard config.guess to give a more fine-grained classification, and it says "armcortexa9neon-unknown-linux-gnueabihf". This seems to be based on parsing /proc/cpuinfo.
I'm not going to do anything sophisticated about this before the 2.7 release (which I'd like to get out within a few days, at most two weeks). I could add an --enable-neon/--disable-neon flag, with the default based either on /proc/cpuinfo (and some fixed default for cross-compilation) or on what the assembler accepts, as you suggest. Would that make sense?
That's certainly a good start. The test for what the assembler accepts should at least be the safest: then no NEON instructions are produced unless the C compiler would be allowed to produce them in general. I'm not sure how well detection from /proc/cpuinfo would work for something like Debian (unless the packager overrides it with ARM-specific options), where the package might be built on a pandaboard but is intended for distribution to a wider range of devices. But as long as it can be disabled with --disable-neon, packagers/users can always get it working right with more or less effort. OTOH, if the /proc/cpuinfo approach works for GMP, doing something similar probably makes sense as well.
To just detect neon, something like grep '^Features.*neon' /proc/cpuinfo seems simple enough. And then I guess we can follow gmp conventions and have some subdirectories arm, arm/v6, arm/neon, searched in configure-dependent order.
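For reference, the same check written in C, as a minimal sketch that mirrors the grep pattern rather than any existing nettle code (the function names are hypothetical):

```c
#include <stdio.h>
#include <string.h>

/* Mirrors grep '^Features.*neon': the line must start with "Features"
   and mention "neon" somewhere after that prefix. */
static int line_has_neon(const char *line)
{
  return strncmp(line, "Features", 8) == 0
         && strstr(line + 8, "neon") != NULL;
}

/* Returns 1 if any line read from the stream matches, else 0.
   Assumes the Features line fits in the 1024-byte buffer. */
static int cpuinfo_has_neon(FILE *f)
{
  char line[1024];
  while (fgets(line, sizeof line, f))
    if (line_has_neon(line))
      return 1;
  return 0;
}

/* Checks the running system; reports "no NEON" if the file is missing
   (e.g. non-Linux hosts). */
int host_has_neon(void)
{
  FILE *f = fopen("/proc/cpuinfo", "r");
  if (!f)
    return 0;
  int result = cpuinfo_has_neon(f);
  fclose(f);
  return result;
}
```

Keeping the matching separate from the file access makes it easy to test against a saved cpuinfo dump from another machine, which matters for the cross-compilation case.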
Yes, that regexp probably is enough, and that directory structure seems sensible.
I've seen that you've discussed this on the GMP list a few times as well. For static detection in configure, one way is to check whether the assembler can do the neon instructions without actually adding the ".fpu neon" line.
Then one would essentially configure it by setting CC, to something like CC='gcc -mfpu=neon'. Somehow logical, and analogous to how ABI is configured, but not entirely user friendly.
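A rough C analogue of that configure check, assuming the compiler's predefined macros are an acceptable stand-in for probing the assembler directly. GCC and Clang define __ARM_NEON__ and/or the ACLE macro __ARM_NEON only when NEON is enabled for the target; the function name is hypothetical:

```c
#include <stdio.h>

/* Set by the compiler only when the selected FPU includes NEON,
   e.g. with CC='gcc -mfpu=neon' on an ARM target. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
# define BASELINE_HAS_NEON 1
#else
# define BASELINE_HAS_NEON 0
#endif

/* Hypothetical helper: reports whether the baseline chosen via CC
   already permits NEON, so NEON code may be used unconditionally. */
int baseline_has_neon(void)
{
  return BASELINE_HAS_NEON;
}
```

In an actual configure probe the #else branch would become an #error, so the test program fails to compile with a plain CC and succeeds once -mfpu=neon is part of CC.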
Not entirely user friendly, no, but I've seen the same pattern elsewhere. And since you don't want to produce NEON instructions unless the baseline ABI supports them (at least not without proper runtime detection guarding them), it's a pretty decent safeguard as well.
Since there's AFAIK no runtime detection for anything else in nettle so far (nor any state or global variables), I guess you're not going to add it anytime soon.
I'd definitely like to have run-time detection, but you're right that it's unlikely to happen soon. It could use atomic writes to some pointer variables, but there's no need for locking or any user-visible initialization functions.
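A minimal sketch of that scheme in C11, assuming <stdatomic.h> is available. The names xor_bytes, xor_neon and cpu_has_neon are hypothetical, and the "NEON" routine is only a stand-in that calls the generic loop:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef void xor_func(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable fallback. */
static void xor_generic(uint8_t *dst, const uint8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* Stand-in for a NEON assembly routine (hypothetical). */
static void xor_neon(uint8_t *dst, const uint8_t *src, size_t n)
{
  xor_generic(dst, src, n);
}

/* Hypothetical detection hook; a real one might parse /proc/cpuinfo
   or the kernel's hwcap bits. Always reports "no NEON" here. */
static int cpu_has_neon(void)
{
  return 0;
}

static void xor_init(uint8_t *dst, const uint8_t *src, size_t n);

/* Starts out pointing at xor_init and is overwritten on first use. */
static _Atomic(xor_func *) xor_impl = xor_init;

static void xor_init(uint8_t *dst, const uint8_t *src, size_t n)
{
  xor_func *f = cpu_has_neon() ? xor_neon : xor_generic;
  /* Every racing thread stores the same value, so an atomic store
     suffices: no lock, no user-visible init function. */
  atomic_store(&xor_impl, f);
  f(dst, src, n);
}

/* Public entry point: one indirect call through the pointer. */
void xor_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
  xor_func *f = atomic_load(&xor_impl);
  f(dst, src, n);
}
```

Concurrent first calls may both run xor_init, but since they write the same pointer the race is harmless. IFUNC would remove the remaining indirect call on platforms that support it.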
I should also look into IFUNC relocations. Nettle can't rely exclusively on IFUNC, since it's not portable, but when available it makes it possible to eliminate one level of indirection, and install a pointer to the right routine directly in the PLT entry.
Anyway, for reference on the topic of runtime detection (which is necessary e.g. on Android ARMv7, which explicitly supports a number of ARMv7 devices without NEON), there are a number of ways of doing it.
I've recently got some Android devices to play with, but I'm not yet very familiar with Android at all. I haven't tried to compile anything for Android yet, and I haven't looked at Google's SDK. I'd prefer a standard cross-compilation setup, like what I get with
apt-get install gcc-arm-linux-gnueabihf
for cross-compiling to an ARM GNU/Linux system. Is that possible yet?
Yes, that's quite possible. If you unpack an Android NDK, you get a directory like android-ndk-r8e/toolchains/arm-linux-androideabi-4.7/prebuilt/darwin-x86_64/bin, where you find the normal GCC cross toolchain, with tools like arm-linux-androideabi-gcc. When using this directly, you need to pass a parameter like --sysroot=android-ndk/platforms/android-3/arch-arm to the compiler so that it finds the right platform headers. Alternatively, you can run the script ndk/build/tools/make-standalone-toolchain.sh, which copies out a toolchain and a set of platform headers/libs and bundles them together, so you don't need the sysroot parameter. All of this is documented in ndk/docs/STANDALONE-TOOLCHAIN.html.
For building static libraries, this works pretty much exactly as on any other platform with a normal GCC cross toolchain. If you've got a rooted device, you can build normal executables as well and run them on the device ("adb push mybinary /data; adb shell /data/mybinary"). If you haven't got a rooted device, you'd want to set up a normal Android app project (either a "pure native" project, or a full Java project which calls native code via JNI; in either case it would then link to the static library you've built externally).
For a shared library, there are a few small extra gotchas. The built shared object would need to be named libnettle.so, without trailing version numbers. Additionally, due to deficiencies in the Android app environment when it comes to shared library loading, you'd need to load them in reverse dependency order from Java: first System.loadLibrary("nettle"), then System.loadLibrary("myapp"), so that all dependencies of the shared library libmyapp.so are loaded before trying to load it. With a number of layered libraries this is a bit of an issue, but it's mostly the burden of the app developer who wants to release it and wants to keep the libraries linked separately for license reasons (combining LGPL code with proprietary code); otherwise linking statically obviously is simpler.
// Martin