Martin Storsjö <martin@martin.st> writes:
> Before the upcoming(?) release, it might be good to tweak the way the NEON optimizations are enabled. Currently, as far as I've seen, they're enabled as soon as you target ARMv7, even though not all ARMv7 CPUs have NEON.
The name "armv7" on the directory may not be entirely correct either. I just noticed that config.guess classifies my PandaBoard as "armv7l-unknown-linux-gnueabihf". GMP uses a script on top of the standard config.guess to give a more fine-grained classification, and it says "armcortexa9neon-unknown-linux-gnueabihf". It seems this is based on parsing /proc/cpuinfo.
I'm not going to do anything sophisticated about this before the 2.7 release (which I'd like to get out within a few days, at most two weeks). I could add an --enable-neon/--disable-neon flag, with the default based either on /proc/cpuinfo (and some fixed default for cross compilation) or on what the assembler accepts, as you suggest. Would that make sense?
To just detect NEON, something like grep '^Features.*neon' /proc/cpuinfo seems simple enough. And then I guess we can follow GMP conventions and have some subdirectories arm, arm/v6, arm/neon, searched in an order determined by configure.
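The grep test can also be done from C, for example at run time. Below is a minimal sketch. It assumes the ARM Linux /proc/cpuinfo layout, where a `Features` line lists whitespace-separated flags. The function names are hypothetical and not part of nettle. Unlike the grep pattern, this version accepts `neon` only as a whole word.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if a "Features" line in the given /proc/cpuinfo text lists
   "neon" as a whole word, 0 otherwise. */
static int
cpuinfo_text_has_neon(const char *text)
{
  const char *line = text;

  while (line != NULL && *line != '\0')
    {
      const char *end = strchr(line, '\n');
      size_t len = end ? (size_t) (end - line) : strlen(line);

      if (len >= 8 && strncmp(line, "Features", 8) == 0)
        {
          const char *stop = line + len;
          const char *p = memchr(line, ':', len);

          /* Walk the whitespace-separated flags after the colon. */
          if (p != NULL)
            for (p++; p < stop;)
              {
                const char *word;

                while (p < stop && (*p == ' ' || *p == '\t'))
                  p++;
                word = p;
                while (p < stop && *p != ' ' && *p != '\t')
                  p++;
                if (p - word == 4 && memcmp(word, "neon", 4) == 0)
                  return 1;
              }
        }
      line = end ? end + 1 : NULL;
    }
  return 0;
}

/* Linux-specific: read /proc/cpuinfo and check it. Returns 0 if the file
   cannot be read (e.g. on non-Linux systems). A 16 KiB buffer is assumed
   to be enough to reach the first "Features" line. */
static int
proc_cpuinfo_has_neon(void)
{
  char buf[16384];
  size_t n;
  FILE *f = fopen("/proc/cpuinfo", "r");

  if (f == NULL)
    return 0;
  n = fread(buf, 1, sizeof(buf) - 1, f);
  fclose(f);
  buf[n] = '\0';
  return cpuinfo_text_has_neon(buf);
}
```

For example, `cpuinfo_text_has_neon("Features\t: swp half thumb vfp edsp neon vfpv3\n")` returns 1.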
> I've seen that you've discussed this on the GMP list a few times as well. For static detection in configure, one way is to check whether the assembler accepts NEON instructions without an explicit ".fpu neon" directive.
> Then one would essentially configure it by setting CC to something like CC='gcc -mfpu=neon'. Somewhat logical, and analogous to how ABI is configured, but not entirely user friendly.
> Since there's AFAIK no runtime detection for anything else in nettle so far (nor any state or global variables), I guess you're not going to add it anytime soon.
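Configuring via CC='gcc -mfpu=neon' also has a C-level counterpart to the assembler probe. When the target flags enable NEON, GCC and Clang predefine `__ARM_NEON` (the ACLE name), and older GCC releases predefine `__ARM_NEON__`. Below is a minimal sketch. `COMPILED_WITH_NEON` and `selected_impl` are hypothetical names.

```c
#include <assert.h>

/* GCC and Clang define __ARM_NEON (ACLE) or, in older GCC releases,
   __ARM_NEON__ when the target flags enable NEON, e.g. with
   CC='gcc -mfpu=neon'. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
# define COMPILED_WITH_NEON 1
#else
# define COMPILED_WITH_NEON 0
#endif

/* Hypothetical helper reporting which code path this build would use. */
static const char *
selected_impl(void)
{
  return COMPILED_WITH_NEON ? "neon" : "generic";
}
```

A configure check could compile a small file that hits `#error` unless one of these macros is defined. This is only an assumption about how such a check might look, not what nettle's configure does.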
I'd definitely like to have run-time detection, but you're right that it's unlikely to happen soon. It could use atomic writes to some pointer variables, but there's no need for locking or any user-visible initialization functions.
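Atomic pointer writes can give run-time selection without locking or an init function. The idea is a function pointer that initially points at a resolver. On the first call, the resolver detects the CPU and overwrites the pointer. Below is a minimal C11 sketch. `example_sum`, `sum_neon` and `cpu_has_neon` are hypothetical placeholders: the "NEON" routine just calls the C version, and the probe is stubbed to return 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sum_func(const uint32_t *v, size_t n);

/* Portable C version. */
static uint32_t
sum_generic(const uint32_t *v, size_t n)
{
  uint32_t s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Placeholder for a NEON routine (normally written in assembly). */
static uint32_t
sum_neon(const uint32_t *v, size_t n)
{
  return sum_generic(v, n);
}

/* Hypothetical probe, e.g. /proc/cpuinfo or AT_HWCAP; stubbed here. */
static int
cpu_has_neon(void)
{
  return 0;
}

static uint32_t sum_resolve(const uint32_t *v, size_t n);

/* Starts out pointing at the resolver; the first call replaces it. */
static _Atomic(sum_func *) sum_ptr = sum_resolve;

static uint32_t
sum_resolve(const uint32_t *v, size_t n)
{
  sum_func *f = cpu_has_neon() ? sum_neon : sum_generic;
  /* Racing threads all store the same value, so no lock is needed. */
  atomic_store_explicit(&sum_ptr, f, memory_order_relaxed);
  return f(v, n);
}

/* Public entry point: one atomic load plus an indirect call. */
uint32_t
example_sum(const uint32_t *v, size_t n)
{
  sum_func *f = atomic_load_explicit(&sum_ptr, memory_order_relaxed);
  return f(v, n);
}
```

Relaxed ordering is sufficient here only because the resolver publishes nothing but a code address. If the resolver also initialized tables, it would need a release store paired with an acquire load.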
I should also look into IFUNC relocations. Nettle can't rely exclusively on IFUNC, since it's not portable, but when available it makes it possible to eliminate one level of indirection, and install a pointer to the right routine directly in the PLT entry.
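With GNU IFUNC, the dynamic linker calls a resolver once while processing relocations. It stores the returned address in the GOT slot that the PLT entry jumps through, so the library has no extra pointer load of its own. Below is a minimal sketch for GCC or Clang on glibc. It assumes that glibc on 32-bit ARM passes the AT_HWCAP bits to the resolver. The names are hypothetical, and other platforms fall back to a plain function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sum_func(const uint32_t *v, size_t n);

/* Portable C version. */
static uint32_t
sum_generic(const uint32_t *v, size_t n)
{
  uint32_t s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

#if defined(__GNUC__) && defined(__ELF__) && defined(__GLIBC__)
# if defined(__arm__)
#  include <asm/hwcap.h>   /* HWCAP_NEON */

/* Placeholder for a NEON routine (normally written in assembly). */
static uint32_t
sum_neon(const uint32_t *v, size_t n)
{
  return sum_generic(v, n);
}

/* glibc on 32-bit ARM passes the AT_HWCAP bits to IFUNC resolvers. */
static sum_func *
resolve_example_sum(unsigned long hwcap)
{
  return (hwcap & HWCAP_NEON) ? sum_neon : sum_generic;
}
# else
/* Other glibc targets: nothing to choose between in this sketch. */
static sum_func *
resolve_example_sum(void)
{
  return sum_generic;
}
# endif

/* The dynamic linker calls the resolver once and binds example_sum to
   the address it returns. */
uint32_t example_sum(const uint32_t *v, size_t n)
  __attribute__((ifunc("resolve_example_sum")));
#else
/* No IFUNC support: plain function (or a function-pointer scheme). */
uint32_t
example_sum(const uint32_t *v, size_t n)
{
  return sum_generic(v, n);
}
#endif
```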
> Anyway, for reference on the topic of runtime detection (which is necessary e.g. on Android ARMv7, which explicitly supports a number of ARMv7 devices without NEON), there are a number of ways of doing it.
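On ARM Linux, the kernel's hardware capability bits can be read with getauxval(AT_HWCAP) and tested against HWCAP_NEON, which avoids parsing /proc/cpuinfo. This also applies to Android from API level 18. Below is a minimal sketch. `runtime_has_neon` is a hypothetical name. Older Android releases would instead need to parse /proc/cpuinfo or use the NDK's cpufeatures helper library.

```c
#include <assert.h>

#if defined(__linux__) && defined(__arm__)
# include <sys/auxv.h>    /* getauxval, AT_HWCAP */
# include <asm/hwcap.h>   /* HWCAP_NEON */
#endif

/* Return 1 if the kernel reports NEON in AT_HWCAP, 0 otherwise. Always 0
   on targets other than 32-bit ARM Linux, where this probe does not apply. */
static int
runtime_has_neon(void)
{
#if defined(__linux__) && defined(__arm__)
  return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
  return 0;
#endif
}
```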
I've recently got some Android devices to play with, but I'm not yet very familiar with Android at all. I haven't tried to compile anything for Android yet, and I haven't looked at Google's SDK. I'd prefer a standard cross-compilation setup, like what I get with
apt-get install gcc-arm-linux-gnueabihf
for cross-compiling to an ARM GNU/Linux system. Is that possible yet?
Regards, /Niels