On Tue, 13 Jan 2015, Niels Möller wrote:
> Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
>> It's early, but it would be nice if the arm neon code was part of fat as well.
> Sure, that's the next step, once I have a structure I think is workable. Does anyone have a pointer to how to check cpu capabilities on ARM?
Yes - and it's a bit hairy. (I've got a TL;DR version halfway down.)
Unlike on x86, there's no direct CPU instruction for it. One way of detecting it in pure code is to try executing the instructions in question and catch the SIGILL (or similar on other platforms) if they aren't supported. (Touching signal handlers from within a library isn't necessarily a nice thing to do, though.)
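As a rough sketch of the SIGILL approach (not taken from any of the libraries mentioned in this mail): install a temporary handler, run a probe function, and jump back out if the signal fires. The names probe_instruction, stub_supported and stub_unsupported are made up for illustration. The stubs only stand in for a real probe, which would contain a single inline-asm NEON instruction; the "unsupported" stub simply raises SIGILL itself so the sketch can be tried on any POSIX machine.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGILL handler: abandon the probe and return to the sigsetjmp point. */
static void probe_sigill_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Run fn() with a temporary SIGILL handler installed.
 * Returns 1 if fn() completed, 0 if it raised SIGILL, -1 on setup failure.
 * Not thread-safe: it uses one static jump buffer and changes the
 * process-wide SIGILL disposition while it runs. */
static int probe_instruction(void (*fn)(void))
{
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_sigill_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    /* savemask = 1, so the signal mask is restored after siglongjmp. */
    if (sigsetjmp(probe_env, 1) == 0) {
        fn();
        ok = 1;
    }

    sigaction(SIGILL, &old, NULL);  /* put the previous handler back */
    return ok;
}

/* Hypothetical stand-ins for real probes (assumption, for illustration). */
static void stub_supported(void) { }
static void stub_unsupported(void) { raise(SIGILL); }
```

With these stubs, probe_instruction(stub_supported) reports success and probe_instruction(stub_unsupported) reports failure. The static jump buffer and the process-wide handler are exactly what makes this awkward inside a library.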
Short of trying to run the instructions, some OSes provide this info in another way - Linux is the main case here.
Before going into the Linux case, note that iOS doesn't have such a mechanism, but it isn't really needed there. On iOS, all armv7 configurations include support for NEON, so if you can assemble NEON instructions you don't need any detection. Since this platform uses fat binaries, you could have a separate armv6 slice of your binary (and that's the main way of doing it here - instead of enabling things at runtime within one binary, include a separate slice for each intended configuration). The recent Xcode tools no longer support building for armv6 though, and the App Store doesn't accept such submissions any longer.
Similarly for Windows Phone (and WinRT), the tools assume a platform with armv7 including NEON, so this doesn't require any detection. If you want to use more exotic instructions that aren't available in this baseline, you'd probably need detection via SIGILL/exception handlers.
On Linux, you can open /proc/self/auxv, parse it relatively easily, and check for HWCAP_NEON. The drawback is that recent Android kernels may block access to this file [1].
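A minimal sketch of the auxv parsing, assuming the file is a sequence of (type, value) pairs of native unsigned long, terminated by an AT_NULL entry. The function name read_hwcap_from_auxv and the AUXV_/ARM_ macro names are made up for this example; the values (AT_HWCAP = 16, HWCAP_NEON = bit 12 on 32 bit ARM) match the Linux headers as far as I know, but do check them against your own.

```c
#include <stdio.h>

/* Local copies of the constants, named differently to avoid clashing
 * with <elf.h> / <asm/hwcap.h> (assumed values, see lead-in). */
#define AUXV_AT_NULL    0UL
#define AUXV_AT_HWCAP   16UL
#define ARM_HWCAP_NEON  (1UL << 12)

/* Scan an auxiliary vector stream for the AT_HWCAP entry.
 * Returns 1 and stores the value in *hwcap if found, 0 otherwise. */
static int read_hwcap_from_auxv(FILE *f, unsigned long *hwcap)
{
    unsigned long entry[2];  /* entry[0] = type, entry[1] = value */

    while (fread(entry, sizeof entry, 1, f) == 1) {
        if (entry[0] == AUXV_AT_NULL)
            break;
        if (entry[0] == AUXV_AT_HWCAP) {
            *hwcap = entry[1];
            return 1;
        }
    }
    return 0;
}
```

Typical use would be to fopen("/proc/self/auxv", "rb"), call read_hwcap_from_auxv, and test the result against ARM_HWCAP_NEON, moving on to another method if the open or the lookup fails.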
Instead of opening this file, you could use the getauxval function to get the same auxiliary vector. Since this function isn't universally available, you'd also need to check whether you can use it at all (or load it using dlsym). In particular, it has only been available for a relatively short time on Android, so you can't rely on it there.
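One possible way of doing the "can I use it at all" check, sketched here with a weak reference rather than dlsym: this relies on the GCC/Clang weak attribute on ELF platforms, which is an assumption about the toolchain and not something any of the projects below necessarily do. The function name hwcap_via_getauxval is made up; the getauxval prototype is declared by hand instead of including <sys/auxv.h>, since that header may not exist where the function is missing.

```c
#include <stddef.h>

#define AUXV_AT_HWCAP 16UL  /* same value as AT_HWCAP in <elf.h> */

/* Weak reference: resolves to NULL at load time if libc lacks getauxval.
 * (GCC/Clang extension on ELF targets; an assumption, see lead-in.) */
extern unsigned long getauxval(unsigned long type) __attribute__((weak));

/* Returns 1 and fills *hwcap if getauxval is available, 0 otherwise.
 * *hwcap is left untouched when the function is not available. */
static int hwcap_via_getauxval(unsigned long *hwcap)
{
    if (getauxval == NULL)
        return 0;
    *hwcap = getauxval(AUXV_AT_HWCAP);
    return 1;
}
```

When this returns 0, the caller would fall back to one of the file-based methods.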
The final fallback is parsing /proc/cpuinfo, which always should work. You can pretty easily find the Features line and look for the features. The line ends with a space, so you can use something as simple as strstr(line, " neon ") to parse it.
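A small sketch of the Features lookup, written slightly more defensively than the plain strstr: it splits the line into tokens and compares whole words, so it doesn't depend on the trailing space. The function name cpuinfo_has_feature is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Look for `feature` as a whole word on any "Features" line of a
 * /proc/cpuinfo-style stream. Returns 1 if found, 0 otherwise. */
static int cpuinfo_has_feature(FILE *f, const char *feature)
{
    char line[1024];

    while (fgets(line, sizeof line, f)) {
        char *colon, *tok;

        if (strncmp(line, "Features", 8) != 0)
            continue;
        colon = strchr(line, ':');
        if (colon == NULL)
            continue;
        /* Tokenize everything after the colon on whitespace. */
        for (tok = strtok(colon + 1, " \t\n"); tok != NULL;
             tok = strtok(NULL, " \t\n")) {
            if (strcmp(tok, feature) == 0)
                return 1;
        }
    }
    return 0;
}
```

Typical use would be to fopen("/proc/cpuinfo", "r") and call cpuinfo_has_feature(f, "neon"). Because whole tokens are compared, asking for "vfp" does not match "vfpv3".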
The gotcha about /proc/cpuinfo is that it is different for ARMv8 kernels - features like neon, which were optional on ARMv7, aren't optional any longer and thus are omitted. To handle this, you can either parse the "CPU architecture" field and assume neon if it is >= 8, or look for the "asimd" feature, which is printed instead and means the same thing.
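Putting the ARMv8 handling together, as a sketch that works on the cpuinfo contents already read into a string: accept "neon" or "asimd" in Features, and otherwise fall back to a numeric "CPU architecture" of 8 or more. The helper names (cpuinfo_field, has_word, cpuinfo_text_has_neon) are made up. Note that a 64 bit kernel may print "AArch64" in the architecture field, which atoi turns into 0, so in that case only the asimd check helps.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy the value (text after ':') of the first line starting with `key`
 * into `out`. Returns 1 if such a line was found, 0 otherwise. */
static int cpuinfo_field(const char *text, const char *key,
                         char *out, size_t outlen)
{
    size_t keylen = strlen(key);
    const char *p = text;

    while (p != NULL && *p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        if (len > keylen && strncmp(p, key, keylen) == 0) {
            const char *colon = memchr(p, ':', len);
            if (colon != NULL) {
                size_t vlen = len - (size_t)(colon + 1 - p);
                if (vlen >= outlen)
                    vlen = outlen - 1;
                memcpy(out, colon + 1, vlen);
                out[vlen] = '\0';
                return 1;
            }
        }
        p = eol ? eol + 1 : NULL;
    }
    return 0;
}

/* Whole-word search: pad both sides with spaces, then use strstr. */
static int has_word(const char *list, const char *word)
{
    char padded_list[520], padded_word[64];

    snprintf(padded_list, sizeof padded_list, " %s ", list);
    snprintf(padded_word, sizeof padded_word, " %s ", word);
    return strstr(padded_list, padded_word) != NULL;
}

/* Returns 1 if the cpuinfo text indicates NEON / Advanced SIMD support. */
static int cpuinfo_text_has_neon(const char *text)
{
    char value[512];

    if (cpuinfo_field(text, "Features", value, sizeof value) &&
        (has_word(value, "neon") || has_word(value, "asimd")))
        return 1;
    if (cpuinfo_field(text, "CPU architecture", value, sizeof value) &&
        atoi(value) >= 8)
        return 1;
    return 0;
}
```

This only looks at the first Features line, which seems to be enough for the outputs shown further down, but that is an assumption about the layout rather than a guarantee.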
To simplify running old 32 bit binaries, the Android ARMv8 kernels have an extra compatibility feature for this, re-adding the "neon" keyword there. [2] [3] This extra compatibility isn't available in upstream kernels though, so it can't be relied on (it was proposed in [4] but hasn't been merged yet).
TL;DR - it's mostly only necessary on Linux. The simplest solution which works everywhere is parsing /proc/cpuinfo.
[1] http://b.android.com/43055
[2] https://android.googlesource.com/kernel/common/+/cba0c6b2913c0d075a7434025f5...
[3] https://android.googlesource.com/kernel/common/+/3868e7f8d47992922756d1aa659...
[4] http://marc.info/?l=linux-arm-kernel&m=139087240101974
Example of /proc/cpuinfo from a pandaboard:
Processor : ARMv7 Processor rev 10 (v7l)
processor : 0
BogoMIPS : 1392.74

processor : 1
BogoMIPS : 1363.33

Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10

Hardware : OMAP4 Panda board
Revision : 0020
Serial : 0000000000000000
From a Nexus 9:
Processor : NVIDIA Denver 1.0 rev 0 (aarch64)
processor : 0
processor : 1
Features : fp asimd aes pmull sha1 sha2 crc32
CPU implementer : 0x4e
CPU architecture: AArch64
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0

Hardware : Flounder
Revision : 0000
Serial : 0000000000000000
MTS version : 33410787
From a Nexus 9, read from a 32 bit process:
Processor : NVIDIA Denver 1.0 rev 0 (aarch64)
processor : 0
processor : 1
Features : fp asimd aes pmull sha1 sha2 crc32 wp half thumb fastmult vfp edsp neon vfpv3 tlsi vfpv4 idiva idivt
CPU implementer : 0x4e
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 0

Hardware : Flounder
Revision : 0000
Serial : 0000000000000000
MTS version : 33410787
Finally, a few examples on all of this from other libraries:
libvpx, catching illegal instruction exceptions on Windows platforms, and parsing /proc/cpuinfo: http://git.chromium.org/gitweb/?p=webm/libvpx.git;a=blob;f=vpx_ports/arm_cpu...
libav, trying /proc/self/auxv, falling back to /proc/cpuinfo: https://git.libav.org/?p=libav.git;a=blob;f=libavutil/arm/cpu.c;h=8bdaa884
OpenH264, with very minimal parsing of /proc/cpuinfo (and a bunch of other things): https://github.com/cisco/openh264/blob/34661f1d8/codec/common/src/cpu.cpp#L2...
The Android cpufeatures library (which tries /proc/self/auxv, tries loading getauxval, and falls back to /proc/cpuinfo): https://android.googlesource.com/platform/ndk/+/13a99c7f/sources/android/cpu...
x264, catching SIGILL: http://git.videolan.org/?p=x264.git;a=blob;f=common/cpu.c;h=cad5f2c2e9
// Martin