On Fri, 23 Jan 2015, Niels Möller wrote:
nisse@lysator.liu.se (Niels Möller) writes:
I haven't done the memory barrier thing yet, it appears to be more complicated than I had hoped. The manual I have says that the dmb instruction (data memory barrier) is available only with armv7 and later, and that armv6 uses writes to CP15 registers instead (I haven't yet tried to figure out what that means, or if this method works also on later versions).
I think I've found a simple solution. I deleted the initialized flag in fat_init; instead, each caller reads the particular function pointer it is interested in and checks whether it is already properly initialized. I.e., it checks if the current value equals its static initializer, and if so, calls fat_init.
This way, store order consistency between threads no longer matters, and we won't need any memory barriers.
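A minimal sketch of that idea, not the actual nettle code. Only fat_init is a name from the text above; sum_vec, sum_init, sum_generic, sum_bytes and the fat_init_calls counter are hypothetical, and a trivial byte sum stands in for a real routine:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned (*sum_func)(const unsigned char *src, size_t n);

static unsigned sum_init(const unsigned char *src, size_t n);
static unsigned sum_generic(const unsigned char *src, size_t n);

/* Static initializer: the pointer starts out at the init stub. */
static sum_func sum_vec = sum_init;

/* Hypothetical counter, only here to show how often fat_init runs. */
static int fat_init_calls = 0;

/* Stand-in for the real fat_init, which would probe the CPU and pick
   an implementation. It always stores the same values, so running it
   more than once (e.g. from two racing threads) is harmless. */
static void fat_init(void)
{
  fat_init_calls++;
  sum_vec = sum_generic;
}

static unsigned sum_generic(const unsigned char *src, size_t n)
{
  unsigned s = 0;
  size_t i;
  for (i = 0; i < n; i++)
    s += src[i];
  return s;
}

/* Init stub: no separate "initialized" flag. It only looks at the one
   pointer it cares about and compares it to the static initializer. */
static unsigned sum_init(const unsigned char *src, size_t n)
{
  if (sum_vec == sum_init)
    fat_init();
  assert(sum_vec != sum_init);
  return sum_vec(src, n);
}

/* Public entry point: always dispatches through the pointer. */
unsigned sum_bytes(const unsigned char *src, size_t n)
{
  return sum_vec(src, n);
}
```

The first call to sum_bytes lands in sum_init, which runs fat_init and then forwards the call; later calls go straight to sum_generic. A thread that still sees the old pointer value just ends up in the stub again, which is why the order in which the stores become visible does not matter in this scheme.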
I'd like to merge this code on the master branch soon. It would be nice if anyone else could give it a little testing, in particular on various ARM devices. I've tested it on a few different x86_64 PCs and an ARMv7 pandaboard, all running GNU/Linux.
I tested it on a Raspberry Pi (ARMv6), and it seems to work pretty much as intended - I was able to do a fat build with neon included, and the testsuite still runs fine (so the runtime detection seems to work as intended).
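For anyone who wants to sanity-check the detection on their own device, here is a rough sketch of one way to look for a feature such as neon in the text of /proc/cpuinfo. It assumes the usual Linux "Features : ..." line; cpuinfo_has_feature is a made-up helper name, and this is not necessarily how the fat code does it:

```c
#include <string.h>

/* Return 1 if `feature` appears as a whole word on the "Features"
   line of a /proc/cpuinfo-style text, 0 otherwise. */
int cpuinfo_has_feature(const char *cpuinfo, const char *feature)
{
  const char *line = strstr(cpuinfo, "Features");
  size_t flen = strlen(feature);
  const char *end;
  const char *p;

  if (!line)
    return 0;

  /* The line ends at the next newline, or at the end of the text. */
  end = strchr(line, '\n');
  if (!end)
    end = line + strlen(line);

  /* Skip past the "Features   :" label. */
  p = strchr(line, ':');
  if (!p || p > end)
    return 0;
  p++;

  /* Walk the whitespace-separated words on the rest of the line. */
  while (p < end)
    {
      const char *tok;
      size_t len;

      while (p < end && (*p == ' ' || *p == '\t'))
        p++;
      tok = p;
      while (p < end && *p != ' ' && *p != '\t')
        p++;
      len = (size_t) (p - tok);

      if (len == flen && len > 0 && memcmp(tok, feature, flen) == 0)
        return 1;
    }
  return 0;
}
```

Matching whole words matters here, since a plain substring search for "vfp" would also hit "vfpv3".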
I also tested building for ARMv5 using the android NDK, and I noted that arm/v6/aes*.asm require a ".arch armv6" at the start, otherwise they fail to assemble in that configuration. (The neon sources seem to have ".fpu neon" similarly already. I'm not sure if some of the neon source perhaps would require an ".arch armv7-a" as well, but they did seem to build just fine in my test so perhaps it isn't necessary.)
To test this for yourself in case you're interested, add <ndk>/toolchains/arm-linux-androideabi-4.6/prebuilt/*x86*/bin to your path, and configure with this command:

  SYSROOT=<ndk>/platforms/android-3/arch-arm/ \
  CC="arm-linux-androideabi-gcc --sysroot=$SYSROOT" \
  CXX="arm-linux-androideabi-g++ --sysroot=$SYSROOT" \
  ./configure --host=arm-linux-gnueabi --enable-fat
Other than that, building with --enable-fat does seem to do the right thing - much better than the current setup. (E.g. currently, if cross-compiling for Raspberry Pi, it fails to enable the v6 routines, since the host triplet is arm-bcm2708hardfp-linux-gnueabi even though it's an ARMv6 device. When building on such a device, config.guess gives armv6l-unknown-linux-gnueabihf instead.)
I take it you've tested building for Windows? Although the x86 detection should be much simpler, so it's only the absence of ifunc that'd be tested there.
// Martin