Hello, As it is now, the ARM NEON and v6 optimizations are enabled only when nettle is compiled on the exact system it is supposed to run on. That is very unlikely to happen in some scenarios, as applications intended to run on an ARM mobile device (e.g., [0]) cannot know the target in advance. That is a pity, because nettle is pretty unique in having assembly support for ARMv6 and NEON.
A solution to that issue would be to have a nettle library constructor that runs the equivalent of cpuid in ARM, and stores it to a global variable. Then each assembly module (e.g., aes-arm) will jump to the correct implementation detected at runtime.
This requires no global initialization functions or anything like that. In the systems where library constructors are not available (i.e., static linking), the cost would be running the oldest arm assembly version (unless the constructor is called explicitly).
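A minimal sketch of the constructor idea, assuming GCC or Clang (for the `constructor` attribute) and, for the actual detection, Linux on 32-bit ARM with `getauxval`. The names `fat_init`, `fat_have_neon`, `fat_cpu_features` and `FAT_ARM_NEON` are hypothetical and not part of nettle. On other platforms the sketch simply reports no NEON.

```c
#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON */
#endif

/* Hypothetical feature bits; not part of nettle's API. */
enum { FAT_ARM_NEON = 1 << 0 };

/* Written once during initialization, read by the dispatch code. */
static unsigned fat_cpu_features = 0;
static int fat_initialized = 0;

/* Explicit entry point, for setups where constructors do not run. */
void
fat_init (void)
{
  if (fat_initialized)
    return;
#if defined(__linux__) && defined(__arm__)
  /* The kernel reports CPU capabilities in the auxiliary vector. */
  if (getauxval (AT_HWCAP) & HWCAP_NEON)
    fat_cpu_features |= FAT_ARM_NEON;
#endif
  fat_initialized = 1;
}

/* GCC/Clang extension: runs before main() or when the library is loaded. */
static void __attribute__ ((constructor))
fat_constructor (void)
{
  fat_init ();
}

/* Query used by each function's dispatch code. */
int
fat_have_neon (void)
{
  return (fat_cpu_features & FAT_ARM_NEON) != 0;
}
```

If the constructor never runs, `fat_cpu_features` stays zero and every caller falls back to the baseline code, which matches the cost described above.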
This is similar to what openssl is doing already (with the exception that they use a global initialization function instead of a constructor). To me that seems quite a simple way to improve the current situation.
regards, Nikos
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
A solution to that issue would be to have a nettle library constructor that runs the equivalent of cpuid in ARM, and stores it to a global variable. Then each assembly module (e.g., aes-arm) will jump to the correct implementation detected at runtime.
The difficult part is the configure work. We'd either have to build multiple object files for each function, with different link names, and then have some glue to select the right one at runtime.
Or use a "master file" for each function, say arm/fat/foo.asm, which includes the other files and makes the right thing happen.
Things get a bit more complex if we need to use the C version on some machines, since the current build setup assumes that an assembly file completely replaces the corresponding C file.
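The "glue" option could look roughly like the sketch below: two implementations under different names, and a public wrapper that jumps through a function pointer chosen on first use. Everything here is hypothetical (`foo`, `foo_c`, `foo_neon_standin`, `cpu_has_neon`); the "NEON" variant is plain C standing in for an assembly file, and the feature test is a placeholder that always answers no.

```c
#include <stddef.h>
#include <stdint.h>

typedef void foo_func (uint8_t *dst, const uint8_t *src, size_t n);

static foo_func foo_c;            /* portable C fallback */
static foo_func foo_neon_standin; /* stand-in for an assembly version */
static foo_func foo_resolve;      /* picks one on the first call */

/* Starts out pointing at the resolver, then at the selected version. */
static foo_func *foo_vec = foo_resolve;

/* Placeholder; a real build would query the detected CPU features. */
static int
cpu_has_neon (void)
{
  return 0;
}

static void
foo_c (uint8_t *dst, const uint8_t *src, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] ^= src[i];
}

static void
foo_neon_standin (uint8_t *dst, const uint8_t *src, size_t n)
{
  /* Same result as foo_c; the real thing would be NEON assembly. */
  foo_c (dst, src, n);
}

static void
foo_resolve (uint8_t *dst, const uint8_t *src, size_t n)
{
  foo_vec = cpu_has_neon () ? foo_neon_standin : foo_c;
  foo_vec (dst, src, n);
}

/* Public entry point: one extra indirect jump per call. */
void
foo (uint8_t *dst, const uint8_t *src, size_t n)
{
  foo_vec (dst, src, n);
}
```

Because the C version is just another candidate for the pointer, this layout also covers the case where some machines must use the C implementation.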
There's also IFUNC relocations, but I'm not sure which systems beyond vanilla gnu/linux support them. Are they usable on android, e.g.?
I clearly see the need for a runtime test for neon. Say, --enable-arm-neon=fat or a more general --enable-fat.
But you also mention v6 optimizations; for clarity, do you mean that you'd like to see runtime tests for that too? To me, it seems a bit unlikely to need a fat binary which supports both pre-v6 arm, and v6 and later. I'd expect pre-v6 arm to be used only in embedded systems where the cpu flavor is known at build time.
Regards, /Niels
On Tue, Dec 17, 2013 at 4:12 PM, Niels Möller nisse@lysator.liu.se wrote:
A solution to that issue would be to have a nettle library constructor that runs the equivalent of cpuid in ARM, and stores it to a global variable. Then each assembly module (e.g., aes-arm) will jump to the correct implementation detected at runtime.
The difficult part is the configure work. We'd either have to build multiple object files for each function, with different link names, and then have some glue to select the right one at runtime.
Why not a big assembly function that contains everything? At the start it simply checks which CPU optimization is available and jumps to the appropriate label (I'm thinking x86 asm here but I hope what I say applies to arm as well).
Or use a "master file" for each function, say arm/fat/foo.asm, which includes the other files and makes the right thing happen.
That could work too.
Things get a bit more complex if we need to use the C version on some machines, since the current build setup assumes that an assembly file completely replaces the corresponding C file.
If everything were in a single file it would work like a charm, but even splitting them into multiple files would work if subdirectories are used and only the main file is considered the "real" asm. (I suppose you are referring to --disable-assembler?)
I clearly see the need for a runtime test for neon. Say, --enable-arm-neon=fat or a more general --enable-fat.
I like the name :) I think the latter makes more sense if it is to be used for x86 as well.
But you also mention v6 optimizations; for clarity, do you mean that you'd like to see runtime tests for that too? To me, it seems a bit unlikely to need a fat binary which supports both pre-v6 arm, and v6 and later. I'd expect pre-v6 arm to be used only in embedded systems where the cpu flavor is known at build time.
You may be right; it may make sense to treat them separately.
regards, Nikos
Nikos Mavrogiannopoulos n.mavrogiannopoulos@gmail.com writes:
Why not a big assembly function that contains everything?
Two reasons: 1. To make the fat binary thing optional, just using a simple cpu-specific file when the cpu is known at compile time. 2. In case we'd like to fall back to the C implementation for some function.
Or use a "master file" for each function, say arm/fat/foo.asm, which includes the other files and makes the right thing happen.
That could work too.
I'm leaning towards this, at least for a start. We'll see when I get time to play with this.
I clearly see the need for a runtime test for neon. Say, --enable-arm-neon=fat or a more general --enable-fat.
I like the name :) I think the latter makes more sense if it is to be used for x86 as well.
--enable-fat is what gmp uses. And "fat binaries/libraries" is almost standard terminology.
Regards, /Niels
nisse@lysator.liu.se (Niels Möller) writes:
Two reasons: 1. To make the fat binary thing optional, just using a simple cpu-specific file when the cpu is known at compile time. 2. In case we'd like to fall back to the C implementation for some function.
I checked what functions are involved. The ARM neon assembly is for salsa20, sha3, sha512, and umac. There's no non-neon assembly code, so the fat mechanism needs to choose between the C implementation and the neon implementation.
Regards, /Niels
On Tue, 17 Dec 2013, Niels Möller wrote:
There's also IFUNC relocations, but I'm not sure which systems beyond vanilla gnu/linux support them. Are they usable on android, e.g.?
Not sure - and even if they are they might not have been supported from the beginning, so it might only be usable from some particular android version.
Can you provide some small example that I could try on a range of versions?
I'd expect pre-v6 arm to be used only in embedded systems where the cpu flavor is known at build time.
ARMv5 is the baseline for the Android ARM ABI, which might be the main reason why anybody would care. In practice I'm not sure if any such devices actually have shipped - at least the first devices actually were ARMv6.
// Martin
Martin Storsjö martin@martin.st writes:
On Tue, 17 Dec 2013, Niels Möller wrote:
There's also IFUNC relocations, but I'm not sure which systems beyond vanilla gnu/linux support them. Are they usable on android, e.g.?
Not sure - and even if they are they might not have been supported from the beginning, so it might only be usable from some particular android version.
Can you provide some small example that I could try on a range of versions?
No, sorry. I'm not very familiar with IFUNC and have never used it myself. I just know it's a feature in glibc and ld.so. The basic idea, as far as I understand, is that you register a custom resolve function for a symbol. When ld.so tries to resolve that symbol, the registered function is called, and it returns the address of the real function, which ld.so then installs in the proper place (in the PLT array, I guess).
Using it avoids one level of indirection, compared to using a wrapper function in Nettle which jumps through a function pointer.
I also don't know if it works at all with static libraries.
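For anyone who wants something small to try, here is a hedged sketch of the IFUNC mechanism using the GCC/Clang `ifunc` attribute. It assumes an ELF system with glibc; elsewhere it compiles a plain wrapper instead, so it says nothing about whether a given Android version supports IFUNC. All names (`count_nonzero`, `count_c`, `count_neon_standin`, `count_resolve`) are made up for illustration, and the feature test inside the resolver is a placeholder.

```c
#include <stddef.h>
#include <stdlib.h>  /* any libc header; on glibc it defines __GLIBC__ */

typedef size_t count_func (const unsigned char *p, size_t n);

/* Portable C version. */
static size_t
count_c (const unsigned char *p, size_t n)
{
  size_t i, count = 0;
  for (i = 0; i < n; i++)
    count += (p[i] != 0);
  return count;
}

/* Stand-in for an optimized assembly version. */
static size_t
count_neon_standin (const unsigned char *p, size_t n)
{
  return count_c (p, n);
}

#if defined(__GLIBC__) && defined(__ELF__) && defined(__GNUC__)
/* Resolver: run by the dynamic linker when the symbol is resolved.
   It returns the address of the implementation to bind. */
static count_func *
count_resolve (void)
{
  int have_neon = 0;  /* placeholder for a real CPU feature test */
  return have_neon ? count_neon_standin : count_c;
}

/* The public symbol has no body of its own; it is bound via the resolver. */
size_t count_nonzero (const unsigned char *p, size_t n)
  __attribute__ ((ifunc ("count_resolve")));
#else
/* Fallback without IFUNC: an ordinary wrapper. */
size_t
count_nonzero (const unsigned char *p, size_t n)
{
  int have_neon = 0;
  return have_neon ? count_neon_standin (p, n) : count_c (p, n);
}
#endif
```

Since the resolver runs very early, it is safest to keep it free of calls into other libraries; that constraint is one reason the function-pointer wrapper may remain the more portable choice.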
I'd expect pre-v6 arm to be used only in embedded systems where the cpu flavor is known at build time.
ARMv5 is the baseline for the Android ARM ABI, which might be the main reason why anybody would care. In practice I'm not sure if any such devices actually have shipped - at least the first devices actually were ARMv6.
If in practice, ARMv5 isn't used, I think we can ignore this for now (and simply disable fat when nettle is configured for pre-v6 ARM).
Regards, /Niels
On Tue, 2013-12-17 at 21:42 +0100, Niels Möller wrote:
ARMv5 is the baseline for the Android ARM ABI, which might be the main reason why anybody would care. In practice I'm not sure if any such devices actually have shipped - at least the first devices actually were ARMv6.
If in practice, ARMv5 isn't used, I think we can ignore this for now (and simply disable fat when nettle is configured for pre-v6 ARM).
I tried to use the Android NDK on nettle and the cpu detected was simply 'arm'. It seems that the Android NDK compiler compiles with -march=armv5te, so I don't think that separating armv5 and armv6 would work (for Android at least).
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
I tried to use the android NDK on nettle and the cpu detected was simply 'arm'.
You could configure with something like
--host=armv6-linux-androideabi CC="arm-linux-androideabi -march=armv6t2"
No idea how many problems, if any, that will cause for real android devices.
But it seems fairly common that android apps are compiled for newer processors only (if they include native code at all), e.g., I think the firefox app I got from f-droid is armv7 only.
Regards, /Niels
On Wed, 18 Dec 2013, Niels Möller wrote:
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
I tried to use the android NDK on nettle and the cpu detected was simply 'arm'.
You could configure with something like
--host=armv6-linux-androideabi CC="arm-linux-androideabi -march=armv6t2"
No idea how many problems, if any, that will cause for real android devices.
But it seems fairly common that android apps are compiled for newer processors only (if they include native code at all), e.g., I think the firefox app I got from f-droid is armv7 only.
Yes - there are two separate ABIs that they support automatically, ARMv5 and ARMv7 (without neon) - you can build two separate versions of your library and the device will install the right one (and use the v5 one on v7 if there's no v7-specific binary). Since there are ARMv7 devices without neon, those features would have to be enabled at runtime though.
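One way to make that runtime decision on Linux-based systems, sketched under the assumption that `/proc/cpuinfo` is readable and lists CPU capabilities on a "Features" line (as ARM Linux kernels do): scan that line for the word `neon`. The helper names `features_line_has` and `cpuinfo_has_neon` are hypothetical. The Android NDK also ships a cpufeatures helper library that may be preferable on Android.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole word on a cpuinfo
   "Features" line, 0 otherwise (including for any other line). */
static int
features_line_has (const char *line, const char *flag)
{
  size_t len = strlen (flag);
  const char *p = strchr (line, ':');

  if (len == 0 || !p || strncmp (line, "Features", 8) != 0)
    return 0;

  for (p++; *p; )
    {
      size_t word;
      p += strspn (p, " \t\n");     /* skip separators */
      word = strcspn (p, " \t\n");  /* length of the next word */
      if (word == len && memcmp (p, flag, len) == 0)
        return 1;
      p += word;
    }
  return 0;
}

/* Return 1 if the running CPU advertises NEON, 0 if not or unknown. */
int
cpuinfo_has_neon (void)
{
  char line[1024];
  int found = 0;
  FILE *f = fopen ("/proc/cpuinfo", "r");

  if (!f)
    return 0;  /* cannot tell: be conservative */
  while (!found && fgets (line, sizeof line, f))
    found = features_line_has (line, "neon");
  fclose (f);
  return found;
}
```

Treating "cannot tell" as "no NEON" keeps the v7 baseline code as the safe default. Lines longer than the buffer are not handled specially in this sketch.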
// Martin
On Wed, Dec 18, 2013 at 11:55 AM, Niels Möller nisse@lysator.liu.se wrote:
I tried to use the android NDK on nettle and the cpu detected was simply 'arm'.
You could configure with something like --host=armv6-linux-androideabi CC="arm-linux-androideabi -march=armv6t2" No idea how many problems, if any, that will cause for real android devices.
Most probably none, but developers on these devices compile with the flags provided by the BSP provider (in this case the NDK). Given what Martin says, if we follow the split approach, on Android ARMv6 systems we get nettle's armv5 code, and only on ARMv7 ones do we get the optimized code. Maybe splitting the arm code isn't that good an idea.
regards, Nikos
nettle-bugs@lists.lysator.liu.se