Hello, As it is now, the ARM NEON and ARMv6 optimizations are enabled only when nettle is compiled on the exact system it is supposed to run on. That is unlikely to be the case in some scenarios, as applications intended to run on ARM mobile devices (e.g. [0]) cannot know the target in advance. That is a pity, because nettle is pretty unique in having assembly support for ARMv6 and NEON.
A solution to that issue would be to have a nettle library constructor that runs the ARM equivalent of cpuid and stores the result in a global variable. Each assembly module (e.g., aes-arm) would then jump to the implementation that matches what was detected at runtime.
This requires no global initialization functions or anything like that. On systems where library constructors are not available (i.e., static linking), the cost would be falling back to the oldest ARM assembly version (unless the constructor is called explicitly).
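To make the idea concrete, here is a rough sketch of what I have in mind. It is not nettle code: every name in it (sketch_cpu_init, select_level, aes_impl and so on) is made up for illustration, the "implementations" are stand-ins that only report which variant ran, and the detection assumes Linux with glibc's getauxval() and an AT_PLATFORM string of the form "v6l" or "v7l". A function pointer stands in for the jump each assembly module would do.

```c
#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>  /* getauxval(); glibc 2.16 or later */
#endif

/* All names below are hypothetical; none of them are nettle API. */
enum arm_level { ARM_BASELINE = 0, ARM_V6 = 1, ARM_NEON = 2 };

/* Same bit as HWCAP_NEON in <asm/hwcap.h> on 32-bit ARM Linux. */
#define SKETCH_HWCAP_NEON (1UL << 12)

typedef const char *(*aes_fn)(void);

/* Stand-ins for the per-CPU assembly routines; they only report
   which variant was selected. */
static const char *aes_baseline(void) { return "baseline"; }
static const char *aes_v6(void)       { return "armv6"; }
static const char *aes_neon(void)     { return "neon"; }

/* The global state: defaults to the oldest implementation, so the
   library stays correct even if detection never runs. */
static aes_fn aes_impl = aes_baseline;

/* Pure selection logic, kept apart from detection so it can be tested. */
static enum arm_level select_level(unsigned long hwcap, int arch_version)
{
  if (hwcap & SKETCH_HWCAP_NEON)
    return ARM_NEON;
  if (arch_version >= 6)
    return ARM_V6;
  return ARM_BASELINE;
}

static void install_level(enum arm_level level)
{
  switch (level)
    {
    case ARM_NEON: aes_impl = aes_neon; break;
    case ARM_V6:   aes_impl = aes_v6; break;
    default:       aes_impl = aes_baseline; break;
    }
}

/* Explicit entry point, for setups where the constructor does not run. */
void sketch_cpu_init(void)
{
#if defined(__linux__) && defined(__arm__)
  unsigned long hwcap = getauxval(AT_HWCAP);
  const char *platform = (const char *) getauxval(AT_PLATFORM);
  int arch = 0;

  /* Assumption: AT_PLATFORM looks like "v6l" or "v7l" on ARM Linux. */
  if (platform && platform[0] == 'v'
      && platform[1] >= '0' && platform[1] <= '9')
    arch = platform[1] - '0';

  install_level(select_level(hwcap, arch));
#else
  install_level(ARM_BASELINE);
#endif
}

/* GCC/Clang extension: runs before main(), or when the library is loaded. */
__attribute__((constructor))
static void sketch_cpu_ctor(void)
{
  sketch_cpu_init();
}

/* What a caller sees: one symbol, dispatched through the pointer. */
const char *sketch_aes_encrypt(void)
{
  return aes_impl();
}
```

Because the pointer starts out at the baseline routine, a build where the constructor never fires behaves exactly like the oldest assembly version, and an application that cares can call sketch_cpu_init() itself.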
This is similar to what OpenSSL is doing already (with the exception that they use a global initialization function instead of a constructor). To me that seems like quite a simple way to improve the current situation.
regards, Nikos