Here are some things I fixed while looking through the sources.
Two points for code review:
1.
#define NONNULL(...) __attribute__ ((nonnull(__VA_ARGS__)))
uses a C99 feature (a variadic macro, "..." with __VA_ARGS__), but the code also
contains C99 long long constants elsewhere (in case you care about C89 compliance).
2.
In cast128.c I removed the wiping of t, l and r. Instead, I set t = 0 at the
beginning of the loops, since t seemed to be used uninitialized in the F1 macro.
Please take a quick look at it: maybe the "wiping" has some
undocumented deeper meaning?
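Regarding point 1, a minimal sketch of a C89-friendly way to keep the nonnull attribute without a variadic macro: the whole argument-index list is passed as one parenthesized macro argument. NONNULL_ARGS and copy_str are hypothetical names, not from the sources; the attribute itself is a GCC extension, so it is guarded by __GNUC__. This only addresses the macro; the long long constants would still need C99 or a compiler extension.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C89-friendly replacement for NONNULL(...): the argument
 * list is passed as ONE parenthesized macro argument, e.g.
 * NONNULL_ARGS((1, 2)), so no __VA_ARGS__ is required. */
#if defined(__GNUC__)
#  define NONNULL_ARGS(args) __attribute__ ((nonnull args))
#else
#  define NONNULL_ARGS(args) /* attribute unavailable: expands to nothing */
#endif

/* Hypothetical helper: pointer arguments 1 and 2 must not be NULL. */
size_t copy_str(char *dst, const char *src, size_t n) NONNULL_ARGS((1, 2));

/* Copy at most n - 1 bytes of src into dst and NUL-terminate.
 * Returns the number of bytes copied. Requires n > 0. */
size_t copy_str(char *dst, const char *src, size_t n)
{
    size_t len = strlen(src);

    assert(n > 0);
    if (len >= n)
        len = n - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With an 8-byte buffer, copy_str(buf, "hello, world", sizeof buf) copies 7 bytes and leaves "hello, " in buf. The double parentheses are the price for staying within C89's fixed-arity macros.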
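Regarding point 2: in cipher code, wiping locals at the end of a function is commonly done so that key-dependent intermediate values do not linger on the stack. That may be the intent here, though only the original author can confirm it. Below is a sketch under stated assumptions: TOY_F1, toy_feistel, wipe and the byte mix are hypothetical stand-ins, not the real CAST-128 (RFC 2144 specifies S-boxes S1..S4, three round-function types and 12 or 16 rounds). It illustrates two things. A temporary such as t that the round macro assigns before reading needs no initialization. Clearing locals through a volatile pointer keeps the optimizer from discarding the stores as dead, which a plain assignment like t = 0 at the end of a function does not guarantee. This is best effort only, since copies may still remain in registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n taken mod 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (uint32_t) ((x << n) | (x >> (32u - n))) : x;
}

/* Hypothetical round function shaped like CAST-128's F1 (add masking
 * key, rotate, combine the four bytes of t). The real cipher looks the
 * bytes up in S-boxes; a toy byte mix stands in here. Like many cipher
 * implementations, the macro uses a local t from the enclosing scope,
 * and it writes t before reading it, so t needs no initialization. */
#define TOY_F1(l, r, km, kr) do {                                \
        t = rotl32((km) + (r), (kr));                            \
        (l) ^= (((t >> 24) ^ ((t >> 16) & 0xffu))                \
                - ((t >> 8) & 0xffu)) + (t & 0xffu);             \
    } while (0)

/* Zero a buffer through a volatile pointer so the stores are not
 * removed as dead code by the optimizer. */
static void wipe(void *p, size_t n)
{
    volatile unsigned char *vp = (volatile unsigned char *) p;
    while (n--)
        *vp++ = 0;
}

/* Hypothetical 4-round toy Feistel network over two 32-bit halves.
 * decrypt != 0 applies the round keys in reverse order, which inverts
 * encryption for any Feistel network. */
void toy_feistel(uint32_t block[2], const uint32_t km[4],
                 const unsigned kr[4], int decrypt)
{
    uint32_t l, r, t, tmp;
    int i, k;

    assert(block != NULL && km != NULL && kr != NULL);
    l = block[0];
    r = block[1];
    for (i = 0; i < 4; i++) {
        k = decrypt ? 3 - i : i;
        TOY_F1(l, r, km[k], kr[k]);
        tmp = l; l = r; r = tmp;   /* swap halves */
    }
    block[0] = r;                  /* undo the final swap */
    block[1] = l;

    /* Clear key-dependent intermediates before returning. */
    wipe(&t, sizeof t);
    wipe(&l, sizeof l);
    wipe(&r, sizeof r);
    wipe(&tmp, sizeof tmp);
}
```

Encrypting a block and then calling toy_feistel again with decrypt = 1 and the same keys restores the original block; the wiping does not affect the result, only what is left behind in memory.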
Regards, Tim