Currently ghash/gcm performance on arm in both gcrypt and nettle is
a bit abysmal:

=== bench-slopes-nettle ===
 GCM auth |     28.43 ns/B     33.54 MiB/s     39.81 c/B    1400.2
=== bench-slopes-gcrypt ===
 GCM auth |     21.86 ns/B     43.62 MiB/s     30.52 c/B    1396.0
=== bench-slopes-openssl [1.1.1a] ===
 GCM auth |      5.99 ns/B     159.3 MiB/s      8.38 c/B    1399.6
=== cut ===

The current openssl/cryptogams code is based on ideas from
https://hal.inria.fr/hal-01506572 (licensed CC BY 4.0), and there is a
linked implementation at
https://conradoplg.cryptoland.net/software/ecc-and-ae-for-arm-neon/
(licensed LGPL 2.1+), which I guess should be acceptable to borrow.

A very preliminary patch for nettle will be posted as a reply (it passes
the nettle regression tests, but needs more extensive testing):

=== bench-slopes-nettle [w/ patched nettle 3.3] ===
 aes128   |  nanosecs/byte   mebibytes/sec   cycles/byte
 GCM auth |      7.07 ns/B     134.9 MiB/s      9.90 c/B
=== cut ===

(Not only is it notably faster, it should also be completely free of
cache/timing leaks.)
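
To illustrate what "free of cache/timing leaks" means for GHASH, here is a
minimal portable C sketch of a GF(2^128) multiply that uses no
key-dependent table lookups and no key-dependent branches. This is not the
code from the patch and it is far slower than any of the numbers above; it
is the plain bit-serial multiply from NIST SP 800-38D with the conditionals
replaced by masks. The names gf128_mul_ct and ghash_blocks are hypothetical
and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load/store a 64-bit big-endian value (GCM treats blocks as big-endian). */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

/* Hypothetical helper: z = x * y in GF(2^128), GCM bit order.
 * Every iteration does the same work; secret bits are only ever
 * turned into all-zero/all-one masks, never used as indices or
 * branch conditions. z may alias x or y. */
static void gf128_mul_ct(uint8_t z[16], const uint8_t x[16],
                         const uint8_t y[16])
{
    uint64_t vh = load_be64(y), vl = load_be64(y + 8);
    uint64_t zh = 0, zl = 0;

    for (int i = 0; i < 128; i++) {
        /* Bit i of x, counting from the most significant bit of x[0]. */
        uint64_t bit = (uint64_t)(x[i / 8] >> (7 - (i % 8))) & 1;
        uint64_t mask = (uint64_t)0 - bit;

        zh ^= vh & mask;
        zl ^= vl & mask;

        /* V = V >> 1, then reduce by R = 0xe1 || 0^120 if a bit fell off. */
        uint64_t carry = (uint64_t)0 - (vl & 1);
        vl = (vl >> 1) | (vh << 63);
        vh = (vh >> 1) ^ (carry & UINT64_C(0xe100000000000000));
    }

    store_be64(z, zh);
    store_be64(z + 8, zl);
}

/* Hypothetical helper: absorb whole 16-byte blocks into the GHASH state. */
static void ghash_blocks(uint8_t state[16], const uint8_t h[16],
                         const uint8_t *data, size_t len)
{
    assert(len % 16 == 0);
    for (size_t off = 0; off < len; off += 16) {
        for (int i = 0; i < 16; i++)
            state[i] ^= data[off + i];
        gf128_mul_ct(state, state, h);
    }
}
```

A table-driven implementation would instead index a precomputed table with
bytes or nibbles of the running state, which is where the cache-timing
exposure comes from. A vectorized implementation can stay constant-time and
still be fast by doing the polynomial multiply with SIMD instructions rather
than lookups.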