On Wed, Aug 02, 2017 at 04:25:42PM +0200, Niels Möller wrote:
"Daniel P. Berrange" <berrange@redhat.com> writes:
Naively I would have expected them all to be pretty much equal given that they're delegating to the same hardware routines.
This is not completely unexpected.
- Nettle's AESNI assembly routines were written for simplicity and small code size, without putting a lot of effort into it. They could probably be sped up by some unrolling or more careful instruction scheduling. Patches welcome (but we shouldn't use excessive unrolling unless there's a significant speedup).
Unfortunately I don't have any useful expertise in asm code, so I won't be able to provide any patches in this area.
- Nettle's AES-CBC uses general CBC functions invoking the AES encrypt and decrypt functions. In particular for CBC *en*crypt, this adds significant overhead for function calls, and the memxor function will examine src/dst alignment once per block. CBC *de*crypt is usually a bit faster, since we can then decrypt more than one block at a time.
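To make the encrypt/decrypt asymmetry described above concrete, here is a minimal sketch of a generic CBC layer in C. It is not Nettle's actual code: the names cbc_encrypt_sketch, cbc_decrypt_sketch, toy_encrypt, toy_decrypt and cipher_calls are hypothetical, and the "cipher" is an insecure toy stand-in for AES so that the sketch is self-contained. It only illustrates why encryption needs one cipher call per block while decryption can hand the whole buffer to the cipher in a single call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Counts calls into the block cipher, to make the overhead visible. */
unsigned cipher_calls = 0;

/* Cipher callback: processes len bytes (a multiple of BLOCK). */
typedef void cipher_func(const uint8_t *key, size_t len,
                         uint8_t *dst, const uint8_t *src);

/* Toy stand-in for AES encrypt. NOT secure, illustration only. */
void toy_encrypt(const uint8_t *key, size_t len, uint8_t *dst, const uint8_t *src)
{
  cipher_calls++;
  for (size_t i = 0; i < len; i++)
    dst[i] = (uint8_t)((src[i] ^ key[i % BLOCK]) + 1);
}

/* Inverse of toy_encrypt. */
void toy_decrypt(const uint8_t *key, size_t len, uint8_t *dst, const uint8_t *src)
{
  cipher_calls++;
  for (size_t i = 0; i < len; i++)
    dst[i] = (uint8_t)((uint8_t)(src[i] - 1) ^ key[i % BLOCK]);
}

static void xor_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] ^= src[i];
}

/* CBC encrypt: block i needs ciphertext block i-1 as input, so the
   cipher has to be called once per block. iv is updated in place. */
void cbc_encrypt_sketch(cipher_func *f, const uint8_t *key, uint8_t *iv,
                        size_t len, uint8_t *dst, const uint8_t *src)
{
  for (; len >= BLOCK; len -= BLOCK, src += BLOCK, dst += BLOCK) {
    xor_bytes(iv, src, BLOCK);   /* iv = previous ciphertext XOR plaintext */
    f(key, BLOCK, dst, iv);      /* one call per 16-byte block */
    memcpy(iv, dst, BLOCK);      /* chain value for the next block */
  }
}

/* CBC decrypt (dst and src must not overlap): every cipher input is
   already known, so all blocks go through the cipher in one call and
   the chaining XOR is applied afterwards. */
void cbc_decrypt_sketch(cipher_func *f, const uint8_t *key, uint8_t *iv,
                        size_t len, uint8_t *dst, const uint8_t *src)
{
  if (len < BLOCK)
    return;
  f(key, len, dst, src);                        /* single bulk call */
  xor_bytes(dst, iv, BLOCK);                    /* first block uses the IV */
  xor_bytes(dst + BLOCK, src, len - BLOCK);     /* rest use previous ciphertext */
  memcpy(iv, src + len - BLOCK, BLOCK);
}
```

With a 64-byte buffer, the encrypt path above enters the cipher four times and the decrypt path once. With a real AES routine behind the callback, the per-call cost is what shows up as overhead in the CBC encrypt numbers.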
comparative benchmarks of nettle's impl against others & seen the same kind of results?
You can also try the ./examples/nettle-benchmark program; if openssl was found at configure time, it includes benchmarks of some openssl functions for comparison.
FYI, that benchmark program is somewhat misleading, because it directly uses the openssl AES APIs, which always go to the generic software version, and thus make openssl look real slow by comparison. To exercise the AESNI impls in openssl, it would need to be rewritten to use the openssl EVP APIs which dynamically choose the best impl.
Regards, Daniel