On Tue, 2012-11-27 at 22:47 +0100, Niels Möller wrote:
Fredrik Thulin fredrik@thulin.net writes:
On Tue, 2012-11-27 at 14:06 +0100, Niels Möller wrote:
Cool. Why SHA512 rather than SHA256, is using it specified somewhere?
I wanted maximum speed on 64-bit CPUs without settling for SHA-1, but I won't claim this was a particularly well-informed decision.
Makes some sense. But on the other hand, the point of pbkdf2 is to be slow (for the attacker), so selecting a faster hash function just means that you need to use a larger iteration count...
Right - I realized it was poorly phrased after I sent it.
To elaborate a bit, my reasoning is that to minimize an attacker's advantage (botnets will always have more CPU than I do), I have to choose an algorithm that is as fast as possible for me and use it with as many iterations as I can afford. I know I will have a 64-bit CPU, but not yet whether I should go with Intel or AMD here.
On top of that, it doesn't hurt to choose an algorithm that might be more expensive for attackers than for me. AFAICT, most GPUs today are 32-bit (so SHA512 would be the slowest SHA on them currently), although I guess it is safe to assume GPUs will become 64-bit too.
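As a rough illustration of "as many iterations as I can afford", here is a minimal Python sketch that times a short PBKDF2-HMAC-SHA512 run and extrapolates to a time budget. The function name, the 100 ms budget and the probe size are assumptions made for the example, and hashlib.pbkdf2_hmac is the standard library routine in current Python versions, not the code being discussed in this thread.

```python
import hashlib
import time


def calibrate_iterations(budget_seconds=0.1, probe_iterations=10_000):
    """Estimate how many PBKDF2-HMAC-SHA512 iterations fit in a time budget.

    Hypothetical helper: the name and default values are illustrative only.
    """
    password = b"benchmark password"  # placeholder input, not a real secret
    salt = b"benchmark salt"          # placeholder salt
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha512", password, salt, probe_iterations)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    # PBKDF2 cost grows linearly with the iteration count,
    # so a short probe can be extrapolated to the full budget.
    per_iteration = elapsed / probe_iterations
    return max(1, int(budget_seconds / per_iteration))


if __name__ == "__main__":
    print("iterations that fit in 100 ms:", calibrate_iterations())
```

The number only means something on the machine that will actually do the hashing, so it would need to be re-measured after a hardware change.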
Naturally, what I'm building will be upgradeable with new algorithms and/or iteration counts over time, but SHA512 seems like the strongest choice today.
The only reason *not* to go with SHA512 (if we're limiting ourselves to SHAs) seems to be that it is considered overkill - admittedly by people that know far more than me about secure hashes. I don't mind the overkill though, and prefer safe to sorry.
For lack of more authoritative test vectors, adding a couple of test vectors generated by python-pbkdf2 to the nettle testsuite would be nice.
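One way to get candidate vectors with some confidence is to have two independent implementations agree. The sketch below does not use python-pbkdf2; it assumes a current Python where hashlib.pbkdf2_hmac is available, and cross-checks it against a plain PBKDF2 loop built on the hmac module. The helper names, the "password"/"salt" inputs (borrowed from the RFC 6070 SHA-1 vectors) and the iteration counts are illustrative choices.

```python
import hashlib
import hmac
import struct


def pbkdf2_reference(hash_name, password, salt, iterations, length):
    """Plain PBKDF2 (RFC 2898) built only on HMAC, used for cross-checking."""
    derived = b""
    block_index = 1
    while len(derived) < length:
        # U_1 = HMAC(password, salt || INT_32_BE(block_index))
        u = hmac.new(password, salt + struct.pack(">I", block_index),
                     hash_name).digest()
        t = int.from_bytes(u, "big")
        # U_i = HMAC(password, U_{i-1}); the block is U_1 xor ... xor U_c
        for _ in range(iterations - 1):
            u = hmac.new(password, u, hash_name).digest()
            t ^= int.from_bytes(u, "big")
        derived += t.to_bytes(len(u), "big")
        block_index += 1
    return derived[:length]


def make_vector(password, salt, iterations, length=64):
    """Return a hex PBKDF2-HMAC-SHA512 vector, but only if both versions agree."""
    fast = hashlib.pbkdf2_hmac("sha512", password, salt, iterations, length)
    slow = pbkdf2_reference("sha512", password, salt, iterations, length)
    if fast != slow:
        raise ValueError("implementations disagree")
    return fast.hex()


if __name__ == "__main__":
    for iterations in (1, 2, 50):
        print(iterations, make_vector(b"password", b"salt", iterations))
```

Agreement between the two does not make the vectors authoritative, but it does rule out a bug that is private to one implementation.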
I'll see what I can do.
Haven't measured. Optimizing a SHA512 implementation is really above my head, but I've heard talk on other mailing lists about using the AMD XOP instruction set to optimize SHA512...
That's a project for another day. It's some time since I wrote the C implementation, and I can't guess if a clever assembly implementation would gain a 10% or a 100% speedup compared to what gcc generates.
An indication could perhaps be gleaned from
http://cvs.openssl.org/chngview?cn=22648
If I understand the "Current performance in cycles per processed byte" data correctly, it seems that use of AVX/XOP could give a 50% performance improvement for SHA256 and a 38% improvement for SHA512 on Sandy Bridge. I have no idea how OpenSSL's previous implementation performed compared to Nettle though.
...
I get
PBKDF2-HMAC-SHA512 benchmark result :
... N= 16384 -> Python == 1834 ms, Nettle == 59 ms
The machine has an "AMD E-350" processor, 1.6 GHz dual core (but I guess the number of cores doesn't matter here). GMP's configure refers to the cpu as "bobcat", which if I understand these things correctly is AMD's current low-end.
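For a sense of scale, the figures above can be turned into per-iteration costs with a few lines of Python. This assumes that "N" in the benchmark output is the PBKDF2 iteration count and that the core runs at the quoted 1.6 GHz throughout; neither is confirmed here, and the variable names are only for illustration.

```python
iterations = 16384   # assumed meaning of "N" in the benchmark output
python_ms = 1834     # reported time for the Python implementation
nettle_ms = 59       # reported time for Nettle
clock_hz = 1.6e9     # AMD E-350 clock as quoted (assumed constant)

# How much faster Nettle was than the Python implementation.
speedup = python_ms / nettle_ms

# Time and approximate clock cycles spent per PBKDF2 iteration in Nettle.
nettle_us_per_iter = nettle_ms * 1000 / iterations
cycles_per_iter = (nettle_ms / 1000) * clock_hz / iterations

print(f"speedup: {speedup:.1f}x")
print(f"Nettle: {nettle_us_per_iter:.2f} us per iteration")
print(f"Nettle: about {cycles_per_iter:.0f} cycles per iteration")
```

Under those assumptions Nettle comes out roughly 31 times faster than the Python version on this machine.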
Thanks. I know not to use a low-end AMD then ;). You are correct that the benchmark only uses one core.
/Fredrik