On Wed, Mar 31, 2021 at 9:18 PM Niels Möller <nisse@lysator.liu.se> wrote:
The reason it makes sense to me to split aes-encrypt.c is that:
(i) It's more consistent with the other aes-related functions.
(ii) The current aes-encrypt.c contains both the encryption functions aes128_encrypt, aes192_encrypt, aes256_encrypt, which we'd want to override with assembly implementations, and the legacy wrapper function aes_encrypt, which shouldn't be overridden. So we can't use a plain file-level override, but need #ifdefs too.
(iii) I've considered doing it earlier, to make it easier to implement aes without a round loop (like for all current versions of aes-encrypt-internal.*). E.g., on x86_64, for aes128 we could load all subkeys into registers and still have registers left to do two or more blocks in parallel, but then we'd need to override aes128_encrypt separately from the other aes*_encrypt.
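Point (iii) can be illustrated with a toy sketch. The round function below is a made-up stand-in (XOR with a round key, then a one-byte rotation), not the AES round, and all names are hypothetical. It only shows the structural difference between a generic loop whose round count is a runtime parameter and a routine fixed to AES-128's 10 rounds (11 round keys) that pushes two independent blocks through each round key, which is what keeping every subkey in registers would make cheap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16
#define AES128_ROUNDS 10  /* AES-128: 10 rounds, 11 round keys */

/* Hypothetical toy round: XOR in the round key, then rotate the block left
   by one byte. This is NOT the AES round; it only stands in for one. */
static void toy_round(uint8_t s[BLOCK], const uint8_t *k)
{
  uint8_t first;
  for (int i = 0; i < BLOCK; i++)
    s[i] ^= k[i];
  first = s[0];
  memmove(s, s + 1, BLOCK - 1);
  s[BLOCK - 1] = first;
}

/* Generic style: round count is a runtime parameter, one block at a time.
   keys holds (rounds + 1) * BLOCK bytes. */
static void toy_encrypt_generic(unsigned rounds, const uint8_t *keys,
                                uint8_t block[BLOCK])
{
  assert(rounds > 0);
  for (unsigned r = 0; r < rounds; r++)
    toy_round(block, keys + r * BLOCK);
  for (int i = 0; i < BLOCK; i++)  /* final key addition */
    block[i] ^= keys[rounds * BLOCK + i];
}

/* Specialized style: the round count is a compile-time constant, so the loop
   can be fully unrolled (by the compiler, or by hand in assembly where all
   11 round keys could stay in registers), and two independent blocks share
   each round key. */
static void toy_encrypt128_x2(const uint8_t keys[(AES128_ROUNDS + 1) * BLOCK],
                              uint8_t a[BLOCK], uint8_t b[BLOCK])
{
  for (unsigned r = 0; r < AES128_ROUNDS; r++)
    {
      toy_round(a, keys + r * BLOCK);
      toy_round(b, keys + r * BLOCK);
    }
  for (int i = 0; i < BLOCK; i++)
    {
      a[i] ^= keys[AES128_ROUNDS * BLOCK + i];
      b[i] ^= keys[AES128_ROUNDS * BLOCK + i];
    }
}
```

Both paths produce identical output; the specialized one only changes how the work is scheduled.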
I've tried out a split, see the patch below. It's a rather large change, moving pieces to new places, but nothing difficult. I'm considering committing this to the s390x branch, what do you think?
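The layout point (ii) argues for can be sketched as follows: the per-key-size functions live in files that an assembly file may replace wholesale, while the legacy wrapper stays in plain C and only dispatches. All names here are hypothetical (the demo_ prefix marks them), the stubs do no encryption and merely record which variant ran, and the context is simplified to a key size in bytes; Nettle's real struct aes_ctx and function signatures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for aes128_encrypt, aes192_encrypt and
   aes256_encrypt. After the split, each would sit in its own source file,
   so an assembly file could replace just that file. */
static unsigned last_variant;

static void demo_aes128_encrypt(size_t length, uint8_t *dst, const uint8_t *src)
{
  (void) length; (void) dst; (void) src;
  last_variant = 128;
}

static void demo_aes192_encrypt(size_t length, uint8_t *dst, const uint8_t *src)
{
  (void) length; (void) dst; (void) src;
  last_variant = 192;
}

static void demo_aes256_encrypt(size_t length, uint8_t *dst, const uint8_t *src)
{
  (void) length; (void) dst; (void) src;
  last_variant = 256;
}

/* Hypothetical, simplified legacy context: only the key size in bytes. */
struct demo_aes_ctx
{
  unsigned key_size;
};

/* Legacy wrapper in its own C file: it only dispatches on key size, so it
   never needs an assembly override and needs no #ifdefs. */
static void demo_aes_encrypt(const struct demo_aes_ctx *ctx, size_t length,
                             uint8_t *dst, const uint8_t *src)
{
  switch (ctx->key_size)
    {
    case 16: demo_aes128_encrypt(length, dst, src); break;
    case 24: demo_aes192_encrypt(length, dst, src); break;
    case 32: demo_aes256_encrypt(length, dst, src); break;
    default: assert(!"invalid AES key size");
    }
}
```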
I agree. I'll modify the patch of basic AES-128 optimized functions so that it builds on top of the split AES functions.
Regarding the large number of functions for s390x, I'm not yet convinced we should have all of them; we'll have to consider the tradeoff between speedup and complexity case by case. In particular, CBC encrypt (but not decrypt!) is notoriously slow, since it's inherently serial. So I'm curious about the potential speedup there.
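Why CBC encrypt is serial while CBC decrypt is not can be seen directly from the chaining equations. The sketch below uses a hypothetical toy 64-bit block cipher (invertible, with no security at all) in place of AES; only the data dependencies matter. Encryption computes C[i] = E(P[i] ^ C[i-1]), so each block-cipher call must wait for the previous one, whereas decryption computes P[i] = D(C[i]) ^ C[i-1] from ciphertext that is all available up front.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy 64-bit block cipher standing in for AES. */
static uint64_t toy_enc(uint64_t k, uint64_t x)
{
  x ^= k;
  x = (x << 13) | (x >> 51);
  return x + k;
}

static uint64_t toy_dec(uint64_t k, uint64_t y)
{
  y -= k;
  y = (y >> 13) | (y << 51);
  return y ^ k;
}

/* CBC encrypt: each iteration needs the previous ciphertext block, so the
   toy_enc calls form one serial dependency chain. */
static void toy_cbc_encrypt(uint64_t k, uint64_t iv, size_t n,
                            uint64_t *dst, const uint64_t *src)
{
  uint64_t prev = iv;
  for (size_t i = 0; i < n; i++)
    prev = dst[i] = toy_enc(k, src[i] ^ prev);
}

/* CBC decrypt: every toy_dec call depends only on ciphertext, so the calls
   are independent and several blocks could be processed at once. Walking
   back to front lets dst alias src. */
static void toy_cbc_decrypt(uint64_t k, uint64_t iv, size_t n,
                            uint64_t *dst, const uint64_t *src)
{
  for (size_t i = n; i-- > 0; )
    dst[i] = toy_dec(k, src[i]) ^ (i ? src[i - 1] : iv);
}
```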
Before getting too far, it may also be worthwhile to try out an assembly memxor.
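As a baseline for such a comparison, a portable C memxor typically XORs a machine word at a time and finishes the tail byte by byte. The sketch below is a hypothetical toy_memxor with the same (dst, src, n) shape as Nettle's memxor; it is not Nettle's implementation. On s390x the storage-to-storage XC instruction can XOR up to 256 bytes in a single instruction, which is what an assembly version would build on.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR n bytes of src into dst and return dst. Loads and stores go through
   memcpy so unaligned buffers are safe; compilers turn these into plain
   word accesses where the target allows it. */
static void *toy_memxor(void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t i = 0;

  for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t))
    {
      uint64_t a, b;
      memcpy(&a, d + i, sizeof a);
      memcpy(&b, s + i, sizeof b);
      a ^= b;
      memcpy(d + i, &a, sizeof a);
    }
  for (; i < n; i++)  /* remaining 0..7 bytes */
    d[i] ^= s[i];
  return dst;
}
```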
memxor performs the same in C and in assembly, since the s390 architecture offers a memory XOR instruction, "xc" (see the xor_len macro in machine.m4 of the original patch for an implementation example). However, the s390x AES accelerators offer a considerable speedup over the C implementations of the modes, even when those use the optimized internal AES. The following table demonstrates this more clearly:
                      S390x         C implementation with
Function              accelerator   optimized internal AES*
                      (cpb)         (cpb)
-----------------------------------------------------------
CBC AES128 Encrypt     1.073569      13.674891
CBC AES128 Decrypt     0.647008       3.131405
CBC AES192 Encrypt     1.266316      13.183552
CBC AES192 Decrypt     0.622058       3.074917
CBC AES256 Encrypt     1.450422      14.380789
CBC AES256 Decrypt     0.648403       3.040746
CFB AES128 Encrypt     1.199716      15.116906
CFB AES128 Decrypt     1.205567       3.144538
CFB AES192 Encrypt     1.393276      15.340453
CFB AES192 Decrypt     1.415399       3.064844
CFB AES256 Encrypt     1.687762      15.876734
CFB AES256 Decrypt     1.677147       3.065851
CFB8 AES128 Encrypt   17.278379     178.117195
CFB8 AES128 Decrypt   17.327002     183.136198
CFB8 AES192 Encrypt   20.408311     184.028411
CFB8 AES192 Decrypt   20.397928     187.534654
CFB8 AES256 Encrypt   23.549944     184.800598
CFB8 AES256 Decrypt   23.367348     190.355030
CMAC AES128 Update     1.026380      12.108085
CMAC AES256 Update     1.399747      11.497727
CCM AES128 Encrypt     1.828593      15.332434
CCM AES128 Decrypt     1.691520      14.115167
CCM AES128 Update      1.027736      10.918015
CCM AES192 Encrypt     1.883996      15.840703
CCM AES192 Decrypt     1.950362      14.478925
CCM AES192 Update      1.213858      11.239195
CCM AES256 Encrypt     2.206957      15.861586
CCM AES256 Decrypt     2.311447      15.051353
CCM AES256 Update      1.404938      11.441472
CTR AES128 Crypt       0.710237       4.767290
CTR AES192 Crypt       0.635386       3.489661
CTR AES256 Crypt       0.628296       3.138727
XTS AES128 Encrypt     0.655454      15.757406
XTS AES128 Decrypt     0.656113      15.920863
XTS AES256 Encrypt     0.663048      16.689253
XTS AES256 Decrypt     0.676298      16.670889
GCM AES128 Encrypt     0.630504      15.473187
GCM AES128 Decrypt     0.627714      15.529209
GCM AES128 Update      0.514662      11.608726
GCM AES192 Encrypt     0.642785      15.245804
GCM AES192 Decrypt     0.631627      15.511039
GCM AES192 Update      0.499630      11.745876
GCM AES256 Encrypt     0.631046      15.400776
GCM AES256 Decrypt     0.622329      15.419954
GCM AES256 Update      0.499630      11.569789

* Only aes128.asm, aes192.asm and aes256.asm enabled.
All figures are in cycles per byte (cpb); lower is better.
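To read the table above as speedup factors, divide the C column by the accelerator column. The small sketch below copies a handful of rows verbatim from the table; the struct and function names are illustrative only.

```c
#include <stddef.h>
#include <stdio.h>

/* A few rows copied from the benchmark table (cycles per byte). */
struct bench_row
{
  const char *name;
  double accel_cpb;  /* S390x accelerator */
  double c_cpb;      /* C mode over optimized internal AES */
};

static const struct bench_row rows[] = {
  { "CBC AES128 Encrypt", 1.073569, 13.674891 },
  { "CBC AES128 Decrypt", 0.647008,  3.131405 },
  { "CTR AES256 Crypt",   0.628296,  3.138727 },
  { "XTS AES128 Encrypt", 0.655454, 15.757406 },
  { "GCM AES128 Encrypt", 0.630504, 15.473187 },
};

/* Speedup factor: how many times fewer cycles per byte the accelerator needs. */
static double speedup(const struct bench_row *r)
{
  return r->c_cpb / r->accel_cpb;
}

static void print_speedups(void)
{
  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    printf("%-20s %5.1fx\n", rows[i].name, speedup(&rows[i]));
}
```

This gives roughly 12.7x for CBC AES128 encrypt, 4.8x for CBC AES128 decrypt and 24.5x for GCM AES128 encrypt. The same C column also shows the serial CBC encrypt running about 4.4 times slower than CBC decrypt (13.67 vs 3.13 cpb).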
Also, the optimized AES cores for s390x could serve as a good reference for other crypto libraries, since they have a clean and well-documented assembly implementation. The only drawback I can see is the clutter of preprocessor conditionals in the C files of the AES modes needed to support fat builds for these accelerators, which IMO is worth it given the speed gain we get.
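The runtime selection behind a fat build can be sketched as one function pointer per overridable function, chosen once at startup. Everything here is hypothetical: the function names, the have_accel flag standing in for a real s390x facility check (not shown), and the stub bodies that only record which path ran. Nettle's own fat-build code differs in detail. Each mode function that gains an accelerated version needs a hook like this, which is roughly where the extra conditionals in the mode files would come from.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature shared by the C and accelerated versions. */
typedef void cbc_aes128_encrypt_func(const void *ctx, size_t length,
                                     uint8_t *dst, const uint8_t *src);

enum impl { IMPL_NONE, IMPL_C, IMPL_ACCEL };
static enum impl last_impl = IMPL_NONE;  /* which path ran last */

static void cbc_aes128_encrypt_c(const void *ctx, size_t length,
                                 uint8_t *dst, const uint8_t *src)
{
  (void) ctx; (void) length; (void) dst; (void) src;
  last_impl = IMPL_C;
}

static void cbc_aes128_encrypt_accel(const void *ctx, size_t length,
                                     uint8_t *dst, const uint8_t *src)
{
  (void) ctx; (void) length; (void) dst; (void) src;
  last_impl = IMPL_ACCEL;
}

/* One pointer per overridable function, defaulting to portable C. */
static cbc_aes128_encrypt_func *cbc_aes128_encrypt_vec = cbc_aes128_encrypt_c;

/* Called once at startup; have_accel stands in for a CPU feature check. */
static void fat_init(int have_accel)
{
  cbc_aes128_encrypt_vec = have_accel ? cbc_aes128_encrypt_accel
                                      : cbc_aes128_encrypt_c;
}

/* Public entry point: always goes through the pointer. */
static void demo_cbc_aes128_encrypt(const void *ctx, size_t length,
                                    uint8_t *dst, const uint8_t *src)
{
  cbc_aes128_encrypt_vec(ctx, length, dst, src);
}
```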
regards, Mamone