I made a merge request that optimizes the AES modes on the s390x architecture. The patch implements the optimized cores using the cipher instructions that were added to s390x in the message-security-assist extensions. The patch uses the following functions:

  KM-AES-128 (ECB-AES128)
  KM-AES-192 (ECB-AES192)
  KM-AES-256 (ECB-AES256)
  KMC-AES-128 (CBC-AES128)
  KMC-AES-192 (CBC-AES192)
  KMC-AES-256 (CBC-AES256)
  KMAC-AES-128 (CCM-AES128, CMAC-AES128)
  KMAC-AES-192 (CCM-AES192)
  KMAC-AES-256 (CCM-AES256, CMAC-AES256)
  KMF-AES-128 (CFB-AES128, CFB8-AES128)
  KMF-AES-192 (CFB-AES192, CFB8-AES192)
  KMF-AES-256 (CFB-AES256, CFB8-AES256)
  KM-XTS-AES-128 (XTS-AES128)
  KM-XTS-AES-256 (XTS-AES256)
  KIMD-GHASH (GHASH)
  KMCTR-AES-128, KMA-GCM-AES-128 (CTR-AES128)
  KMCTR-AES-192, KMA-GCM-AES-192 (CTR-AES192)
  KMCTR-AES-256, KMA-GCM-AES-256 (CTR-AES256)
  KMA-GCM-AES-128 (GCM-AES128)
  KMA-GCM-AES-192 (GCM-AES192)
  KMA-GCM-AES-256 (GCM-AES256)

The merge request also includes a benchmark that measures the speed of the optimized cores on s390x.
I can't set up gitlab CI for automatic testing on s390x because qemu hasn't implemented the cipher functions for this architecture. However, there is an easy way to test the patch manually: request a free account on the LinuxONE Community Cloud. Both short-term and long-term access are available. https://linuxone.cloud.marist.edu/#/register?flag=VM
regards, Mamone
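For manual testing, it can help to confirm first that the Linux guest reports the message-security assist at all. The following is a minimal sketch, not part of the merge request: the helper name has_cpu_feature is made up for illustration, and it assumes that Linux on s390x lists "msa" on the features line of /proc/cpuinfo. It says nothing about which MSA extension level, and hence which of the function codes listed above, the machine provides.

```shell
#!/bin/sh
# has_cpu_feature FEATURE [CPUINFO_FILE]
# Hypothetical helper: succeeds if FEATURE appears as a whole word on the
# "features" line of the given cpuinfo file (default: /proc/cpuinfo).
has_cpu_feature() {
  feature=$1
  cpuinfo=${2:-/proc/cpuinfo}
  # Take the first "features" line, drop everything up to the colon, and
  # compare each remaining word with the requested feature.
  for word in $(sed -n 's/^features[[:space:]]*:[[:space:]]*//p' "$cpuinfo" | head -n 1); do
    [ "$word" = "$feature" ] && return 0
  done
  return 1
}
```

On an instance, `has_cpu_feature msa && echo "MSA reported"` would be the typical call.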
Hello Niels,
Any update on this? Is there anything missing on my side?
regards, Mamone
On Tue, Jan 5, 2021 at 1:12 AM Maamoun TK maamoun.tk@googlemail.com wrote:
[...]
Maamoun TK maamoun.tk@googlemail.com writes:
Any update on this? Is there anything missing on my side?
I'm a bit concerned about testing, and the missing qemu support. I guess we'll have to make do with manual tests, but I won't be able to do that regularly myself. Sorry for the delay.
Do you think it would be useful to set up a ci build, even if it has to disable the asm code at compile time or runtime? That would at least give some coverage to the configure and fat logic.
I would also prefer to get the basic AES functions in (aes128.asm, aes192.asm, aes256.asm), before considering combined aes-mode functions.
Regards, /Niels
On Thu, Jan 21, 2021 at 2:53 AM Niels Möller nisse@lysator.liu.se wrote:
Maamoun TK maamoun.tk@googlemail.com writes:
Any update on this? Is there anything missing on my side?
I'm a bit concerned about testing, and the missing qemu support. I guess we'll have to make do with manual tests, but I won't be able to do that regularly myself. Sorry for the delay.
The Nettle project can access the LinuxONE Community Cloud at Marist for the long-term to run manual or automated CI testing. Jenkins and Travis CI are available, in addition to anything that you want to configure in your instance.
QEMU or emulators on non-IBM hardware are not an option for these features.
Thanks, David
Thanks for the info, I'll see how I can integrate the LinuxONE Community Cloud into the nettle CI for automated testing.
regards, Mamone
On Thu, Jan 21, 2021 at 3:56 PM David Edelsohn dje.gcc@gmail.com wrote:
[...]
I managed to integrate an instance of the LinuxONE Community Cloud into the nettle CI. I can't make a merge request for it because it involves manual steps, so I'll write a guide instead. The integration is pretty straightforward and can be done by following these steps:

- Create a free account at https://linuxone.cloud.marist.edu/#/register?flag=VM and create an instance (all instances are z15, so they support all the currently implemented features).
- Run the following commands in the instance:

    mkdir nettle && cd nettle
    git init
    git remote add origin https://gitlab.com/nettle/nettle.git

- In gitlab go to Settings -> CI / CD. Expand Variables and add a variable with Key: SSH_PRIVATE_KEY, Value: (set the private key here).
- Update gitlab-ci.yml as follows (assisted by this recipe: https://medium.com/@hfally/a-gitlab-ci-config-to-deploy-to-your-server-via-s... ):
  - Add this line to the variables category:

      DEBIAN_BUILD: buildenv-debian

  - Add these lines to the end of the file:

      Debian.remote.s390x:
        image: $CI_REGISTRY/$BUILD_IMAGES_PROJECT:$DEBIAN_BUILD
        before_script:
        - apt-get update -qq
        - apt-get install -qq git
        - 'which ssh-agent || ( apt-get install -qq openssh-client )'
        - eval $(ssh-agent -s)
        - ssh-add <(echo "$SSH_PRIVATE_KEY")
        - mkdir -p ~/.ssh
        - echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config
        script:
        - ssh linux1@IP_ADDRESS "cd nettle && git pull origin s390x --rebase && ./.bootstrap && ./configure --enable-fat && make && make check && exit"
        tags:
        - shared
        - linux
        except:
        - tags

Note: Replace IP_ADDRESS with the IP address of the instance.
On Thu, Jan 21, 2021 at 5:05 PM Maamoun TK maamoun.tk@googlemail.com wrote:
[...]
Hi, Maamoun
Thanks for setting this up. The default accounts have a limited time (90 days?). For long-term CI access, I can help request a long-term account for Nettle.
Thanks, David
David Edelsohn dje.gcc@gmail.com writes:
Thanks for setting this up. The default accounts have a limited time (90 days?). For long-term CI access, I can help request a long-term account for Nettle.
That would be helpful.
I've had a look at the terms and conditions, http://security.marist.edu/LinuxOne/TC.PDF. Most of it looks very reasonable, but there are a few items that I find a bit unclear:
9. [...] You agree to obey all relevant New York State and US laws, including all export controls laws.
My understanding is that US export control laws don't apply to FOSS software (and that's why, e.g., Debian no longer has special non-us mirrors for distributing cryptographic software). But I don't know the details, and if there really isn't a problem, why is it mentioned explicitly in the terms and conditions?
10 [...] d. To protect your LinuxOne Account, keep your Secure Shell (SSH) keys confidential. You are responsible for the activity that happens on or through your LinuxOne Account.
Is it acceptable under these terms if I upload a private key to a CI config that is part of the gnutls project hosted on gitlab.com? Maamoun's suggested method was to add it as a "Variable" in the CI/CD web config, I'm assuming that will not make it publicly visible (but I'd need to double check).
I don't know precisely which individuals will get access to use the key (and hence my account) if I do that, even though I expect it to be a small number of good people (admins of the gnutls project, and the key will also be technically accessible by gitlab staff).
[...] Do not reuse your LinuxOne Account keys on third-party applications.
I also don't understand what "third-party applications" means in this context, but I'd guess gitlab could be one?
Regards, /Niels
On Wed, Feb 3, 2021 at 11:13 AM Niels Möller nisse@lysator.liu.se wrote:
... I've had a look at the terms and conditions, http://security.marist.edu/LinuxOne/TC.PDF. Most of it looks very reasonable, but there are a few items that I find a bit unclear:
- [...] You agree to obey all relevant New York State and US laws, including all export controls laws.
My understanding is that US export control laws don't apply to FOSS software (and that's why, e.g., Debian no longer has special non-us mirrors for distributing cryptographic software). But I don't know the details, and if there really isn't a problem, why is it mentioned explicitly in the terms and conditions?
IBM is a US company. It has to comply with the export laws.
For an open source project you have to email the encryption coordinator (the NSA) with a link to the project's website and source files. Also see https://www.eff.org/deeplinks/2019/08/us-export-controls-and-published-encry....
Jeff
On Wed, Feb 3, 2021 at 11:13 AM Niels Möller nisse@lysator.liu.se wrote:
David Edelsohn dje.gcc@gmail.com writes:
Thanks for setting this up. The default accounts have a limited time (90 days?). For long-term CI access, I can help request a long-term account for Nettle.
That would be helpful.
I've had a look at the terms and conditions, http://security.marist.edu/LinuxOne/TC.PDF. Most of it looks very reasonable, but there are a few items that I find a bit unclear:
- [...] You agree to obey all relevant New York State and US laws, including all export controls laws.
My understanding is that US export control laws don't apply to FOSS software (and that's why, e.g., Debian no longer has special non-us mirrors for distributing cryptographic software). But I don't know the details, and if there really isn't a problem, why is it mentioned explicitly in the terms and conditions?
I am not a lawyer and cannot give legal advice about any of this. I also cannot speak officially for IBM or Marist about the terms and conditions of agreements.
This hasn't been a problem for other Open Source projects, including Open Source cryptographic libraries.
You're not hosting development of the library in the U.S. nor distributing the library from the U.S., so you would seem to be obeying New York State and US laws. The U.S. does not restrict importation of cryptographic software. Downloading the library or repo into the system at Marist to run testing or CI is considered importing.
10 [...] d. To protect your LinuxOne Account, keep your Secure Shell (SSH) keys confidential. You are responsible for the activity that happens on or through your LinuxOne Account.
Is it acceptable under these terms if I upload a private key to a CI config that is part of the gnutls project hosted on gitlab.com? Maamoun's suggested method was to add it as a "Variable" in the CI/CD web config, I'm assuming that will not make it publicly visible (but I'd need to double check).
The item is not specifying how you handle the security and confidentiality of your keys, only that you are responsible for activity on your LinuxONE s390x instance. The intention is that you not email spam or hack other systems or run Bitcoin miners from your account, and make a reasonable effort that malicious parties cannot break into your LinuxONE instance to do similar bad things.
I don't know precisely which individuals will get access to use the key (and hence my account) if I do that, even though I expect it to be a small number of good people (admins of the gnutls project, and the key will also be technically accessible by gitlab staff).
[...] Do not reuse your LinuxOne Account keys on third-party applications.
I also don't understand what "third-party applications" means in this context, but I'd guess gitlab could be one?
Again, I interpret this as basic key security: don't reuse keys or passwords on multiple accounts where a compromise of one account would allow an attacker to compromise other accounts, including the LinuxONE system. It didn't say that you couldn't use it, it said don't REuse it, such as, don't use the same key for LinuxONE and AWS and wherever else you run CI.
Thanks, David
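One way to act on the "don't reuse keys" reading is to generate a key pair that is used for nothing except the CI connection to the LinuxONE instance. The following is a minimal sketch, assuming OpenSSH's ssh-keygen is installed. The function name, the key file name and the comment string are placeholders chosen for illustration, not anything the thread or the terms and conditions prescribe.

```shell
#!/bin/sh
# make_ci_key DIR
# Hypothetical helper: creates a passphrase-less Ed25519 key pair in DIR that
# is meant to be used only for the s390x CI job. Prints the private key path.
make_ci_key() {
  key="$1/id_ed25519_nettle_ci"
  # -N "" gives an empty passphrase, so a CI runner can load the key unattended.
  ssh-keygen -q -t ed25519 -N "" -C "nettle-ci" -f "$key" || return 1
  printf '%s\n' "$key"
}
```

The private half would go into the CI variable and the contents of the matching .pub file into ~/.ssh/authorized_keys on the instance. Revoking CI access then only means deleting that one line.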
David Edelsohn dje.gcc@gmail.com writes:
Thanks for setting this up. The default accounts have a limited time (90 days?). For long-term CI access, I can help request a long-term account for Nettle.
Hi, I set up the s390x vm for Nettle ci tests late March. What information do you need to arrange an extension to long-term access, so it doesn't expire?
Regards, /Niels
On Sat, May 8, 2021 at 2:24 PM Niels Möller nisse@lysator.liu.se wrote:
David Edelsohn dje.gcc@gmail.com writes:
Thanks for setting this up. The default accounts have a limited time (90 days?). For long-term CI access, I can help request a long-term account for Nettle.
Hi, I set up the s390x vm for Nettle ci tests late March. What information do you need to arrange an extension to long-term access, so it doesn't expire?
With what email address is the account associated? With the same one as this email message? nisse (at) lysator?
Thanks, David
Maamoun TK maamoun.tk@googlemail.com writes:
    Debian.remote.s390x:
      image: $CI_REGISTRY/$BUILD_IMAGES_PROJECT:$DEBIAN_BUILD
      before_script:
      - apt-get update -qq
      - apt-get install -qq git
      - 'which ssh-agent || ( apt-get install -qq openssh-client )'
      - eval $(ssh-agent -s)
      - ssh-add <(echo "$SSH_PRIVATE_KEY")
      - mkdir -p ~/.ssh
      - echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config
      script:
      - ssh linux1@IP_ADDRESS "cd nettle && git pull origin s390x --rebase && ./.bootstrap && ./configure --enable-fat && make && make check && exit"
      tags:
      - shared
      - linux
      except:
      - tags
It looks like this hardcodes the branch to test ("s390x"), while the ci jobs usually run on all branches. It also doesn't clean up the remote state between builds.
I wonder if it would be more reliable to run make dist PACKAGE_VERSION=snapshot on the ci build machine, and copy the resulting tarball to the remote machine for build and test. The commands run on the remote machine should unpack the snapshot in a fresh directory, run configure, make, make check.
Regards, /Niels
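The remote half of the tarball idea can be sketched roughly as follows. This is an illustration only, not what any CI job in this thread runs: the function names are invented, and it assumes a gzip-compressed tarball with a single top-level directory, which is what make dist normally produces.

```shell
#!/bin/sh
# unpack_fresh TARBALL
# Unpacks TARBALL into a newly created temporary directory, dropping the
# top-level directory inside the archive, and prints the new directory.
unpack_fresh() {
  dir=$(mktemp -d) || return 1
  tar -xzf "$1" -C "$dir" --strip-components=1 || { rm -rf "$dir"; return 1; }
  printf '%s\n' "$dir"
}

# build_and_check TARBALL
# Builds and tests the snapshot in a fresh directory, then removes the
# directory whether or not the build succeeded.
build_and_check() {
  dir=$(unpack_fresh "$1") || return 1
  (cd "$dir" && ./configure && make && make check)
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

Because every run starts from an empty directory, nothing from an earlier build can leak into the next one, which addresses the missing cleanup noted above.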
On Wed, Feb 3, 2021 at 5:47 PM Niels Möller nisse@lysator.liu.se wrote:
It looks like this hardcodes the branch to test ("s390x"), while the ci jobs usually run on all branches. It also doesn't clean up the remote state between builds.
I wonder if it would be more reliable to run make dist PACKAGE_VERSION=snapshot on the ci build machine, and copy the resulting tarball to the remote machine for build and test. The commands run on the remote machine should unpack the snapshot in a fresh directory, run configure, make, make check.
I figured out an approach that tests the branch to which the changes are committed:

    Debian.remote.s390x:
      image: $CI_REGISTRY/$BUILD_IMAGES_PROJECT:$DEBIAN_BUILD
      before_script:
      - apt-get update -qq
      - apt-get install -qq git
      - 'which ssh-agent || ( apt-get install -qq openssh-client )'
      - eval $(ssh-agent -s)
      - ssh-add <(echo "$SSH_PRIVATE_KEY")
      - mkdir -p ~/.ssh
      - echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config
      - ssh linux1@IP_ADDRESS "mkdir -p nettle_ci/$CI_PIPELINE_IID"
      script:
      - ssh linux1@IP_ADDRESS "cd nettle_ci/$CI_PIPELINE_IID && git clone --depth=1 --branch $CI_COMMIT_REF_NAME https://gitlab.com/gnutls/nettle.git . && ./.bootstrap && ./configure --disable-documentation && make && make check"
      after_script:
      - eval $(ssh-agent -s)
      - ssh-add <(echo "$SSH_PRIVATE_KEY")
      - ssh linux1@IP_ADDRESS "rm -rf nettle_ci/$CI_PIPELINE_IID/ && exit"
      tags:
      - shared
      - linux
      except:
      - tags

It uses CI_PIPELINE_IID to create a directory with a unique name, which safely handles races between jobs when many commits are pushed quickly. According to https://docs.gitlab.com/ee/ci/variables/predefined_variables.html, CI_PIPELINE_IID is unique within the current project, so it's OK to use it in this context. After the job completes, the directory is removed to clean up. However, this approach has a downside: if more than one commit is pushed quickly, all the created pipelines may check the latest commit rather than their own. We can solve this by using the CI_COMMIT_SHA predefined environment variable and commands like git reset --hard $CI_COMMIT_SHA or git checkout $CI_COMMIT_SHA. However, I haven't tested either and am not sure if it's worth it.
Maamoun's suggested method was to add it as a "Variable" in the CI/CD web config, I'm assuming that will not make it publicly visible (but I'd need to double check).
It looks like it doesn't expose the key publicly: the console window shows the name $SSH_PRIVATE_KEY without revealing its value. I'm not sure whether there are other places where it could be exposed or exploited.
regards, Mamone
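The CI_COMMIT_SHA idea mentioned above could look something like the following. It is an untested sketch: the helper name is invented, the arguments are assumed to contain no spaces or shell metacharacters, and it deliberately drops --depth=1 on the assumption that a shallow clone of the branch tip might not contain the commit the pipeline was created for.

```shell
#!/bin/sh
# remote_checkout_cmd DIR URL BRANCH SHA
# Hypothetical helper: prints the command line to run on the remote machine.
# It clones BRANCH into DIR and then checks out exactly SHA, so that a
# pipeline tests its own commit even if the branch has moved on since.
remote_checkout_cmd() {
  printf 'cd %s && git clone --branch %s %s . && git checkout --detach %s\n' \
    "$1" "$3" "$2" "$4"
}
```

In the job script it would be used as `ssh linux1@IP_ADDRESS "$(remote_checkout_cmd "nettle_ci/$CI_PIPELINE_IID" https://gitlab.com/gnutls/nettle.git "$CI_COMMIT_REF_NAME" "$CI_COMMIT_SHA") && ./.bootstrap && ./configure && make && make check"`.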
I managed to get the tarball approach working in gitlab ci with the following steps:
- In gitlab go to Settings -> CI / CD. Expand Variables and add the following variables:
  - S390X_SSH_IP_ADDRESS: username@instance_ip
  - S390X_SSH_PRIVATE_KEY: private key of the ssh connection
  - S390X_SSH_CI_DIRECTORY: name of the directory on the remote server where the tarball is extracted and tested
- Update gitlab-ci.yml as follows:
  - Add this line to the variables category at the top of the file:

      DEBIAN_BUILD: buildenv-debian

  - Add these lines to the end of the file:

      Debian.remote.s390x:
        image: $CI_REGISTRY/$BUILD_IMAGES_PROJECT:$DEBIAN_BUILD
        before_script:
        - apt-get update -qq
        - apt-get install -qq git
        - 'which ssh-agent || ( apt-get install -qq openssh-client )'
        - eval $(ssh-agent -s)
        - ssh-add <(echo "$S390X_SSH_PRIVATE_KEY")
        - mkdir -p ~/.ssh
        - echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config
        - ssh $S390X_SSH_IP_ADDRESS "mkdir -p $S390X_SSH_CI_DIRECTORY/$CI_PIPELINE_IID"
        script:
        - tar --exclude=.git --exclude=gitlab-ci.yml -cf - . | ssh $S390X_SSH_IP_ADDRESS "cd $S390X_SSH_CI_DIRECTORY/$CI_PIPELINE_IID && tar -xf - && ./.bootstrap && ./configure --disable-documentation && make && make check"
        after_script:
        - eval $(ssh-agent -s)
        - ssh-add <(echo "$S390X_SSH_PRIVATE_KEY")
        - ssh $S390X_SSH_IP_ADDRESS "rm -rf $S390X_SSH_CI_DIRECTORY/$CI_PIPELINE_IID/ && exit"
        only:
          variables:
          - $S390X_SSH_IP_ADDRESS
          - $S390X_SSH_PRIVATE_KEY
          - $S390X_SSH_CI_DIRECTORY
        tags:
        - shared
        - linux
        except:
        - tags

This approach archives the repo files and extracts the tar archive on the remote server, where it is built and tested. It creates a directory with a unique name on the remote server for every pipeline. Also, if one of the required variables is not defined, the job is not created, so fresh forks won't have an s390x job unless they define the s390x-specific variables.
regards, Mamone
Maamoun TK maamoun.tk@googlemail.com writes:
I managed to get the tarball approach working in gitlab ci with the following steps:
Thanks for the research. I've added a test job based on these ideas. See https://git.lysator.liu.se/nettle/nettle/-/commit/c25774e230985a625fa5112f3f.... An almost identical setup was run successfully as https://gitlab.com/gnutls/nettle/-/jobs/1125145345.
- In gitlab go to settings -> CI / CD. Expand Variables and add the following variables:
- S390X_SSH_IP_ADDRESS: username@instance_ip
- S390X_SSH_PRIVATE_KEY: private key of ssh connection
- S390X_SSH_CI_DIRECTORY: name of directory in remote server where the tarball is extracted and tested
I made only the private key a variable (and of type "file", which means it's stored in a temporary file, with file name in $SSH_PRIVATE_KEY). The others are defined in the .gitlab-ci.yml file.
Update gitlab-ci.yml as follows:
- Add this line to variables category at the top of file:
DEBIAN_BUILD: buildenv-debian
I used the same fedora image as for the simpler build jobs.
script:
- tar --exclude=.git --exclude=gitlab-ci.yml -cf - . | ssh $S390X_SSH_IP_ADDRESS "cd $S390X_SSH_CI_DIRECTORY/$CI_PIPELINE_IID && tar -xf - &&
I'm using ./configure && make dist instead, so we get a bit of testing of that too. On the remote side, the directory name is based on $CI_PIPELINE_IID, which seems to be a good way to get one directory per job.
    only:
      variables:
      - $S390X_SSH_IP_ADDRESS
      - $S390X_SSH_PRIVATE_KEY
      - $S390X_SSH_CI_DIRECTORY
What does this mean? Ah, it excludes the job if these variables aren't set?
Regards, /Niels
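The difference between a plain variable (the key text itself) and a "file" type variable (the path of a temporary file holding the key) can be illustrated with a small helper that accepts either form. This is a sketch under assumptions: key_file is a made-up name, and it guesses the form simply by checking whether the value names an existing readable file.

```shell
#!/bin/sh
# key_file VALUE
# Hypothetical helper: prints the path of a file holding the private key.
# If VALUE already names a readable file (a "file" type variable), that path
# is printed unchanged; otherwise VALUE is treated as the key text itself and
# written to a new temporary file that only the owner can read.
key_file() {
  if [ -f "$1" ] && [ -r "$1" ]; then
    printf '%s\n' "$1"
  else
    tmp=$(mktemp) || return 1
    chmod 600 "$tmp"
    printf '%s\n' "$1" > "$tmp"
    printf '%s\n' "$tmp"
  fi
}
```

With this, `ssh-add "$(key_file "$SSH_PRIVATE_KEY")"` would work for both variable types, whereas `ssh-add <(echo "$SSH_PRIVATE_KEY")` only makes sense when the variable holds the key text.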
On Wed, Mar 24, 2021 at 8:52 PM Niels Möller nisse@lysator.liu.se wrote:
Maamoun TK maamoun.tk@googlemail.com writes:
- S390X_SSH_IP_ADDRESS: username@instance_ip
- S390X_SSH_PRIVATE_KEY: private key of ssh connection
- S390X_SSH_CI_DIRECTORY: name of directory in remote server where the tarball is extracted and tested
I made only the private key a variable (and of type "file", which means it's stored in a temporary file, with file name in $SSH_PRIVATE_KEY). The others are defined in the .gitlab-ci.yml file.
Isn't it better to define an S390X_SSH_IP_ADDRESS variable rather than hard-code the remote server address in .gitlab-ci.yml? Fresh forks now need to update .gitlab-ci.yml to get an s390x job, which is a bit unwieldy in my opinion.
Update gitlab-ci.yml as follows:
- Add this line to variables category at the top of file:
DEBIAN_BUILD: buildenv-debian
I used the same fedora image as for the simpler build jobs.
Good.
    only:
      variables:
      - $S390X_SSH_IP_ADDRESS
      - $S390X_SSH_PRIVATE_KEY
      - $S390X_SSH_CI_DIRECTORY
What does this mean? Ah, it excludes the job if these variables aren't set?
Yes, this is what it does according to the gitlab ci docs https://docs.gitlab.com/ee/ci/yaml/#onlyexcept-basic. Otherwise, fresh forks would always have a failing job.
regards, Mamone
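A runtime guard in the job script can complement the only: variables filter. As far as the GitLab documentation describes it, several entries listed under only: variables act as alternatives, so the job is created when any one of them is set. A check like the one below is one way to insist on all of them before touching the remote machine. The function name is made up, and the snippet is a sketch rather than part of the posted job.

```shell
#!/bin/sh
# require_vars NAME...
# Succeeds only if every named shell variable is set to a non-empty value;
# otherwise reports the first missing one on stderr and fails.
require_vars() {
  for name in "$@"; do
    # Indirect lookup of the variable called $name (names are trusted input).
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing CI variable: $name" >&2
      return 1
    fi
  done
  return 0
}
```

A first line such as `require_vars S390X_SSH_IP_ADDRESS S390X_SSH_PRIVATE_KEY S390X_SSH_CI_DIRECTORY || exit 1` in before_script would then stop a half-configured fork early with a readable message.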
Maamoun TK maamoun.tk@googlemail.com writes:
Isn't it better to define an S390X_SSH_IP_ADDRESS variable rather than hard-code the remote server address in .gitlab-ci.yml? Fresh forks now need to update .gitlab-ci.yml to get an s390x job, which is a bit unwieldy in my opinion.
Makes sense. I've added it as a variable, and renamed it to S390X_ACCOUNT. The value is of the form username@ip-address.
Yes, this is what it does according to the gitlab ci docs https://docs.gitlab.com/ee/ci/yaml/#onlyexcept-basic. Otherwise, fresh forks would always have a failing job.
Ok, added a section

    only:
      variables:
      - $SSH_PRIVATE_KEY
      - $S390X_ACCOUNT

Still on the master-updates branch, will merge as soon as the run looks green.
Regards, /Niels
Now that testing is up, we can return to the code.
I'd prefer to do things incrementally. I've created a branch "s390x", with basic configure setup and the README file from your merge request.
Next, I think it makes sense to start with adding the basic aes functions. From a quick look at the MR, it seems the aes instructions don't want any explicit key schedule with expanded subkeys, but want the raw key? Same for encrypt and decrypt?
It would make sense to me with one file each under s390x/msa_x1 for the functions being replaced, but then the current aes-encrypt.c would also need to be split accordingly.
Regards, /Niels
On Sun, Mar 28, 2021 at 9:04 PM Niels Möller nisse@lysator.liu.se wrote:
Next, I think it makes sense to start with adding the basic aes functions.
I'll prepare a MR for the basic aes functions to the s390x branch.
From a quick look at the MR, it seems the aes instructions don't want any explicit key schedule with expanded subkeys, but want the raw key? Same for encrypt and decrypt?
Correct, both encrypt and decrypt operations just need the raw key, the key schedule is executed internally.
It would make sense to me with one file each under s390x/msa_x1 for the functions being replaced, but then the current aes-encrypt.c would also need to be split accordingly.
Splitting key functions and ciphering functions for the basic AES implementation is reasonable, but splitting the encrypting and decrypting functions would produce a lot of files, considering how many functions are implemented for the s390x arch. I don't get why aes-encrypt.c needs to be split accordingly, though; I think it would work just fine.
Regards, Mamone
On Sun, Mar 28, 2021 at 10:28 PM Maamoun TK maamoun.tk@googlemail.com wrote:
I'll prepare a MR for the basic aes functions to the s390x branch.
I made a MR that implements the AES-128 set key functions and the basic AES-128 functions for S390x architecture (one file for each).
regards, Mamone
Maamoun TK maamoun.tk@googlemail.com writes:
Splitting key functions and ciphering functions for the basic AES implementation is reasonable, but splitting the encrypting and decrypting functions would produce a lot of files, considering how many functions are implemented for the s390x arch. I don't get why aes-encrypt.c needs to be split accordingly, though; I think it would work just fine.
Also discussed on the MR. The reason it makes sense to me to split aes-encrypt.c, is that:
(i) It's more consistent with the other aes-related functions.
(ii) The current aes-encrypt.c contains both the encryption functions aes128_encrypt, aes192_encrypt, aes256_encrypt, which we'd want to override with assembly implementations, and the legacy wrapper function aes_encrypt, which shouldn't be overridden. So we can't use plain file-level override, but need #ifdefs too.
(iii) I've considered doing it earlier, to make it easier to implement aes without a round loop (like for all current versions of aes-encrypt-internal.*). E.g., on x86_64, for aes128 we could load all subkeys into registers and still have registers left to do two or more blocks in parallel, but then we'd need to override aes128_encrypt separately from the other aes*_encrypt.
I've tried out a split, see the patch below. It's a rather large change, moving pieces to new places, but nothing difficult. I'm considering committing this to the s390x branch, what do you think?
Regarding the large number of functions for s390x, I'm not yet convinced we should have all of them, we'll have to consider the tradeoff between speedup and complexity case by case. In particular, cbc encrypt (but not decrypt!) is notoriously slow, since it's inherently serial. So I'm curious about potential speedup there. Before getting too far, it may also be worthwhile to try out an assembly memxor.
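To make point (ii) concrete, a plain file-level override amounts to a lookup along the following lines. This is a simplified model written for illustration, not Nettle's actual configure code: pick_source and the directory names in the usage note are assumptions.

```shell
#!/bin/sh
# pick_source NAME SRCDIR ASMDIR...
# Prints the file that would provide NAME: the first NAME.asm found in the
# listed assembly directories (relative to SRCDIR), else the generic NAME.c.
pick_source() {
  name=$1
  srcdir=$2
  shift 2
  for asm_dir in "$@"; do
    if [ -f "$srcdir/$asm_dir/$name.asm" ]; then
      printf '%s\n' "$asm_dir/$name.asm"
      return 0
    fi
  done
  printf '%s\n' "$name.c"
}
```

Once aes128_encrypt lives in its own file, `pick_source aes128-encrypt . s390x/msa_x1 s390x` can select an assembly version, while aes-encrypt (the legacy wrapper) has no .asm counterpart and stays in C. As long as both share one aes-encrypt.c, the choice is all-or-nothing for that file, which is where the #ifdefs would come in.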
diff --git a/Makefile.in b/Makefile.in
index 868afdd7..8d474d1e 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -74,8 +74,10 @@ dvi installcheck uninstallcheck:
 
 all-here: $(TARGETS) $(DOCTARGETS)
 
-nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c \
+nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c aes-decrypt-table.c \
+		 aes128-decrypt.c aes192-decrypt.c aes256-decrypt.c \
 		 aes-encrypt-internal.c aes-encrypt.c aes-encrypt-table.c \
+		 aes128-encrypt.c aes192-encrypt.c aes256-encrypt.c \
 		 aes-invert-internal.c aes-set-key-internal.c \
 		 aes-set-encrypt-key.c aes-set-decrypt-key.c \
 		 aes128-set-encrypt-key.c aes128-set-decrypt-key.c \
diff --git a/aes-decrypt-table.c b/aes-decrypt-table.c
new file mode 100644
index 00000000..301020ee
--- /dev/null
+++ b/aes-decrypt-table.c
@@ -0,0 +1,345 @@
+/* aes-decrypt-table.c
+
+   Decryption function for aes/rijndael block cipher.
+
+   Copyright (C) 2002, 2013 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+*/ + +#if HAVE_CONFIG_H +# include "config.h" +#endif + +#include <stdlib.h> + +#include "aes-internal.h" + +const struct aes_table +_nettle_aes_decrypt_table = + { /* isbox */ + { + 0x52,0x09,0x6a,0xd5,0x30,0x36,0xa5,0x38, + 0xbf,0x40,0xa3,0x9e,0x81,0xf3,0xd7,0xfb, + 0x7c,0xe3,0x39,0x82,0x9b,0x2f,0xff,0x87, + 0x34,0x8e,0x43,0x44,0xc4,0xde,0xe9,0xcb, + 0x54,0x7b,0x94,0x32,0xa6,0xc2,0x23,0x3d, + 0xee,0x4c,0x95,0x0b,0x42,0xfa,0xc3,0x4e, + 0x08,0x2e,0xa1,0x66,0x28,0xd9,0x24,0xb2, + 0x76,0x5b,0xa2,0x49,0x6d,0x8b,0xd1,0x25, + 0x72,0xf8,0xf6,0x64,0x86,0x68,0x98,0x16, + 0xd4,0xa4,0x5c,0xcc,0x5d,0x65,0xb6,0x92, + 0x6c,0x70,0x48,0x50,0xfd,0xed,0xb9,0xda, + 0x5e,0x15,0x46,0x57,0xa7,0x8d,0x9d,0x84, + 0x90,0xd8,0xab,0x00,0x8c,0xbc,0xd3,0x0a, + 0xf7,0xe4,0x58,0x05,0xb8,0xb3,0x45,0x06, + 0xd0,0x2c,0x1e,0x8f,0xca,0x3f,0x0f,0x02, + 0xc1,0xaf,0xbd,0x03,0x01,0x13,0x8a,0x6b, + 0x3a,0x91,0x11,0x41,0x4f,0x67,0xdc,0xea, + 0x97,0xf2,0xcf,0xce,0xf0,0xb4,0xe6,0x73, + 0x96,0xac,0x74,0x22,0xe7,0xad,0x35,0x85, + 0xe2,0xf9,0x37,0xe8,0x1c,0x75,0xdf,0x6e, + 0x47,0xf1,0x1a,0x71,0x1d,0x29,0xc5,0x89, + 0x6f,0xb7,0x62,0x0e,0xaa,0x18,0xbe,0x1b, + 0xfc,0x56,0x3e,0x4b,0xc6,0xd2,0x79,0x20, + 0x9a,0xdb,0xc0,0xfe,0x78,0xcd,0x5a,0xf4, + 0x1f,0xdd,0xa8,0x33,0x88,0x07,0xc7,0x31, + 0xb1,0x12,0x10,0x59,0x27,0x80,0xec,0x5f, + 0x60,0x51,0x7f,0xa9,0x19,0xb5,0x4a,0x0d, + 0x2d,0xe5,0x7a,0x9f,0x93,0xc9,0x9c,0xef, + 0xa0,0xe0,0x3b,0x4d,0xae,0x2a,0xf5,0xb0, + 0xc8,0xeb,0xbb,0x3c,0x83,0x53,0x99,0x61, + 0x17,0x2b,0x04,0x7e,0xba,0x77,0xd6,0x26, + 0xe1,0x69,0x14,0x63,0x55,0x21,0x0c,0x7d, + }, + { /* itable */ + { + 0x50a7f451,0x5365417e,0xc3a4171a,0x965e273a, + 0xcb6bab3b,0xf1459d1f,0xab58faac,0x9303e34b, + 0x55fa3020,0xf66d76ad,0x9176cc88,0x254c02f5, + 0xfcd7e54f,0xd7cb2ac5,0x80443526,0x8fa362b5, + 0x495ab1de,0x671bba25,0x980eea45,0xe1c0fe5d, + 0x02752fc3,0x12f04c81,0xa397468d,0xc6f9d36b, + 0xe75f8f03,0x959c9215,0xeb7a6dbf,0xda595295, + 0x2d83bed4,0xd3217458,0x2969e049,0x44c8c98e, + 
0x6a89c275,0x78798ef4,0x6b3e5899,0xdd71b927, + 0xb64fe1be,0x17ad88f0,0x66ac20c9,0xb43ace7d, + 0x184adf63,0x82311ae5,0x60335197,0x457f5362, + 0xe07764b1,0x84ae6bbb,0x1ca081fe,0x942b08f9, + 0x58684870,0x19fd458f,0x876cde94,0xb7f87b52, + 0x23d373ab,0xe2024b72,0x578f1fe3,0x2aab5566, + 0x0728ebb2,0x03c2b52f,0x9a7bc586,0xa50837d3, + 0xf2872830,0xb2a5bf23,0xba6a0302,0x5c8216ed, + 0x2b1ccf8a,0x92b479a7,0xf0f207f3,0xa1e2694e, + 0xcdf4da65,0xd5be0506,0x1f6234d1,0x8afea6c4, + 0x9d532e34,0xa055f3a2,0x32e18a05,0x75ebf6a4, + 0x39ec830b,0xaaef6040,0x069f715e,0x51106ebd, + 0xf98a213e,0x3d06dd96,0xae053edd,0x46bde64d, + 0xb58d5491,0x055dc471,0x6fd40604,0xff155060, + 0x24fb9819,0x97e9bdd6,0xcc434089,0x779ed967, + 0xbd42e8b0,0x888b8907,0x385b19e7,0xdbeec879, + 0x470a7ca1,0xe90f427c,0xc91e84f8,0x00000000, + 0x83868009,0x48ed2b32,0xac70111e,0x4e725a6c, + 0xfbff0efd,0x5638850f,0x1ed5ae3d,0x27392d36, + 0x64d90f0a,0x21a65c68,0xd1545b9b,0x3a2e3624, + 0xb1670a0c,0x0fe75793,0xd296eeb4,0x9e919b1b, + 0x4fc5c080,0xa220dc61,0x694b775a,0x161a121c, + 0x0aba93e2,0xe52aa0c0,0x43e0223c,0x1d171b12, + 0x0b0d090e,0xadc78bf2,0xb9a8b62d,0xc8a91e14, + 0x8519f157,0x4c0775af,0xbbdd99ee,0xfd607fa3, + 0x9f2601f7,0xbcf5725c,0xc53b6644,0x347efb5b, + 0x7629438b,0xdcc623cb,0x68fcedb6,0x63f1e4b8, + 0xcadc31d7,0x10856342,0x40229713,0x2011c684, + 0x7d244a85,0xf83dbbd2,0x1132f9ae,0x6da129c7, + 0x4b2f9e1d,0xf330b2dc,0xec52860d,0xd0e3c177, + 0x6c16b32b,0x99b970a9,0xfa489411,0x2264e947, + 0xc48cfca8,0x1a3ff0a0,0xd82c7d56,0xef903322, + 0xc74e4987,0xc1d138d9,0xfea2ca8c,0x360bd498, + 0xcf81f5a6,0x28de7aa5,0x268eb7da,0xa4bfad3f, + 0xe49d3a2c,0x0d927850,0x9bcc5f6a,0x62467e54, + 0xc2138df6,0xe8b8d890,0x5ef7392e,0xf5afc382, + 0xbe805d9f,0x7c93d069,0xa92dd56f,0xb31225cf, + 0x3b99acc8,0xa77d1810,0x6e639ce8,0x7bbb3bdb, + 0x097826cd,0xf418596e,0x01b79aec,0xa89a4f83, + 0x656e95e6,0x7ee6ffaa,0x08cfbc21,0xe6e815ef, + 0xd99be7ba,0xce366f4a,0xd4099fea,0xd67cb029, + 0xafb2a431,0x31233f2a,0x3094a5c6,0xc066a235, + 
0x37bc4e74,0xa6ca82fc,0xb0d090e0,0x15d8a733, + 0x4a9804f1,0xf7daec41,0x0e50cd7f,0x2ff69117, + 0x8dd64d76,0x4db0ef43,0x544daacc,0xdf0496e4, + 0xe3b5d19e,0x1b886a4c,0xb81f2cc1,0x7f516546, + 0x04ea5e9d,0x5d358c01,0x737487fa,0x2e410bfb, + 0x5a1d67b3,0x52d2db92,0x335610e9,0x1347d66d, + 0x8c61d79a,0x7a0ca137,0x8e14f859,0x893c13eb, + 0xee27a9ce,0x35c961b7,0xede51ce1,0x3cb1477a, + 0x59dfd29c,0x3f73f255,0x79ce1418,0xbf37c773, + 0xeacdf753,0x5baafd5f,0x146f3ddf,0x86db4478, + 0x81f3afca,0x3ec468b9,0x2c342438,0x5f40a3c2, + 0x72c31d16,0x0c25e2bc,0x8b493c28,0x41950dff, + 0x7101a839,0xdeb30c08,0x9ce4b4d8,0x90c15664, + 0x6184cb7b,0x70b632d5,0x745c6c48,0x4257b8d0, + }, +#if !AES_SMALL + { /* Before: itable[1] */ + 0xa7f45150,0x65417e53,0xa4171ac3,0x5e273a96, + 0x6bab3bcb,0x459d1ff1,0x58faacab,0x03e34b93, + 0xfa302055,0x6d76adf6,0x76cc8891,0x4c02f525, + 0xd7e54ffc,0xcb2ac5d7,0x44352680,0xa362b58f, + 0x5ab1de49,0x1bba2567,0x0eea4598,0xc0fe5de1, + 0x752fc302,0xf04c8112,0x97468da3,0xf9d36bc6, + 0x5f8f03e7,0x9c921595,0x7a6dbfeb,0x595295da, + 0x83bed42d,0x217458d3,0x69e04929,0xc8c98e44, + 0x89c2756a,0x798ef478,0x3e58996b,0x71b927dd, + 0x4fe1beb6,0xad88f017,0xac20c966,0x3ace7db4, + 0x4adf6318,0x311ae582,0x33519760,0x7f536245, + 0x7764b1e0,0xae6bbb84,0xa081fe1c,0x2b08f994, + 0x68487058,0xfd458f19,0x6cde9487,0xf87b52b7, + 0xd373ab23,0x024b72e2,0x8f1fe357,0xab55662a, + 0x28ebb207,0xc2b52f03,0x7bc5869a,0x0837d3a5, + 0x872830f2,0xa5bf23b2,0x6a0302ba,0x8216ed5c, + 0x1ccf8a2b,0xb479a792,0xf207f3f0,0xe2694ea1, + 0xf4da65cd,0xbe0506d5,0x6234d11f,0xfea6c48a, + 0x532e349d,0x55f3a2a0,0xe18a0532,0xebf6a475, + 0xec830b39,0xef6040aa,0x9f715e06,0x106ebd51, + 0x8a213ef9,0x06dd963d,0x053eddae,0xbde64d46, + 0x8d5491b5,0x5dc47105,0xd406046f,0x155060ff, + 0xfb981924,0xe9bdd697,0x434089cc,0x9ed96777, + 0x42e8b0bd,0x8b890788,0x5b19e738,0xeec879db, + 0x0a7ca147,0x0f427ce9,0x1e84f8c9,0x00000000, + 0x86800983,0xed2b3248,0x70111eac,0x725a6c4e, + 0xff0efdfb,0x38850f56,0xd5ae3d1e,0x392d3627, + 
0xd90f0a64,0xa65c6821,0x545b9bd1,0x2e36243a, + 0x670a0cb1,0xe757930f,0x96eeb4d2,0x919b1b9e, + 0xc5c0804f,0x20dc61a2,0x4b775a69,0x1a121c16, + 0xba93e20a,0x2aa0c0e5,0xe0223c43,0x171b121d, + 0x0d090e0b,0xc78bf2ad,0xa8b62db9,0xa91e14c8, + 0x19f15785,0x0775af4c,0xdd99eebb,0x607fa3fd, + 0x2601f79f,0xf5725cbc,0x3b6644c5,0x7efb5b34, + 0x29438b76,0xc623cbdc,0xfcedb668,0xf1e4b863, + 0xdc31d7ca,0x85634210,0x22971340,0x11c68420, + 0x244a857d,0x3dbbd2f8,0x32f9ae11,0xa129c76d, + 0x2f9e1d4b,0x30b2dcf3,0x52860dec,0xe3c177d0, + 0x16b32b6c,0xb970a999,0x489411fa,0x64e94722, + 0x8cfca8c4,0x3ff0a01a,0x2c7d56d8,0x903322ef, + 0x4e4987c7,0xd138d9c1,0xa2ca8cfe,0x0bd49836, + 0x81f5a6cf,0xde7aa528,0x8eb7da26,0xbfad3fa4, + 0x9d3a2ce4,0x9278500d,0xcc5f6a9b,0x467e5462, + 0x138df6c2,0xb8d890e8,0xf7392e5e,0xafc382f5, + 0x805d9fbe,0x93d0697c,0x2dd56fa9,0x1225cfb3, + 0x99acc83b,0x7d1810a7,0x639ce86e,0xbb3bdb7b, + 0x7826cd09,0x18596ef4,0xb79aec01,0x9a4f83a8, + 0x6e95e665,0xe6ffaa7e,0xcfbc2108,0xe815efe6, + 0x9be7bad9,0x366f4ace,0x099fead4,0x7cb029d6, + 0xb2a431af,0x233f2a31,0x94a5c630,0x66a235c0, + 0xbc4e7437,0xca82fca6,0xd090e0b0,0xd8a73315, + 0x9804f14a,0xdaec41f7,0x50cd7f0e,0xf691172f, + 0xd64d768d,0xb0ef434d,0x4daacc54,0x0496e4df, + 0xb5d19ee3,0x886a4c1b,0x1f2cc1b8,0x5165467f, + 0xea5e9d04,0x358c015d,0x7487fa73,0x410bfb2e, + 0x1d67b35a,0xd2db9252,0x5610e933,0x47d66d13, + 0x61d79a8c,0x0ca1377a,0x14f8598e,0x3c13eb89, + 0x27a9ceee,0xc961b735,0xe51ce1ed,0xb1477a3c, + 0xdfd29c59,0x73f2553f,0xce141879,0x37c773bf, + 0xcdf753ea,0xaafd5f5b,0x6f3ddf14,0xdb447886, + 0xf3afca81,0xc468b93e,0x3424382c,0x40a3c25f, + 0xc31d1672,0x25e2bc0c,0x493c288b,0x950dff41, + 0x01a83971,0xb30c08de,0xe4b4d89c,0xc1566490, + 0x84cb7b61,0xb632d570,0x5c6c4874,0x57b8d042, + },{ /* Before: itable[2] */ + 0xf45150a7,0x417e5365,0x171ac3a4,0x273a965e, + 0xab3bcb6b,0x9d1ff145,0xfaacab58,0xe34b9303, + 0x302055fa,0x76adf66d,0xcc889176,0x02f5254c, + 0xe54ffcd7,0x2ac5d7cb,0x35268044,0x62b58fa3, + 
0xb1de495a,0xba25671b,0xea45980e,0xfe5de1c0, + 0x2fc30275,0x4c8112f0,0x468da397,0xd36bc6f9, + 0x8f03e75f,0x9215959c,0x6dbfeb7a,0x5295da59, + 0xbed42d83,0x7458d321,0xe0492969,0xc98e44c8, + 0xc2756a89,0x8ef47879,0x58996b3e,0xb927dd71, + 0xe1beb64f,0x88f017ad,0x20c966ac,0xce7db43a, + 0xdf63184a,0x1ae58231,0x51976033,0x5362457f, + 0x64b1e077,0x6bbb84ae,0x81fe1ca0,0x08f9942b, + 0x48705868,0x458f19fd,0xde94876c,0x7b52b7f8, + 0x73ab23d3,0x4b72e202,0x1fe3578f,0x55662aab, + 0xebb20728,0xb52f03c2,0xc5869a7b,0x37d3a508, + 0x2830f287,0xbf23b2a5,0x0302ba6a,0x16ed5c82, + 0xcf8a2b1c,0x79a792b4,0x07f3f0f2,0x694ea1e2, + 0xda65cdf4,0x0506d5be,0x34d11f62,0xa6c48afe, + 0x2e349d53,0xf3a2a055,0x8a0532e1,0xf6a475eb, + 0x830b39ec,0x6040aaef,0x715e069f,0x6ebd5110, + 0x213ef98a,0xdd963d06,0x3eddae05,0xe64d46bd, + 0x5491b58d,0xc471055d,0x06046fd4,0x5060ff15, + 0x981924fb,0xbdd697e9,0x4089cc43,0xd967779e, + 0xe8b0bd42,0x8907888b,0x19e7385b,0xc879dbee, + 0x7ca1470a,0x427ce90f,0x84f8c91e,0x00000000, + 0x80098386,0x2b3248ed,0x111eac70,0x5a6c4e72, + 0x0efdfbff,0x850f5638,0xae3d1ed5,0x2d362739, + 0x0f0a64d9,0x5c6821a6,0x5b9bd154,0x36243a2e, + 0x0a0cb167,0x57930fe7,0xeeb4d296,0x9b1b9e91, + 0xc0804fc5,0xdc61a220,0x775a694b,0x121c161a, + 0x93e20aba,0xa0c0e52a,0x223c43e0,0x1b121d17, + 0x090e0b0d,0x8bf2adc7,0xb62db9a8,0x1e14c8a9, + 0xf1578519,0x75af4c07,0x99eebbdd,0x7fa3fd60, + 0x01f79f26,0x725cbcf5,0x6644c53b,0xfb5b347e, + 0x438b7629,0x23cbdcc6,0xedb668fc,0xe4b863f1, + 0x31d7cadc,0x63421085,0x97134022,0xc6842011, + 0x4a857d24,0xbbd2f83d,0xf9ae1132,0x29c76da1, + 0x9e1d4b2f,0xb2dcf330,0x860dec52,0xc177d0e3, + 0xb32b6c16,0x70a999b9,0x9411fa48,0xe9472264, + 0xfca8c48c,0xf0a01a3f,0x7d56d82c,0x3322ef90, + 0x4987c74e,0x38d9c1d1,0xca8cfea2,0xd498360b, + 0xf5a6cf81,0x7aa528de,0xb7da268e,0xad3fa4bf, + 0x3a2ce49d,0x78500d92,0x5f6a9bcc,0x7e546246, + 0x8df6c213,0xd890e8b8,0x392e5ef7,0xc382f5af, + 0x5d9fbe80,0xd0697c93,0xd56fa92d,0x25cfb312, + 0xacc83b99,0x1810a77d,0x9ce86e63,0x3bdb7bbb, + 
0x26cd0978,0x596ef418,0x9aec01b7,0x4f83a89a, + 0x95e6656e,0xffaa7ee6,0xbc2108cf,0x15efe6e8, + 0xe7bad99b,0x6f4ace36,0x9fead409,0xb029d67c, + 0xa431afb2,0x3f2a3123,0xa5c63094,0xa235c066, + 0x4e7437bc,0x82fca6ca,0x90e0b0d0,0xa73315d8, + 0x04f14a98,0xec41f7da,0xcd7f0e50,0x91172ff6, + 0x4d768dd6,0xef434db0,0xaacc544d,0x96e4df04, + 0xd19ee3b5,0x6a4c1b88,0x2cc1b81f,0x65467f51, + 0x5e9d04ea,0x8c015d35,0x87fa7374,0x0bfb2e41, + 0x67b35a1d,0xdb9252d2,0x10e93356,0xd66d1347, + 0xd79a8c61,0xa1377a0c,0xf8598e14,0x13eb893c, + 0xa9ceee27,0x61b735c9,0x1ce1ede5,0x477a3cb1, + 0xd29c59df,0xf2553f73,0x141879ce,0xc773bf37, + 0xf753eacd,0xfd5f5baa,0x3ddf146f,0x447886db, + 0xafca81f3,0x68b93ec4,0x24382c34,0xa3c25f40, + 0x1d1672c3,0xe2bc0c25,0x3c288b49,0x0dff4195, + 0xa8397101,0x0c08deb3,0xb4d89ce4,0x566490c1, + 0xcb7b6184,0x32d570b6,0x6c48745c,0xb8d04257, + },{ /* Before: itable[3] */ + 0x5150a7f4,0x7e536541,0x1ac3a417,0x3a965e27, + 0x3bcb6bab,0x1ff1459d,0xacab58fa,0x4b9303e3, + 0x2055fa30,0xadf66d76,0x889176cc,0xf5254c02, + 0x4ffcd7e5,0xc5d7cb2a,0x26804435,0xb58fa362, + 0xde495ab1,0x25671bba,0x45980eea,0x5de1c0fe, + 0xc302752f,0x8112f04c,0x8da39746,0x6bc6f9d3, + 0x03e75f8f,0x15959c92,0xbfeb7a6d,0x95da5952, + 0xd42d83be,0x58d32174,0x492969e0,0x8e44c8c9, + 0x756a89c2,0xf478798e,0x996b3e58,0x27dd71b9, + 0xbeb64fe1,0xf017ad88,0xc966ac20,0x7db43ace, + 0x63184adf,0xe582311a,0x97603351,0x62457f53, + 0xb1e07764,0xbb84ae6b,0xfe1ca081,0xf9942b08, + 0x70586848,0x8f19fd45,0x94876cde,0x52b7f87b, + 0xab23d373,0x72e2024b,0xe3578f1f,0x662aab55, + 0xb20728eb,0x2f03c2b5,0x869a7bc5,0xd3a50837, + 0x30f28728,0x23b2a5bf,0x02ba6a03,0xed5c8216, + 0x8a2b1ccf,0xa792b479,0xf3f0f207,0x4ea1e269, + 0x65cdf4da,0x06d5be05,0xd11f6234,0xc48afea6, + 0x349d532e,0xa2a055f3,0x0532e18a,0xa475ebf6, + 0x0b39ec83,0x40aaef60,0x5e069f71,0xbd51106e, + 0x3ef98a21,0x963d06dd,0xddae053e,0x4d46bde6, + 0x91b58d54,0x71055dc4,0x046fd406,0x60ff1550, + 0x1924fb98,0xd697e9bd,0x89cc4340,0x67779ed9, + 
0xb0bd42e8,0x07888b89,0xe7385b19,0x79dbeec8, + 0xa1470a7c,0x7ce90f42,0xf8c91e84,0x00000000, + 0x09838680,0x3248ed2b,0x1eac7011,0x6c4e725a, + 0xfdfbff0e,0x0f563885,0x3d1ed5ae,0x3627392d, + 0x0a64d90f,0x6821a65c,0x9bd1545b,0x243a2e36, + 0x0cb1670a,0x930fe757,0xb4d296ee,0x1b9e919b, + 0x804fc5c0,0x61a220dc,0x5a694b77,0x1c161a12, + 0xe20aba93,0xc0e52aa0,0x3c43e022,0x121d171b, + 0x0e0b0d09,0xf2adc78b,0x2db9a8b6,0x14c8a91e, + 0x578519f1,0xaf4c0775,0xeebbdd99,0xa3fd607f, + 0xf79f2601,0x5cbcf572,0x44c53b66,0x5b347efb, + 0x8b762943,0xcbdcc623,0xb668fced,0xb863f1e4, + 0xd7cadc31,0x42108563,0x13402297,0x842011c6, + 0x857d244a,0xd2f83dbb,0xae1132f9,0xc76da129, + 0x1d4b2f9e,0xdcf330b2,0x0dec5286,0x77d0e3c1, + 0x2b6c16b3,0xa999b970,0x11fa4894,0x472264e9, + 0xa8c48cfc,0xa01a3ff0,0x56d82c7d,0x22ef9033, + 0x87c74e49,0xd9c1d138,0x8cfea2ca,0x98360bd4, + 0xa6cf81f5,0xa528de7a,0xda268eb7,0x3fa4bfad, + 0x2ce49d3a,0x500d9278,0x6a9bcc5f,0x5462467e, + 0xf6c2138d,0x90e8b8d8,0x2e5ef739,0x82f5afc3, + 0x9fbe805d,0x697c93d0,0x6fa92dd5,0xcfb31225, + 0xc83b99ac,0x10a77d18,0xe86e639c,0xdb7bbb3b, + 0xcd097826,0x6ef41859,0xec01b79a,0x83a89a4f, + 0xe6656e95,0xaa7ee6ff,0x2108cfbc,0xefe6e815, + 0xbad99be7,0x4ace366f,0xead4099f,0x29d67cb0, + 0x31afb2a4,0x2a31233f,0xc63094a5,0x35c066a2, + 0x7437bc4e,0xfca6ca82,0xe0b0d090,0x3315d8a7, + 0xf14a9804,0x41f7daec,0x7f0e50cd,0x172ff691, + 0x768dd64d,0x434db0ef,0xcc544daa,0xe4df0496, + 0x9ee3b5d1,0x4c1b886a,0xc1b81f2c,0x467f5165, + 0x9d04ea5e,0x015d358c,0xfa737487,0xfb2e410b, + 0xb35a1d67,0x9252d2db,0xe9335610,0x6d1347d6, + 0x9a8c61d7,0x377a0ca1,0x598e14f8,0xeb893c13, + 0xceee27a9,0xb735c961,0xe1ede51c,0x7a3cb147, + 0x9c59dfd2,0x553f73f2,0x1879ce14,0x73bf37c7, + 0x53eacdf7,0x5f5baafd,0xdf146f3d,0x7886db44, + 0xca81f3af,0xb93ec468,0x382c3424,0xc25f40a3, + 0x1672c31d,0xbc0c25e2,0x288b493c,0xff41950d, + 0x397101a8,0x08deb30c,0xd89ce4b4,0x6490c156, + 0x7b6184cb,0xd570b632,0x48745c6c,0xd04257b8, + }, +#endif /* !AES_SMALL */ + } + }; diff --git a/aes-decrypt.c 
b/aes-decrypt.c
index 1c22bfbb..6b007121 100644
--- a/aes-decrypt.c
+++ b/aes-decrypt.c
@@ -35,316 +35,10 @@
 # include "config.h"
 #endif

-#include <assert.h>
 #include <stdlib.h>

 #include "aes-internal.h"
-static const struct aes_table -_aes_decrypt_table = - { /* isbox */ - { - 0x52,0x09,0x6a,0xd5,0x30,0x36,0xa5,0x38, - 0xbf,0x40,0xa3,0x9e,0x81,0xf3,0xd7,0xfb, - 0x7c,0xe3,0x39,0x82,0x9b,0x2f,0xff,0x87, - 0x34,0x8e,0x43,0x44,0xc4,0xde,0xe9,0xcb, - 0x54,0x7b,0x94,0x32,0xa6,0xc2,0x23,0x3d, - 0xee,0x4c,0x95,0x0b,0x42,0xfa,0xc3,0x4e, - 0x08,0x2e,0xa1,0x66,0x28,0xd9,0x24,0xb2, - 0x76,0x5b,0xa2,0x49,0x6d,0x8b,0xd1,0x25, - 0x72,0xf8,0xf6,0x64,0x86,0x68,0x98,0x16, - 0xd4,0xa4,0x5c,0xcc,0x5d,0x65,0xb6,0x92, - 0x6c,0x70,0x48,0x50,0xfd,0xed,0xb9,0xda, - 0x5e,0x15,0x46,0x57,0xa7,0x8d,0x9d,0x84, - 0x90,0xd8,0xab,0x00,0x8c,0xbc,0xd3,0x0a, - 0xf7,0xe4,0x58,0x05,0xb8,0xb3,0x45,0x06, - 0xd0,0x2c,0x1e,0x8f,0xca,0x3f,0x0f,0x02, - 0xc1,0xaf,0xbd,0x03,0x01,0x13,0x8a,0x6b, - 0x3a,0x91,0x11,0x41,0x4f,0x67,0xdc,0xea, - 0x97,0xf2,0xcf,0xce,0xf0,0xb4,0xe6,0x73, - 0x96,0xac,0x74,0x22,0xe7,0xad,0x35,0x85, - 0xe2,0xf9,0x37,0xe8,0x1c,0x75,0xdf,0x6e, - 0x47,0xf1,0x1a,0x71,0x1d,0x29,0xc5,0x89, - 0x6f,0xb7,0x62,0x0e,0xaa,0x18,0xbe,0x1b, - 0xfc,0x56,0x3e,0x4b,0xc6,0xd2,0x79,0x20, - 0x9a,0xdb,0xc0,0xfe,0x78,0xcd,0x5a,0xf4, - 0x1f,0xdd,0xa8,0x33,0x88,0x07,0xc7,0x31, - 0xb1,0x12,0x10,0x59,0x27,0x80,0xec,0x5f, - 0x60,0x51,0x7f,0xa9,0x19,0xb5,0x4a,0x0d, - 0x2d,0xe5,0x7a,0x9f,0x93,0xc9,0x9c,0xef, - 0xa0,0xe0,0x3b,0x4d,0xae,0x2a,0xf5,0xb0, - 0xc8,0xeb,0xbb,0x3c,0x83,0x53,0x99,0x61, - 0x17,0x2b,0x04,0x7e,0xba,0x77,0xd6,0x26, - 0xe1,0x69,0x14,0x63,0x55,0x21,0x0c,0x7d, - }, - { /* itable */ - { - 0x50a7f451,0x5365417e,0xc3a4171a,0x965e273a, - 0xcb6bab3b,0xf1459d1f,0xab58faac,0x9303e34b, - 0x55fa3020,0xf66d76ad,0x9176cc88,0x254c02f5, - 0xfcd7e54f,0xd7cb2ac5,0x80443526,0x8fa362b5, - 0x495ab1de,0x671bba25,0x980eea45,0xe1c0fe5d, - 0x02752fc3,0x12f04c81,0xa397468d,0xc6f9d36b, - 0xe75f8f03,0x959c9215,0xeb7a6dbf,0xda595295, - 0x2d83bed4,0xd3217458,0x2969e049,0x44c8c98e, - 0x6a89c275,0x78798ef4,0x6b3e5899,0xdd71b927, - 0xb64fe1be,0x17ad88f0,0x66ac20c9,0xb43ace7d, - 0x184adf63,0x82311ae5,0x60335197,0x457f5362, - 
0xe07764b1,0x84ae6bbb,0x1ca081fe,0x942b08f9, - 0x58684870,0x19fd458f,0x876cde94,0xb7f87b52, - 0x23d373ab,0xe2024b72,0x578f1fe3,0x2aab5566, - 0x0728ebb2,0x03c2b52f,0x9a7bc586,0xa50837d3, - 0xf2872830,0xb2a5bf23,0xba6a0302,0x5c8216ed, - 0x2b1ccf8a,0x92b479a7,0xf0f207f3,0xa1e2694e, - 0xcdf4da65,0xd5be0506,0x1f6234d1,0x8afea6c4, - 0x9d532e34,0xa055f3a2,0x32e18a05,0x75ebf6a4, - 0x39ec830b,0xaaef6040,0x069f715e,0x51106ebd, - 0xf98a213e,0x3d06dd96,0xae053edd,0x46bde64d, - 0xb58d5491,0x055dc471,0x6fd40604,0xff155060, - 0x24fb9819,0x97e9bdd6,0xcc434089,0x779ed967, - 0xbd42e8b0,0x888b8907,0x385b19e7,0xdbeec879, - 0x470a7ca1,0xe90f427c,0xc91e84f8,0x00000000, - 0x83868009,0x48ed2b32,0xac70111e,0x4e725a6c, - 0xfbff0efd,0x5638850f,0x1ed5ae3d,0x27392d36, - 0x64d90f0a,0x21a65c68,0xd1545b9b,0x3a2e3624, - 0xb1670a0c,0x0fe75793,0xd296eeb4,0x9e919b1b, - 0x4fc5c080,0xa220dc61,0x694b775a,0x161a121c, - 0x0aba93e2,0xe52aa0c0,0x43e0223c,0x1d171b12, - 0x0b0d090e,0xadc78bf2,0xb9a8b62d,0xc8a91e14, - 0x8519f157,0x4c0775af,0xbbdd99ee,0xfd607fa3, - 0x9f2601f7,0xbcf5725c,0xc53b6644,0x347efb5b, - 0x7629438b,0xdcc623cb,0x68fcedb6,0x63f1e4b8, - 0xcadc31d7,0x10856342,0x40229713,0x2011c684, - 0x7d244a85,0xf83dbbd2,0x1132f9ae,0x6da129c7, - 0x4b2f9e1d,0xf330b2dc,0xec52860d,0xd0e3c177, - 0x6c16b32b,0x99b970a9,0xfa489411,0x2264e947, - 0xc48cfca8,0x1a3ff0a0,0xd82c7d56,0xef903322, - 0xc74e4987,0xc1d138d9,0xfea2ca8c,0x360bd498, - 0xcf81f5a6,0x28de7aa5,0x268eb7da,0xa4bfad3f, - 0xe49d3a2c,0x0d927850,0x9bcc5f6a,0x62467e54, - 0xc2138df6,0xe8b8d890,0x5ef7392e,0xf5afc382, - 0xbe805d9f,0x7c93d069,0xa92dd56f,0xb31225cf, - 0x3b99acc8,0xa77d1810,0x6e639ce8,0x7bbb3bdb, - 0x097826cd,0xf418596e,0x01b79aec,0xa89a4f83, - 0x656e95e6,0x7ee6ffaa,0x08cfbc21,0xe6e815ef, - 0xd99be7ba,0xce366f4a,0xd4099fea,0xd67cb029, - 0xafb2a431,0x31233f2a,0x3094a5c6,0xc066a235, - 0x37bc4e74,0xa6ca82fc,0xb0d090e0,0x15d8a733, - 0x4a9804f1,0xf7daec41,0x0e50cd7f,0x2ff69117, - 0x8dd64d76,0x4db0ef43,0x544daacc,0xdf0496e4, - 
0xe3b5d19e,0x1b886a4c,0xb81f2cc1,0x7f516546, - 0x04ea5e9d,0x5d358c01,0x737487fa,0x2e410bfb, - 0x5a1d67b3,0x52d2db92,0x335610e9,0x1347d66d, - 0x8c61d79a,0x7a0ca137,0x8e14f859,0x893c13eb, - 0xee27a9ce,0x35c961b7,0xede51ce1,0x3cb1477a, - 0x59dfd29c,0x3f73f255,0x79ce1418,0xbf37c773, - 0xeacdf753,0x5baafd5f,0x146f3ddf,0x86db4478, - 0x81f3afca,0x3ec468b9,0x2c342438,0x5f40a3c2, - 0x72c31d16,0x0c25e2bc,0x8b493c28,0x41950dff, - 0x7101a839,0xdeb30c08,0x9ce4b4d8,0x90c15664, - 0x6184cb7b,0x70b632d5,0x745c6c48,0x4257b8d0, - }, -#if !AES_SMALL - { /* Before: itable[1] */ - 0xa7f45150,0x65417e53,0xa4171ac3,0x5e273a96, - 0x6bab3bcb,0x459d1ff1,0x58faacab,0x03e34b93, - 0xfa302055,0x6d76adf6,0x76cc8891,0x4c02f525, - 0xd7e54ffc,0xcb2ac5d7,0x44352680,0xa362b58f, - 0x5ab1de49,0x1bba2567,0x0eea4598,0xc0fe5de1, - 0x752fc302,0xf04c8112,0x97468da3,0xf9d36bc6, - 0x5f8f03e7,0x9c921595,0x7a6dbfeb,0x595295da, - 0x83bed42d,0x217458d3,0x69e04929,0xc8c98e44, - 0x89c2756a,0x798ef478,0x3e58996b,0x71b927dd, - 0x4fe1beb6,0xad88f017,0xac20c966,0x3ace7db4, - 0x4adf6318,0x311ae582,0x33519760,0x7f536245, - 0x7764b1e0,0xae6bbb84,0xa081fe1c,0x2b08f994, - 0x68487058,0xfd458f19,0x6cde9487,0xf87b52b7, - 0xd373ab23,0x024b72e2,0x8f1fe357,0xab55662a, - 0x28ebb207,0xc2b52f03,0x7bc5869a,0x0837d3a5, - 0x872830f2,0xa5bf23b2,0x6a0302ba,0x8216ed5c, - 0x1ccf8a2b,0xb479a792,0xf207f3f0,0xe2694ea1, - 0xf4da65cd,0xbe0506d5,0x6234d11f,0xfea6c48a, - 0x532e349d,0x55f3a2a0,0xe18a0532,0xebf6a475, - 0xec830b39,0xef6040aa,0x9f715e06,0x106ebd51, - 0x8a213ef9,0x06dd963d,0x053eddae,0xbde64d46, - 0x8d5491b5,0x5dc47105,0xd406046f,0x155060ff, - 0xfb981924,0xe9bdd697,0x434089cc,0x9ed96777, - 0x42e8b0bd,0x8b890788,0x5b19e738,0xeec879db, - 0x0a7ca147,0x0f427ce9,0x1e84f8c9,0x00000000, - 0x86800983,0xed2b3248,0x70111eac,0x725a6c4e, - 0xff0efdfb,0x38850f56,0xd5ae3d1e,0x392d3627, - 0xd90f0a64,0xa65c6821,0x545b9bd1,0x2e36243a, - 0x670a0cb1,0xe757930f,0x96eeb4d2,0x919b1b9e, - 0xc5c0804f,0x20dc61a2,0x4b775a69,0x1a121c16, - 
0xba93e20a,0x2aa0c0e5,0xe0223c43,0x171b121d, - 0x0d090e0b,0xc78bf2ad,0xa8b62db9,0xa91e14c8, - 0x19f15785,0x0775af4c,0xdd99eebb,0x607fa3fd, - 0x2601f79f,0xf5725cbc,0x3b6644c5,0x7efb5b34, - 0x29438b76,0xc623cbdc,0xfcedb668,0xf1e4b863, - 0xdc31d7ca,0x85634210,0x22971340,0x11c68420, - 0x244a857d,0x3dbbd2f8,0x32f9ae11,0xa129c76d, - 0x2f9e1d4b,0x30b2dcf3,0x52860dec,0xe3c177d0, - 0x16b32b6c,0xb970a999,0x489411fa,0x64e94722, - 0x8cfca8c4,0x3ff0a01a,0x2c7d56d8,0x903322ef, - 0x4e4987c7,0xd138d9c1,0xa2ca8cfe,0x0bd49836, - 0x81f5a6cf,0xde7aa528,0x8eb7da26,0xbfad3fa4, - 0x9d3a2ce4,0x9278500d,0xcc5f6a9b,0x467e5462, - 0x138df6c2,0xb8d890e8,0xf7392e5e,0xafc382f5, - 0x805d9fbe,0x93d0697c,0x2dd56fa9,0x1225cfb3, - 0x99acc83b,0x7d1810a7,0x639ce86e,0xbb3bdb7b, - 0x7826cd09,0x18596ef4,0xb79aec01,0x9a4f83a8, - 0x6e95e665,0xe6ffaa7e,0xcfbc2108,0xe815efe6, - 0x9be7bad9,0x366f4ace,0x099fead4,0x7cb029d6, - 0xb2a431af,0x233f2a31,0x94a5c630,0x66a235c0, - 0xbc4e7437,0xca82fca6,0xd090e0b0,0xd8a73315, - 0x9804f14a,0xdaec41f7,0x50cd7f0e,0xf691172f, - 0xd64d768d,0xb0ef434d,0x4daacc54,0x0496e4df, - 0xb5d19ee3,0x886a4c1b,0x1f2cc1b8,0x5165467f, - 0xea5e9d04,0x358c015d,0x7487fa73,0x410bfb2e, - 0x1d67b35a,0xd2db9252,0x5610e933,0x47d66d13, - 0x61d79a8c,0x0ca1377a,0x14f8598e,0x3c13eb89, - 0x27a9ceee,0xc961b735,0xe51ce1ed,0xb1477a3c, - 0xdfd29c59,0x73f2553f,0xce141879,0x37c773bf, - 0xcdf753ea,0xaafd5f5b,0x6f3ddf14,0xdb447886, - 0xf3afca81,0xc468b93e,0x3424382c,0x40a3c25f, - 0xc31d1672,0x25e2bc0c,0x493c288b,0x950dff41, - 0x01a83971,0xb30c08de,0xe4b4d89c,0xc1566490, - 0x84cb7b61,0xb632d570,0x5c6c4874,0x57b8d042, - },{ /* Before: itable[2] */ - 0xf45150a7,0x417e5365,0x171ac3a4,0x273a965e, - 0xab3bcb6b,0x9d1ff145,0xfaacab58,0xe34b9303, - 0x302055fa,0x76adf66d,0xcc889176,0x02f5254c, - 0xe54ffcd7,0x2ac5d7cb,0x35268044,0x62b58fa3, - 0xb1de495a,0xba25671b,0xea45980e,0xfe5de1c0, - 0x2fc30275,0x4c8112f0,0x468da397,0xd36bc6f9, - 0x8f03e75f,0x9215959c,0x6dbfeb7a,0x5295da59, - 
0xbed42d83,0x7458d321,0xe0492969,0xc98e44c8, - 0xc2756a89,0x8ef47879,0x58996b3e,0xb927dd71, - 0xe1beb64f,0x88f017ad,0x20c966ac,0xce7db43a, - 0xdf63184a,0x1ae58231,0x51976033,0x5362457f, - 0x64b1e077,0x6bbb84ae,0x81fe1ca0,0x08f9942b, - 0x48705868,0x458f19fd,0xde94876c,0x7b52b7f8, - 0x73ab23d3,0x4b72e202,0x1fe3578f,0x55662aab, - 0xebb20728,0xb52f03c2,0xc5869a7b,0x37d3a508, - 0x2830f287,0xbf23b2a5,0x0302ba6a,0x16ed5c82, - 0xcf8a2b1c,0x79a792b4,0x07f3f0f2,0x694ea1e2, - 0xda65cdf4,0x0506d5be,0x34d11f62,0xa6c48afe, - 0x2e349d53,0xf3a2a055,0x8a0532e1,0xf6a475eb, - 0x830b39ec,0x6040aaef,0x715e069f,0x6ebd5110, - 0x213ef98a,0xdd963d06,0x3eddae05,0xe64d46bd, - 0x5491b58d,0xc471055d,0x06046fd4,0x5060ff15, - 0x981924fb,0xbdd697e9,0x4089cc43,0xd967779e, - 0xe8b0bd42,0x8907888b,0x19e7385b,0xc879dbee, - 0x7ca1470a,0x427ce90f,0x84f8c91e,0x00000000, - 0x80098386,0x2b3248ed,0x111eac70,0x5a6c4e72, - 0x0efdfbff,0x850f5638,0xae3d1ed5,0x2d362739, - 0x0f0a64d9,0x5c6821a6,0x5b9bd154,0x36243a2e, - 0x0a0cb167,0x57930fe7,0xeeb4d296,0x9b1b9e91, - 0xc0804fc5,0xdc61a220,0x775a694b,0x121c161a, - 0x93e20aba,0xa0c0e52a,0x223c43e0,0x1b121d17, - 0x090e0b0d,0x8bf2adc7,0xb62db9a8,0x1e14c8a9, - 0xf1578519,0x75af4c07,0x99eebbdd,0x7fa3fd60, - 0x01f79f26,0x725cbcf5,0x6644c53b,0xfb5b347e, - 0x438b7629,0x23cbdcc6,0xedb668fc,0xe4b863f1, - 0x31d7cadc,0x63421085,0x97134022,0xc6842011, - 0x4a857d24,0xbbd2f83d,0xf9ae1132,0x29c76da1, - 0x9e1d4b2f,0xb2dcf330,0x860dec52,0xc177d0e3, - 0xb32b6c16,0x70a999b9,0x9411fa48,0xe9472264, - 0xfca8c48c,0xf0a01a3f,0x7d56d82c,0x3322ef90, - 0x4987c74e,0x38d9c1d1,0xca8cfea2,0xd498360b, - 0xf5a6cf81,0x7aa528de,0xb7da268e,0xad3fa4bf, - 0x3a2ce49d,0x78500d92,0x5f6a9bcc,0x7e546246, - 0x8df6c213,0xd890e8b8,0x392e5ef7,0xc382f5af, - 0x5d9fbe80,0xd0697c93,0xd56fa92d,0x25cfb312, - 0xacc83b99,0x1810a77d,0x9ce86e63,0x3bdb7bbb, - 0x26cd0978,0x596ef418,0x9aec01b7,0x4f83a89a, - 0x95e6656e,0xffaa7ee6,0xbc2108cf,0x15efe6e8, - 0xe7bad99b,0x6f4ace36,0x9fead409,0xb029d67c, - 
0xa431afb2,0x3f2a3123,0xa5c63094,0xa235c066, - 0x4e7437bc,0x82fca6ca,0x90e0b0d0,0xa73315d8, - 0x04f14a98,0xec41f7da,0xcd7f0e50,0x91172ff6, - 0x4d768dd6,0xef434db0,0xaacc544d,0x96e4df04, - 0xd19ee3b5,0x6a4c1b88,0x2cc1b81f,0x65467f51, - 0x5e9d04ea,0x8c015d35,0x87fa7374,0x0bfb2e41, - 0x67b35a1d,0xdb9252d2,0x10e93356,0xd66d1347, - 0xd79a8c61,0xa1377a0c,0xf8598e14,0x13eb893c, - 0xa9ceee27,0x61b735c9,0x1ce1ede5,0x477a3cb1, - 0xd29c59df,0xf2553f73,0x141879ce,0xc773bf37, - 0xf753eacd,0xfd5f5baa,0x3ddf146f,0x447886db, - 0xafca81f3,0x68b93ec4,0x24382c34,0xa3c25f40, - 0x1d1672c3,0xe2bc0c25,0x3c288b49,0x0dff4195, - 0xa8397101,0x0c08deb3,0xb4d89ce4,0x566490c1, - 0xcb7b6184,0x32d570b6,0x6c48745c,0xb8d04257, - },{ /* Before: itable[3] */ - 0x5150a7f4,0x7e536541,0x1ac3a417,0x3a965e27, - 0x3bcb6bab,0x1ff1459d,0xacab58fa,0x4b9303e3, - 0x2055fa30,0xadf66d76,0x889176cc,0xf5254c02, - 0x4ffcd7e5,0xc5d7cb2a,0x26804435,0xb58fa362, - 0xde495ab1,0x25671bba,0x45980eea,0x5de1c0fe, - 0xc302752f,0x8112f04c,0x8da39746,0x6bc6f9d3, - 0x03e75f8f,0x15959c92,0xbfeb7a6d,0x95da5952, - 0xd42d83be,0x58d32174,0x492969e0,0x8e44c8c9, - 0x756a89c2,0xf478798e,0x996b3e58,0x27dd71b9, - 0xbeb64fe1,0xf017ad88,0xc966ac20,0x7db43ace, - 0x63184adf,0xe582311a,0x97603351,0x62457f53, - 0xb1e07764,0xbb84ae6b,0xfe1ca081,0xf9942b08, - 0x70586848,0x8f19fd45,0x94876cde,0x52b7f87b, - 0xab23d373,0x72e2024b,0xe3578f1f,0x662aab55, - 0xb20728eb,0x2f03c2b5,0x869a7bc5,0xd3a50837, - 0x30f28728,0x23b2a5bf,0x02ba6a03,0xed5c8216, - 0x8a2b1ccf,0xa792b479,0xf3f0f207,0x4ea1e269, - 0x65cdf4da,0x06d5be05,0xd11f6234,0xc48afea6, - 0x349d532e,0xa2a055f3,0x0532e18a,0xa475ebf6, - 0x0b39ec83,0x40aaef60,0x5e069f71,0xbd51106e, - 0x3ef98a21,0x963d06dd,0xddae053e,0x4d46bde6, - 0x91b58d54,0x71055dc4,0x046fd406,0x60ff1550, - 0x1924fb98,0xd697e9bd,0x89cc4340,0x67779ed9, - 0xb0bd42e8,0x07888b89,0xe7385b19,0x79dbeec8, - 0xa1470a7c,0x7ce90f42,0xf8c91e84,0x00000000, - 0x09838680,0x3248ed2b,0x1eac7011,0x6c4e725a, - 
0xfdfbff0e,0x0f563885,0x3d1ed5ae,0x3627392d, - 0x0a64d90f,0x6821a65c,0x9bd1545b,0x243a2e36, - 0x0cb1670a,0x930fe757,0xb4d296ee,0x1b9e919b, - 0x804fc5c0,0x61a220dc,0x5a694b77,0x1c161a12, - 0xe20aba93,0xc0e52aa0,0x3c43e022,0x121d171b, - 0x0e0b0d09,0xf2adc78b,0x2db9a8b6,0x14c8a91e, - 0x578519f1,0xaf4c0775,0xeebbdd99,0xa3fd607f, - 0xf79f2601,0x5cbcf572,0x44c53b66,0x5b347efb, - 0x8b762943,0xcbdcc623,0xb668fced,0xb863f1e4, - 0xd7cadc31,0x42108563,0x13402297,0x842011c6, - 0x857d244a,0xd2f83dbb,0xae1132f9,0xc76da129, - 0x1d4b2f9e,0xdcf330b2,0x0dec5286,0x77d0e3c1, - 0x2b6c16b3,0xa999b970,0x11fa4894,0x472264e9, - 0xa8c48cfc,0xa01a3ff0,0x56d82c7d,0x22ef9033, - 0x87c74e49,0xd9c1d138,0x8cfea2ca,0x98360bd4, - 0xa6cf81f5,0xa528de7a,0xda268eb7,0x3fa4bfad, - 0x2ce49d3a,0x500d9278,0x6a9bcc5f,0x5462467e, - 0xf6c2138d,0x90e8b8d8,0x2e5ef739,0x82f5afc3, - 0x9fbe805d,0x697c93d0,0x6fa92dd5,0xcfb31225, - 0xc83b99ac,0x10a77d18,0xe86e639c,0xdb7bbb3b, - 0xcd097826,0x6ef41859,0xec01b79a,0x83a89a4f, - 0xe6656e95,0xaa7ee6ff,0x2108cfbc,0xefe6e815, - 0xbad99be7,0x4ace366f,0xead4099f,0x29d67cb0, - 0x31afb2a4,0x2a31233f,0xc63094a5,0x35c066a2, - 0x7437bc4e,0xfca6ca82,0xe0b0d090,0x3315d8a7, - 0xf14a9804,0x41f7daec,0x7f0e50cd,0x172ff691, - 0x768dd64d,0x434db0ef,0xcc544daa,0xe4df0496, - 0x9ee3b5d1,0x4c1b886a,0xc1b81f2c,0x467f5165, - 0x9d04ea5e,0x015d358c,0xfa737487,0xfb2e410b, - 0xb35a1d67,0x9252d2db,0xe9335610,0x6d1347d6, - 0x9a8c61d7,0x377a0ca1,0x598e14f8,0xeb893c13, - 0xceee27a9,0xb735c961,0xe1ede51c,0x7a3cb147, - 0x9c59dfd2,0x553f73f2,0x1879ce14,0x73bf37c7, - 0x53eacdf7,0x5f5baafd,0xdf146f3d,0x7886db44, - 0xca81f3af,0xb93ec468,0x382c3424,0xc25f40a3, - 0x1672c31d,0xbc0c25e2,0x288b493c,0xff41950d, - 0x397101a8,0x08deb30c,0xd89ce4b4,0x6490c156, - 0x7b6184cb,0xd570b632,0x48745c6c,0xd04257b8, - }, -#endif /* !AES_SMALL */ - } - }; - void aes_decrypt(const struct aes_ctx *ctx, size_t length, uint8_t *dst, @@ -364,33 +58,3 @@ aes_decrypt(const struct aes_ctx *ctx, break; } } - -void -aes128_decrypt(const 
struct aes128_ctx *ctx,
-	       size_t length, uint8_t *dst,
-	       const uint8_t *src)
-{
-  assert(!(length % AES_BLOCK_SIZE) );
-  _nettle_aes_decrypt(_AES128_ROUNDS, ctx->keys, &_aes_decrypt_table,
-		      length, dst, src);
-}
-
-void
-aes192_decrypt(const struct aes192_ctx *ctx,
-	       size_t length, uint8_t *dst,
-	       const uint8_t *src)
-{
-  assert(!(length % AES_BLOCK_SIZE) );
-  _nettle_aes_decrypt(_AES192_ROUNDS, ctx->keys, &_aes_decrypt_table,
-		      length, dst, src);
-}
-
-void
-aes256_decrypt(const struct aes256_ctx *ctx,
-	       size_t length, uint8_t *dst,
-	       const uint8_t *src)
-{
-  assert(!(length % AES_BLOCK_SIZE) );
-  _nettle_aes_decrypt(_AES256_ROUNDS, ctx->keys, &_aes_decrypt_table,
-		      length, dst, src);
-}
diff --git a/aes-encrypt.c b/aes-encrypt.c
index 257fa402..56efd92c 100644
--- a/aes-encrypt.c
+++ b/aes-encrypt.c
@@ -35,7 +35,6 @@
 # include "config.h"
 #endif
-#include <assert.h>
 #include <stdlib.h>
 #include "aes-internal.h"
@@ -62,33 +61,3 @@ aes_encrypt(const struct aes_ctx *ctx,
       break;
     }
 }
-
-void
-aes128_encrypt(const struct aes128_ctx *ctx,
-	       size_t length, uint8_t *dst,
-	       const uint8_t *src)
-{
-  assert(!(length % AES_BLOCK_SIZE) );
-  _nettle_aes_encrypt(_AES128_ROUNDS, ctx->keys, &_nettle_aes_encrypt_table,
-		      length, dst, src);
-}
-
-void
-aes192_encrypt(const struct aes192_ctx *ctx,
-	       size_t length, uint8_t *dst,
-	       const uint8_t *src)
-{
-  assert(!(length % AES_BLOCK_SIZE) );
-  _nettle_aes_encrypt(_AES192_ROUNDS, ctx->keys, &_nettle_aes_encrypt_table,
-		      length, dst, src);
-}
-
-void
-aes256_encrypt(const struct aes256_ctx *ctx,
-	       size_t length, uint8_t *dst,
-	       const uint8_t *src)
-{
-  assert(!(length % AES_BLOCK_SIZE) );
-  _nettle_aes_encrypt(_AES256_ROUNDS, ctx->keys, &_nettle_aes_encrypt_table,
-		      length, dst, src);
-}
diff --git a/aes-internal.h b/aes-internal.h
index 04f61c8c..64cf7be5 100644
--- a/aes-internal.h
+++ b/aes-internal.h
@@ -96,9 +96,8 @@ _nettle_aes_decrypt(unsigned rounds, const uint32_t *keys,
     | ((uint32_t) T->sbox[ B2(w2) ] << 16)		\
     | ((uint32_t) T->sbox[ B3(w3) ] << 24)) ^ (k))
-/* Globally visible so that the same sbox table can be used by aes_set_encrypt_key */
-
 extern const struct aes_table _nettle_aes_encrypt_table;
 #define aes_sbox (_nettle_aes_encrypt_table.sbox)
+extern const struct aes_table _nettle_aes_decrypt_table;
 #endif /* NETTLE_AES_INTERNAL_H_INCLUDED */
diff --git a/aes128-decrypt.c b/aes128-decrypt.c
new file mode 100644
index 00000000..168d8158
--- /dev/null
+++ b/aes128-decrypt.c
@@ -0,0 +1,50 @@
+/* aes128-decrypt.c
+
+   Decryption function for aes/rijndael block cipher.
+
+   Copyright (C) 2002, 2013 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+*/
+
+#if HAVE_CONFIG_H
+# include "config.h"
+#endif
+
+#include <assert.h>
+
+#include "aes-internal.h"
+
+void
+aes128_decrypt(const struct aes128_ctx *ctx,
+	       size_t length, uint8_t *dst,
+	       const uint8_t *src)
+{
+  assert(!(length % AES_BLOCK_SIZE) );
+  _nettle_aes_decrypt(_AES128_ROUNDS, ctx->keys, &_nettle_aes_decrypt_table,
+		      length, dst, src);
+}
diff --git a/aes128-encrypt.c b/aes128-encrypt.c
new file mode 100644
index 00000000..35d15b36
--- /dev/null
+++ b/aes128-encrypt.c
@@ -0,0 +1,50 @@
+/* aes128-encrypt.c
+
+   Encryption function for the aes/rijndael block cipher.
+
+   Copyright (C) 2002, 2013 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+*/
+
+#if HAVE_CONFIG_H
+# include "config.h"
+#endif
+
+#include <assert.h>
+
+#include "aes-internal.h"
+
+void
+aes128_encrypt(const struct aes128_ctx *ctx,
+	       size_t length, uint8_t *dst,
+	       const uint8_t *src)
+{
+  assert(!(length % AES_BLOCK_SIZE) );
+  _nettle_aes_encrypt(_AES128_ROUNDS, ctx->keys, &_nettle_aes_encrypt_table,
+		      length, dst, src);
+}
diff --git a/aes192-decrypt.c b/aes192-decrypt.c
new file mode 100644
index 00000000..f97e2f6b
--- /dev/null
+++ b/aes192-decrypt.c
@@ -0,0 +1,50 @@
+/* aes192-decrypt.c
+
+   Decryption function for aes/rijndael block cipher.
+
+   Copyright (C) 2002, 2013 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+ + or both in parallel, as here. + + GNU Nettle is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received copies of the GNU General Public License and + the GNU Lesser General Public License along with this program. If + not, see http://www.gnu.org/licenses/. +*/ + +#if HAVE_CONFIG_H +# include "config.h" +#endif + +#include <assert.h> + +#include "aes-internal.h" + +void +aes192_decrypt(const struct aes192_ctx *ctx, + size_t length, uint8_t *dst, + const uint8_t *src) +{ + assert(!(length % AES_BLOCK_SIZE) ); + _nettle_aes_decrypt(_AES192_ROUNDS, ctx->keys, &_nettle_aes_decrypt_table, + length, dst, src); +} diff --git a/aes192-encrypt.c b/aes192-encrypt.c new file mode 100644 index 00000000..efa40e45 --- /dev/null +++ b/aes192-encrypt.c @@ -0,0 +1,50 @@ +/* aes192-encrypt.c + + Encryption function for the aes/rijndael block cipher. + + Copyright (C) 2002, 2013 Niels Möller + + This file is part of GNU Nettle. + + GNU Nettle is free software: you can redistribute it and/or + modify it under the terms of either: + + * the GNU Lesser General Public License as published by the Free + Software Foundation; either version 3 of the License, or (at your + option) any later version. + + or + + * the GNU General Public License as published by the Free + Software Foundation; either version 2 of the License, or (at your + option) any later version. + + or both in parallel, as here. + + GNU Nettle is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received copies of the GNU General Public License and + the GNU Lesser General Public License along with this program. 
If + not, see http://www.gnu.org/licenses/. +*/ + +#if HAVE_CONFIG_H +# include "config.h" +#endif + +#include <assert.h> + +#include "aes-internal.h" + +void +aes192_encrypt(const struct aes192_ctx *ctx, + size_t length, uint8_t *dst, + const uint8_t *src) +{ + assert(!(length % AES_BLOCK_SIZE) ); + _nettle_aes_encrypt(_AES192_ROUNDS, ctx->keys, &_nettle_aes_encrypt_table, + length, dst, src); +} diff --git a/aes256-decrypt.c b/aes256-decrypt.c new file mode 100644 index 00000000..42042cf6 --- /dev/null +++ b/aes256-decrypt.c @@ -0,0 +1,50 @@ +/* aes256-decrypt.c + + Decryption function for aes/rijndael block cipher. + + Copyright (C) 2002, 2013 Niels Möller + + This file is part of GNU Nettle. + + GNU Nettle is free software: you can redistribute it and/or + modify it under the terms of either: + + * the GNU Lesser General Public License as published by the Free + Software Foundation; either version 3 of the License, or (at your + option) any later version. + + or + + * the GNU General Public License as published by the Free + Software Foundation; either version 2 of the License, or (at your + option) any later version. + + or both in parallel, as here. + + GNU Nettle is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received copies of the GNU General Public License and + the GNU Lesser General Public License along with this program. If + not, see http://www.gnu.org/licenses/. 
+*/ + +#if HAVE_CONFIG_H +# include "config.h" +#endif + +#include <assert.h> + +#include "aes-internal.h" + +void +aes256_decrypt(const struct aes256_ctx *ctx, + size_t length, uint8_t *dst, + const uint8_t *src) +{ + assert(!(length % AES_BLOCK_SIZE) ); + _nettle_aes_decrypt(_AES256_ROUNDS, ctx->keys, &_nettle_aes_decrypt_table, + length, dst, src); +} diff --git a/aes256-encrypt.c b/aes256-encrypt.c new file mode 100644 index 00000000..98474bb5 --- /dev/null +++ b/aes256-encrypt.c @@ -0,0 +1,50 @@ +/* aes256-encrypt.c + + Encryption function for the aes/rijndael block cipher. + + Copyright (C) 2002, 2013 Niels Möller + + This file is part of GNU Nettle. + + GNU Nettle is free software: you can redistribute it and/or + modify it under the terms of either: + + * the GNU Lesser General Public License as published by the Free + Software Foundation; either version 3 of the License, or (at your + option) any later version. + + or + + * the GNU General Public License as published by the Free + Software Foundation; either version 2 of the License, or (at your + option) any later version. + + or both in parallel, as here. + + GNU Nettle is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received copies of the GNU General Public License and + the GNU Lesser General Public License along with this program. If + not, see http://www.gnu.org/licenses/. +*/ + +#if HAVE_CONFIG_H +# include "config.h" +#endif + +#include <assert.h> + +#include "aes-internal.h" + +void +aes256_encrypt(const struct aes256_ctx *ctx, + size_t length, uint8_t *dst, + const uint8_t *src) +{ + assert(!(length % AES_BLOCK_SIZE) ); + _nettle_aes_encrypt(_AES256_ROUNDS, ctx->keys, &_nettle_aes_encrypt_table, + length, dst, src); +}
On Wed, Mar 31, 2021 at 9:18 PM Niels Möller nisse@lysator.liu.se wrote:
The reason it makes sense to me to split aes-encrypt.c, is that:
(i) It's more consistent with the other aes-related functions.
(ii) The current aes-encrypt.c contains both the encryption functions aes128_encrypt, aes192_encrypt, aes256_encrypt, which we'd want to override with assembly implementations, and the legacy wrapper function aes_encrypt, which shouldn't be overridden. So we can't use plain file-level override, but need #ifdefs too.
(iii) I've considered doing it earlier, to make it easier to implement aes without a round loop (like for all current versions of aes-encrypt-internal.*). E.g., on x86_64, for aes128 we could load all subkeys into registers and still have registers left to do two or more blocks in parallel, but then we'd need to override aes128_encrypt separately from the other aes*_encrypt.
I've tried out a split, see below patch. It's a rather large change, moving pieces to new places, but nothing difficult. I'm considering committing this to the s390x branch, what do you think?
I agree, I'll modify the patch of basic AES-128 optimized functions to be built on top of the split aes functions.
Regarding the large number of functions for s390x, I'm not yet convinced we should have all of them, we'll have to consider the tradeoff between speedup and complexity case by case. In particular, cbc encrypt (but not decrypt!) is notoriously slow, since it's inherently serial. So I'm curious about potential speedup there.
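To illustrate why CBC encrypt is inherently serial while CBC decrypt is not, here is a minimal sketch. The block "cipher" is a hypothetical toy stand-in (not AES, not secure), and all toy_* names are assumptions made only for this example; only the chaining structure matters.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16

/* Hypothetical stand-in for a block cipher: NOT AES and NOT secure.
   It only gives the chaining loops something to call. */
static void
toy_encrypt_block(const uint8_t *key, uint8_t *dst, const uint8_t *src)
{
  for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
    dst[i] = (uint8_t) ((src[i] ^ key[i]) + 1);
}

static void
toy_decrypt_block(const uint8_t *key, uint8_t *dst, const uint8_t *src)
{
  for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
    dst[i] = (uint8_t) (src[i] - 1) ^ key[i];
}

/* CBC encrypt: C[i] = E(P[i] xor C[i-1]), with C[-1] = IV.
   Each block needs the previous ciphertext block, so the loop
   iterations cannot run in parallel. The iv buffer is updated in place. */
static void
toy_cbc_encrypt(const uint8_t *key, uint8_t *iv,
                size_t length, uint8_t *dst, const uint8_t *src)
{
  for (; length >= TOY_BLOCK_SIZE;
       length -= TOY_BLOCK_SIZE, src += TOY_BLOCK_SIZE, dst += TOY_BLOCK_SIZE)
    {
      uint8_t tmp[TOY_BLOCK_SIZE];
      for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
        tmp[i] = src[i] ^ iv[i];
      toy_encrypt_block(key, dst, tmp);
      memcpy(iv, dst, TOY_BLOCK_SIZE);
    }
}

/* CBC decrypt: P[i] = D(C[i]) xor C[i-1]. Every input is ciphertext
   that is already available, so the iterations are independent.
   This sketch requires that dst and src do not overlap. */
static void
toy_cbc_decrypt(const uint8_t *key, const uint8_t *iv,
                size_t length, uint8_t *dst, const uint8_t *src)
{
  size_t nblocks = length / TOY_BLOCK_SIZE;
  for (size_t b = 0; b < nblocks; b++)
    {
      const uint8_t *prev = b ? src + (b - 1) * TOY_BLOCK_SIZE : iv;
      uint8_t *out = dst + b * TOY_BLOCK_SIZE;
      toy_decrypt_block(key, out, src + b * TOY_BLOCK_SIZE);
      for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
        out[i] ^= prev[i];
    }
}
```

In the encrypt loop, a change in the first plaintext block propagates to every later ciphertext block, which is exactly the dependency that prevents processing several blocks at once.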
Before getting too far, it may also be worthwhile to try out an assembly memxor.
memxor performs the same in C and assembly, since the s390 architecture offers the memory xor instruction "xc"; see the xor_len macro in machine.m4 of the original patch for an implementation example. However, the s390x AES accelerators offer a considerable speedup over the C implementation with optimized internal AES. The following table demonstrates the idea more clearly:
Function              S390x accelerator   C implementation with optimized internal AES
                                          (Only enable aes128.asm, aes192.asm, aes256.asm)
-------------------------------------------------------------------------------------------
CBC AES128 Encrypt      1.073569 cpb       13.674891 cpb
CBC AES128 Decrypt      0.647008 cpb        3.131405 cpb
CBC AES192 Encrypt      1.266316 cpb       13.183552 cpb
CBC AES192 Decrypt      0.622058 cpb        3.074917 cpb
CBC AES256 Encrypt      1.450422 cpb       14.380789 cpb
CBC AES256 Decrypt      0.648403 cpb        3.040746 cpb
CFB AES128 Encrypt      1.199716 cpb       15.116906 cpb
CFB AES128 Decrypt      1.205567 cpb        3.144538 cpb
CFB AES192 Encrypt      1.393276 cpb       15.340453 cpb
CFB AES192 Decrypt      1.415399 cpb        3.064844 cpb
CFB AES256 Encrypt      1.687762 cpb       15.876734 cpb
CFB AES256 Decrypt      1.677147 cpb        3.065851 cpb
CFB8 AES128 Encrypt    17.278379 cpb      178.117195 cpb
CFB8 AES128 Decrypt    17.327002 cpb      183.136198 cpb
CFB8 AES192 Encrypt    20.408311 cpb      184.028411 cpb
CFB8 AES192 Decrypt    20.397928 cpb      187.534654 cpb
CFB8 AES256 Encrypt    23.549944 cpb      184.800598 cpb
CFB8 AES256 Decrypt    23.367348 cpb      190.355030 cpb
CMAC AES128 Update      1.026380 cpb       12.108085 cpb
CMAC AES256 Update      1.399747 cpb       11.497727 cpb
CCM AES128 Encrypt      1.828593 cpb       15.332434 cpb
CCM AES128 Decrypt      1.691520 cpb       14.115167 cpb
CCM AES128 Update       1.027736 cpb       10.918015 cpb
CCM AES192 Encrypt      1.883996 cpb       15.840703 cpb
CCM AES192 Decrypt      1.950362 cpb       14.478925 cpb
CCM AES192 Update       1.213858 cpb       11.239195 cpb
CCM AES256 Encrypt      2.206957 cpb       15.861586 cpb
CCM AES256 Decrypt      2.311447 cpb       15.051353 cpb
CCM AES256 Update       1.404938 cpb       11.441472 cpb
CTR AES128 Crypt        0.710237 cpb        4.767290 cpb
CTR AES192 Crypt        0.635386 cpb        3.489661 cpb
CTR AES256 Crypt        0.628296 cpb        3.138727 cpb
XTS AES128 Encrypt      0.655454 cpb       15.757406 cpb
XTS AES128 Decrypt      0.656113 cpb       15.920863 cpb
XTS AES256 Encrypt      0.663048 cpb       16.689253 cpb
XTS AES256 Decrypt      0.676298 cpb       16.670889 cpb
GCM AES128 Encrypt      0.630504 cpb       15.473187 cpb
GCM AES128 Decrypt      0.627714 cpb       15.529209 cpb
GCM AES128 Update       0.514662 cpb       11.608726 cpb
GCM AES192 Encrypt      0.642785 cpb       15.245804 cpb
GCM AES192 Decrypt      0.631627 cpb       15.511039 cpb
GCM AES192 Update       0.499630 cpb       11.745876 cpb
GCM AES256 Encrypt      0.631046 cpb       15.400776 cpb
GCM AES256 Decrypt      0.622329 cpb       15.419954 cpb
GCM AES256 Update       0.499630 cpb       11.569789 cpb
Also, the optimized AES cores for s390x could serve as a good reference for other crypto libraries, since they have a clean and well-documented assembly implementation. The only drawback I can see is the proliferation of preprocessor conditions in the C files of the AES modes to support fat builds for those accelerators, which is worth it IMO considering the speed gain we get.
regards, Mamone
Maamoun TK maamoun.tk@googlemail.com writes:
I've tried out a split, see below patch. It's a rather large change, moving pieces to new places, but nothing difficult. I'm considering committing this to the s390x branch, what do you think?
I agree, I'll modify the patch of basic AES-128 optimized functions to be built on top of the split aes functions.
Ok, pushed to the s390x branch now.
memxor performs the same in C and assembly, since the s390 architecture offers the memory xor instruction "xc"; see the xor_len macro in machine.m4 of the original patch for an implementation example.
But the C implementation is somewhat complicated, splitting into several cases depending on alignment, and shifting data around to be able to do word operations. If it can be done simpler with the xc instruction, that would at least cut some overhead. (Note that memxor3 must support the overlap case needed by cbc decrypt).
However, the s390x AES accelerators offer a considerable speedup over the C implementation with optimized internal AES. The following table demonstrates the idea more clearly:
Function S390x accelerator C implementation with optimized internal AES (Only enable aes128.asm, aes192.asm, aes256.asm)
[...]
CBC AES128 Decrypt 0.647008 cpb 3.131405 cpb
[...]
CTR AES128 Crypt 0.710237 cpb 4.767290 cpb
For these two, the speed difference should essentially be the time for the C implementation of memxor. "cpb" means cycles per byte, right? 2-4 cycles per byte for memxor is quite slow. On my x86_64 laptop (ok, comparing apples to oranges), memxor, for the aligned case, is 0.08 cpb, and memxor3 twice as much. And even the C implementation is not that much slower.
GCM AES128 Encrypt 0.630504 cpb 15.473187 cpb
For GCM, are there instructions that combine AES-CTR and GCM HASH? Or are those done separately? It would be nice to have GCM HASH being fast by itself, for performance with other ciphers than aes.
Regards, /Niels
On Thu, Apr 1, 2021 at 7:57 AM Niels Möller nisse@lysator.liu.se wrote:
For GCM, are there instructions that combine AES-CTR and GCM HASH? Or are those done separately? It would be nice to have GCM HASH being fast by itself, for performance with other ciphers than aes.
MSA_X4 has a GHASH implementation using the KIMD-GHASH built-in function, which optimizes the performance of GHASH authentication for both aes and non-aes ciphers. MSA_X8 implements the KMA-GCM-AES-128, KMA-GCM-AES-192, and KMA-GCM-AES-256 functions, which maximize the performance of AES-GCM.
On Thu, Apr 1, 2021 at 12:01 AM Maamoun TK maamoun.tk@googlemail.com wrote:
I'll modify the patch of basic AES-128 optimized functions to be built on top of the split aes functions.
Done! It works on a file-override basis. The patch also passes the testsuite and yields the expected benchmark numbers.
regards, Mamone
Hi Niels, hope you are doing well now. Any update on this patch?
regards, Mamone
On Mon, Apr 5, 2021 at 11:49 PM Maamoun TK maamoun.tk@googlemail.com wrote:
On Thu, Apr 1, 2021 at 12:01 AM Maamoun TK maamoun.tk@googlemail.com wrote:
I'll modify the patch of basic AES-128 optimized functions to be built on top of the split aes functions.
Done! It works on a file-override basis. The patch also passes the testsuite and yields the expected benchmark numbers.
regards, Mamone
Maamoun TK maamoun.tk@googlemail.com writes:
Hi Niels, hope you are doing well now. Any update on this patch?
Thanks, I'm feeling a lot better, although still a bit tired.
Is https://git.lysator.liu.se/nettle/nettle/-/merge_requests/23 still the current code?
I hope to be back to reviewing pending patches soon, but I also got a fairly serious bug report a few days ago that I need to attend to first.
Regards, /Niels
On Sat, May 1, 2021 at 6:11 PM Niels Möller nisse@lysator.liu.se wrote:
Maamoun TK maamoun.tk@googlemail.com writes:
Hi Niels, hope you are doing well now. Any update on this patch?
Thanks, I'm feeling a lot better, although still a bit tired.
Good, I hope your recovery is going well.
Is https://git.lysator.liu.se/nettle/nettle/-/merge_requests/23 still the current code?
Yes, it's still up to date.
I hope to be back to reviewing pending patches soon, but I also got a fairly serious bug report a few days ago that I need to attend to first.
No problem, take your time and take care. I also want to say that the white paper "Optimize AES-GCM for PowerPC architecture processor" is almost done and it looks great, I'm just doing some polishing now. Hope we can get the pending patches done soon so we can have time to review the research and the upcoming patches.
regards, Mamone
On Sat, May 1, 2021 at 6:11 PM Niels Möller nisse@lysator.liu.se wrote:
Is https://git.lysator.liu.se/nettle/nettle/-/merge_requests/23 still the current code?
I've added the basic AES-192 and AES-256 too, since there is no problem testing them all together.
For the other modes, I don't think we can continue on the file-override basis to handle the optimized cores. If we take ccm-aes128.c as an example, you will see it has 8 functions, and separating them into one file each is overkill. My approach to handle such a case is to redefine the function name in the C file so that it does not conflict with the name of the optimized core, in case the optimized core exists. This trick is also used to support fat builds, because we need to keep both functions (C and optimized core) around so that one can be picked at run-time. So redefining the function name avoids the conflict with the optimized core, and it's the first step to support fat build for that function. Take a look at how I implemented this approach in the ccm-aes128.c file: https://git.lysator.liu.se/mamonet/nettle/-/blob/s390x-aes/ccm-aes128.c#L52
regards, Mamone
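The renaming idea can be sketched roughly as follows. This is only an illustration under assumptions: the demo_* names are hypothetical, the "optimized core" is written in C as a stand-in for assembly, and the dispatch through a function pointer is a generic pattern rather than the exact mechanism used in the linked ccm-aes128.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1: compile the generic C version under a suffixed name, so it
   cannot clash with an optimized core that exports the public name.
   All demo_* names are hypothetical. */
#define demo_checksum demo_checksum_c
uint32_t
demo_checksum(const uint8_t *data, size_t n)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += data[i];
  return sum;
}
#undef demo_checksum

/* Stand-in for the optimized core; in a real build this would be an
   assembly file, not C. */
uint32_t
demo_checksum_accel(const uint8_t *data, size_t n)
{
  uint32_t sum = 0;
  size_t i = 0;
  for (; i + 4 <= n; i += 4)
    sum += (uint32_t) data[i] + data[i + 1] + data[i + 2] + data[i + 3];
  for (; i < n; i++)
    sum += data[i];
  return sum;
}

/* Step 2 (fat build): keep both variants around and pick one at run time. */
typedef uint32_t demo_checksum_func(const uint8_t *data, size_t n);
static demo_checksum_func *demo_checksum_vec = demo_checksum_c;

void
demo_fat_init(int have_accel)
{
  demo_checksum_vec = have_accel ? demo_checksum_accel : demo_checksum_c;
}

/* The public entry point forwards to whichever variant was selected. */
uint32_t
demo_checksum(const uint8_t *data, size_t n)
{
  return demo_checksum_vec(data, n);
}
```

Callers only ever see demo_checksum; calling demo_fat_init(1) once at startup (after a hypothetical CPU feature check) switches all later calls to the accelerated variant.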
Maamoun TK maamoun.tk@googlemail.com writes:
On Sat, May 1, 2021 at 6:11 PM Niels Möller nisse@lysator.liu.se wrote:
Is https://git.lysator.liu.se/nettle/nettle/-/merge_requests/23 still the current code?
I've added the basic AES-192 and AES-256 too, since there is no problem testing them all together.
Merged to the s390x branch now. Thanks for your patience.
For further improvement, it would be nice to have aesN_set_encrypt_key and aesN_set_decrypt_key be two entrypoints to the same function. But that will make the file replacement logic a bit more complex.
And maybe the public aes*_invert_key functions should be marked as deprecated (and deleted, next time we have an abi break)? No other ciphers in Nettle have this feature, and it's not that useful for applications. From codesearch.debian.net, it looks like they are exposed by the haskell and rust bindings, though.
For the other modes,
Before doing the other modes, do you think you could investigate if memxor and memxor3 can be sped up? That should benefit many ciphers and modes, and give more relevant speedup numbers for specialized functions like aes cbc and aes ctr.
The best strategy depends on whether or not unaligned memory access is possible and efficient. All current implementations do aligned writes to the destination area (and smaller writes if needed at the edges). For the C implementation and several of the asm implementations, they also do aligned reads, and use shifting to get inputs xored together at the right places.
While the x86_64 implementation uses unaligned reads, since that seems as efficient, and reduces complexity quite a lot.
On all platforms I'm familiar with, assembly implementations can assume that it is safe to read a few bytes outside the edge of the input buffer, as long as those reads don't cross a word boundary (corresponding to valgrind option --partial-loads-ok=yes).
Ideally, memxor performance should be limited by memory/cache bandwidth (with data in L1 cache probably being the most important case. It looks like nettle-benchmark calls it with a size of 10 KB).
Note that memxor3 must process data in descending address order, to support the call from cbc_decrypt, with overlapping operands.
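A minimal byte-wise sketch of why the processing order matters for overlapping operands. The demo_* functions are hypothetical and deliberately naive (one byte at a time); they are not Nettle's memxor3, only an illustration of the ordering constraint when dst lies one block above one of the sources, as in in-place CBC decryption.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] = a[i] ^ b[i], highest address first. Safe when dst overlaps
   a source and starts at a higher address than that source, because
   every byte is read before a write can reach it. */
static void
demo_memxor3_descending(uint8_t *dst, const uint8_t *a,
                        const uint8_t *b, size_t n)
{
  while (n-- > 0)
    dst[n] = a[n] ^ b[n];
}

/* Same operation, lowest address first, shown for contrast. With
   dst = b + 16 and n > 16 this overwrites bytes of b before they are read. */
static void
demo_memxor3_ascending(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] ^ b[i];
}
```

With a 48-byte buffer buf, the call demo_memxor3_descending(buf + 16, tmp, buf, 32) gives the intended result, while the ascending variant corrupts the second block because it reads bytes of buf that it has already overwritten.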
Regards, /Niels
On Sun, May 9, 2021 at 11:19 AM Niels Möller nisse@lysator.liu.se wrote:
Before doing the other modes, do you think you could investigate if memxor and memxor3 can be sped up? That should benefit many ciphers and modes, and give more relevant speedup numbers for specialized functions like aes cbc and aes ctr.
The best strategy depends on whether or not unaligned memory access is possible and efficient. All current implementations do aligned writes to the destination area (and smaller writes if needed at the edges). For the C implementation and several of the asm implementations, they also do aligned reads, and use shifting to get inputs xored together at the right places.
While the x86_64 implementation uses unaligned reads, since that seems as efficient, and reduces complexity quite a lot.
On all platforms I'm familiar with, assembly implementations can assume that it is safe to read a few bytes outside the edge of the input buffer, as long as those reads don't cross a word boundary (corresponding to valgrind option --partial-loads-ok=yes).
Ideally, memxor performance should be limited by memory/cache bandwidth (with data in L1 cache probably being the most important case. It looks like nettle-benchmark calls it with a size of 10 KB).
Note that memxor3 must process data in descending address order, to support the call from cbc_decrypt, with overlapping operands.
This is great information that I'll keep in mind for future implementations. The s390x arch offers the 'xc' instruction ("Storage-to-storage XOR") with a maximum length of 256 bytes, but we can do as many iterations as we need. I optimized memxor using that instruction, as it achieves the optimal performance for such a case; I'll attach the patch at the end of this message. Unfortunately, I couldn't manage to optimize memxor3 using the 'xc' instruction because, while it supports overlapped operands, it processes them from left to right, one byte at a time. However, I think optimizing just memxor could give a good sense of how much it would increase the performance of AES modes. CBC mode comes in handy here since it uses memxor in both the encrypt and decrypt operations, provided the operands of the decrypt operation don't overlap. Here is the benchmark result of CBC mode:
*-----------------------------------------------------------------------*
|                                   | AES-128 Encrypt | AES-128 Decrypt |
|-----------------------------------|-----------------|-----------------|
| CBC-Accelerator                   | 1.18 cbp        | 0.75 cbp        |
| Basic AES-Accelerator             | 13.50 cbp       | 3.34 cbp        |
| Basic AES-Accelerator with memxor | 15.50           | 1.57            |
*-----------------------------------------------------------------------*
I attribute the decrease in encrypt performance with the optimized memxor to the overhead caused by the 'ex' instruction: the "xor_len" macro patches the length of the 'xc' instruction and then fetches that instruction from memory in order to execute it. That happens for every single block, so it makes sense to get more cycles per byte. The decrypt operation is improved by the optimized memxor, but the CBC accelerator is still almost twice as fast. The speed of the encrypt operation doesn't improve, and TBH 15 cycles per byte is a lot for CBC mode, so we really need to consider the accelerators since they offer optimal performance for the architecture.
regards, Mamone
---
 s390x/machine.m4 | 13 +++++++++++++
 s390x/memxor.asm | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)
 create mode 100644 s390x/memxor.asm

diff --git a/s390x/machine.m4 b/s390x/machine.m4
index acd5e26c..b94c408a 100644
--- a/s390x/machine.m4
+++ b/s390x/machine.m4
@@ -1,2 +1,15 @@
 C Register usage:
 define(`RA', `%r14')
+
+C XOR contents of two areas in storage with specific length
+C len cannot be assigned to general register 0
+C len <= 256
+C xor_len(dst, src, len, tmp_addr)
+define(`xor_len',
+`larl $4,18f
+    aghi $3,-1
+    jm 19f
+    ex $3,0($4)
+    j 19f
+18: xc 0(1,$1),0($2)
+19:')
diff --git a/s390x/memxor.asm b/s390x/memxor.asm
new file mode 100644
index 00000000..178e68e9
--- /dev/null
+++ b/s390x/memxor.asm
@@ -0,0 +1,54 @@
+C s390/memxor.asm
+
+ifelse(`
+   Copyright (C) 2021 Mamone Tarsha
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+')
+
+.file "memxor.asm"
+
+.text
+
+C void * memxor(void *dst, const void *src, size_t n)
+
+PROLOGUE(nettle_memxor)
+    lgr %r0,%r2
+    srlg %r5,%r4,8
+    clgije %r5,0,Llen
+L256_loop:
+    xc 0(256,%r2),0(%r3)
+    aghi %r2,256
+    aghi %r3,256
+    brctg %r5,L256_loop
+Llen:
+    risbg %r5,%r4,56,191,0
+    jz Ldone
+    xor_len(%r2,%r3,%r5,%r1)
+Ldone:
+    lgr %r2,%r0
+    br RA
+EPILOGUE(nettle_memxor)
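For readers not familiar with s390x assembly, the control flow of the memxor patch above can be modeled in C roughly as follows. This is a hedged sketch: model_xc and model_memxor are hypothetical names, and model_xc only imitates the byte-by-byte, left-to-right behaviour of one 'xc' instruction; it says nothing about the real instruction's performance.

```c
#include <stddef.h>
#include <stdint.h>

/* Models a single 'xc' instruction: XOR len bytes (1 <= len <= 256)
   from src into dst, left to right, one byte at a time. */
static void
model_xc(uint8_t *dst, const uint8_t *src, size_t len)
{
  for (size_t i = 0; i < len; i++)
    dst[i] ^= src[i];
}

/* Models nettle_memxor from the patch: full 256-byte chunks first,
   then the remaining 0..255 bytes, returning the original dst. */
void *
model_memxor(void *dst_in, const void *src_in, size_t n)
{
  uint8_t *dst = dst_in;
  const uint8_t *src = src_in;
  size_t chunks = n >> 8;    /* srlg %r5,%r4,8 */
  size_t rest = n & 0xff;    /* risbg keeps the low 8 bits of n */

  while (chunks-- > 0)       /* L256_loop */
    {
      model_xc(dst, src, 256);
      dst += 256;
      src += 256;
    }
  if (rest > 0)              /* xor_len: 'xc' with a run-time length via 'ex' */
    model_xc(dst, src, rest);

  return dst_in;
}
```

The final partial chunk is the only place where the length is not a compile-time constant, which is why the assembly needs the 'ex' trick there and a plain 'xc' suffices inside the loop.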
Maamoun TK maamoun.tk@googlemail.com writes:
This is great information that I'll keep in mind for future implementations. The s390x arch offers the 'xc' instruction ("Storage-to-storage XOR") with a maximum length of 256 bytes, but we can do as many iterations as we need. I optimized memxor using that instruction, as it achieves the optimal performance for such a case; I'll attach the patch at the end of this message.
Nice! I'd like to merge this as soon as the s390x ci is up and running again.
Unfortunately, I couldn't manage to optimize memxor3 using the 'xc' instruction because, while it supports overlapped operands, it processes them from left to right, one byte at a time.
Hmm, I wonder if there's some way to work around that.
However, I think optimizing just memxor could give a good sense of how much it would increase the performance of AES modes. CBC mode comes in handy here since it uses memxor in both the encrypt and decrypt operations, provided the operands of the decrypt operation don't overlap. Here is the benchmark result of CBC mode:
*-----------------------------------------------------------------------*
|                                   | AES-128 Encrypt | AES-128 Decrypt |
|-----------------------------------|-----------------|-----------------|
| CBC-Accelerator                   | 1.18 cbp        | 0.75 cbp        |
| Basic AES-Accelerator             | 13.50 cbp       | 3.34 cbp        |
| Basic AES-Accelerator with memxor | 15.50           | 1.57            |
*-----------------------------------------------------------------------*
This seems to confirm that cbc encrypt is the operation that gains the most from assembly for the combined operation. That aes decrypt can also gain a factor two in performance, does that mean that both aes-cbc and memxor run at speed limited by memory bandwidth? And then the gain is from one less pass loading and storing data from memory?
What unit is "cbp"? If it's cycles per byte, 0.77 cycles/byte for memxor (the cost of "Basic AES-Accelerator with memxor" minus cost of CBC-Accelerator) sounds unexpectedly slow, compared to, e.g., x86_64, where I get 0.08 cycles per byte (regardless of alignment), or 0.64 cycles per 64-bit word.
Regards, /Niels
On Sun, May 9, 2021 at 9:49 PM Niels Möller nisse@lysator.liu.se wrote:
This seems to confirm that cbc encrypt is the operation that gains the most from assembly for the combined operation. That aes decrypt can also gain a factor two in performance, does that mean that both aes-cbc and memxor run at speed limited by memory bandwidth? And then the gain is from one less pass loading and storing data from memory?
I can't think of another reason.
What unit is "cbp"?
Yes, Cycles per byte. I spelled it wrong in the last message.
If it's cycles per byte, 0.77 cycles/byte for memxor (the cost of "Basic AES-Accelerator with memxor" minus cost of CBC-Accelerator) sounds unexpectedly slow, compared to, e.g., x86_64, where I get 0.08 cycles per byte (regardless of alignment), or 0.64 cycles per 64-bit word.
I'm calculating cycles per byte as follows: Frequency/(Buf_size/Elapsed_time), where the units are Hz, bytes, and seconds respectively. I measured the cycles per byte for memxor on z15 and got:
2.8 cpb for the C implementation
0.9 cpb for the optimized memxor
If my calculation is correct, then accessing memory on z/Architecture processors is quite expensive compared to other architectures.
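The formula above, written out as a small sketch. The function names are hypothetical, and the input values in the usage note are made-up round numbers chosen only to show the arithmetic, not measurements from any machine.

```c
#include <stddef.h>

/* cycles per byte = Frequency / (Buf_size / Elapsed_time)
   with Frequency in Hz, Buf_size in bytes, Elapsed_time in seconds. */
static double
cycles_per_byte(double frequency_hz, double buf_size_bytes,
                double elapsed_seconds)
{
  double bytes_per_second = buf_size_bytes / elapsed_seconds;
  return frequency_hz / bytes_per_second;
}

/* Converts cycles per byte to cycles per machine word, e.g. the
   0.08 cycles/byte quoted for x86_64 with 8-byte (64-bit) words. */
static double
cycles_per_word(double cpb, size_t word_size_bytes)
{
  return cpb * (double) word_size_bytes;
}
```

For instance, a hypothetical 1 GHz clock processing 1,000,000 bytes in 0.002 seconds works out to about 2 cycles per byte, and cycles_per_word(0.08, 8) reproduces the 0.64 cycles per 64-bit word mentioned earlier in the thread.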
regards, Mamone
nisse@lysator.liu.se (Niels Möller) writes:
(iii) I've considered doing it earlier, to make it easier to implement aes without a round loop (like for all current versions of aes-encrypt-internal.*). E.g., on x86_64, for aes128 we could load all subkeys into registers and still have registers left to do two or more blocks in parallel, but then we'd need to override aes128_encrypt separately from the other aes*_encrypt.
I've given this a try, see experimental patch below. It adds an x86_64/aesni/aes128-encrypt.asm, with a 2-way loop. It gives a very modest speedup, 5%, when I benchmark on my laptop (which is now a pretty fast machine, AMD Ryzen 5). I've also added a cbc-aes128-encrypt.asm. That gives a more significant speedup, almost 60%. I think the main reason for the speedup is that we avoid reloading subkeys between blocks.
If we want to go this way, I wonder how to do it without an explosion of files and functions. For s390x, it seems each function will be very small, but not so for most other archs. There are at least three modes that are similar to cbc encrypt in that they have to process blocks sequentially, with no parallelism: CBC encrypt, CMAC, and XTS (there may be more). It's not so nice if we need (modes × ciphers) number of assembly files, with lots of duplication.
Regards, /Niels
diff --git a/ChangeLog b/ChangeLog index 3d19b1dd..68b8f632 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,13 @@ 2021-04-01 Niels Möller nisse@lysator.liu.se
+	* cbc-aes128-encrypt.c (nettle_cbc_aes128_encrypt): New file and function.
+	* x86_64/aesni/cbc-aes128-encrypt.asm: New file.
+
+	* configure.ac (asm_replace_list): Add aes128-encrypt.asm
+	aes128-decrypt.asm.
+	* x86_64/aesni/aes128-encrypt.asm: New file, with 2-way loop.
+	* x86_64/aesni/aes128-decrypt.asm: Likewise.
+
 	Move aes128_encrypt and similar functions to their own files. To
 	make it easier for assembly implementations to override specific
 	AES variants.
diff --git a/Makefile.in b/Makefile.in
index 8d474d1e..b6b983fd 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -101,7 +101,8 @@ nettle_SOURCES = aes-decrypt-internal.c aes-decrypt.c aes-decrypt-table.c \
 		 camellia256-set-encrypt-key.c camellia256-crypt.c \
 		 camellia256-set-decrypt-key.c \
 		 camellia256-meta.c \
-		 cast128.c cast128-meta.c cbc.c \
+		 cast128.c cast128-meta.c \
+		 cbc.c cbc-aes128-encrypt.c \
 		 ccm.c ccm-aes128.c ccm-aes192.c ccm-aes256.c cfb.c \
 		 siv-cmac.c siv-cmac-aes128.c siv-cmac-aes256.c \
 		 cnd-memcpy.c \
diff --git a/cbc-aes128-encrypt.c b/cbc-aes128-encrypt.c
new file mode 100644
index 00000000..5f7d1c8c
--- /dev/null
+++ b/cbc-aes128-encrypt.c
@@ -0,0 +1,42 @@
+/* cbc-aes128-encrypt.c
+
+   Copyright (C) 2013, 2014 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+*/
+
+#if HAVE_CONFIG_H
+# include "config.h"
+#endif
+
+#include "cbc.h"
+
+void
+nettle_cbc_aes128_encrypt(struct cbc_aes128_ctx *ctx, size_t length, uint8_t *dst, const uint8_t *src)
+{
+  CBC_ENCRYPT(ctx, aes128_encrypt, length, dst, src);
+}
diff --git a/cbc.h b/cbc.h
index 93b2e739..beece610 100644
--- a/cbc.h
+++ b/cbc.h
@@ -35,6 +35,7 @@
 #define NETTLE_CBC_H_INCLUDED
 
 #include "nettle-types.h"
+#include "aes.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -79,6 +80,10 @@ memcpy((ctx)->iv, (data), sizeof((ctx)->iv))
 		 sizeof((self)->iv), (self)->iv,	\
 		 (length), (dst), (src)))
 
+struct cbc_aes128_ctx CBC_CTX(struct aes128_ctx, AES_BLOCK_SIZE);
+void
+nettle_cbc_aes128_encrypt(struct cbc_aes128_ctx *ctx, size_t length, uint8_t *dst, const uint8_t *src);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/configure.ac b/configure.ac
index be2916c1..26e41d89 100644
--- a/configure.ac
+++ b/configure.ac
@@ -544,6 +544,7 @@ fi
 # Files which replace a C source file (or otherwise don't correspond
 # to a new object file).
 asm_replace_list="aes-encrypt-internal.asm aes-decrypt-internal.asm \
+		aes128-encrypt.asm aes128-decrypt.asm cbc-aes128-encrypt.asm \
 		arcfour-crypt.asm camellia-crypt-internal.asm \
 		md5-compress.asm memxor.asm memxor3.asm \
 		poly1305-internal.asm \
diff --git a/examples/nettle-benchmark.c b/examples/nettle-benchmark.c
index 9ce3a733..686cf3b9 100644
--- a/examples/nettle-benchmark.c
+++ b/examples/nettle-benchmark.c
@@ -240,6 +240,21 @@ bench_ctr(void *arg)
 	    BENCH_BLOCK, info->dst, info->src);
 }
 
+struct bench_cbc_aes128_info
+{
+  struct cbc_aes128_ctx ctx;
+
+  const uint8_t *src;
+  uint8_t *dst;
+};
+
+static void
+bench_cbc_aes128(void *arg)
+{
+  struct bench_cbc_aes128_info *info = arg;
+  nettle_cbc_aes128_encrypt(&info->ctx, BENCH_BLOCK, info->dst, info->src);
+}
+
 struct bench_aead_info
 {
   void *ctx;
@@ -740,6 +755,29 @@ time_cipher(const struct nettle_cipher *cipher)
   free(key);
 }
 
+static void
+time_cbc_aes128(void)
+{
+  struct bench_cbc_aes128_info info;
+  uint8_t key[AES128_KEY_SIZE];
+  uint8_t iv[AES_BLOCK_SIZE];
+
+  static uint8_t src_data[BENCH_BLOCK];
+  static uint8_t data[BENCH_BLOCK];
+
+  init_key(sizeof(key), key);
+  init_key(sizeof(iv), iv);
+  init_data(data);
+  init_data(src_data);
+
+  aes128_set_encrypt_key(&info.ctx.ctx, key);
+  CBC_SET_IV(&info.ctx, iv);
+  info.src = src_data;
+  info.dst = data;
+  display("aes128", "new cbc", AES_BLOCK_SIZE,
+	  time_function(bench_cbc_aes128, &info));
+}
+
 static void
 time_aead(const struct nettle_aead *aead)
 {
@@ -1027,6 +1065,9 @@ main(int argc, char **argv)
       if (!alg || strstr ("hmac-sha512", alg))
 	time_hmac_sha512();
 
+      if (!alg || strstr ("cbc-aes128", alg))
+	time_cbc_aes128();
+
       optind++;
     }
   while (alg && argv[optind]);
diff --git a/testsuite/cbc-test.c b/testsuite/cbc-test.c
index 9394f1cb..ff0c4cbe 100644
--- a/testsuite/cbc-test.c
+++ b/testsuite/cbc-test.c
@@ -3,6 +3,43 @@
 #include "cbc.h"
 #include "knuth-lfib.h"
 
+static void
+test_cbc_aes128(const struct tstring *key,
+		const struct tstring *cleartext,
+		const struct tstring *ciphertext,
+		const struct tstring *iiv)
+{
+  struct cbc_aes128_ctx ctx;
+  uint8_t *data;
+  size_t length;
+
+  ASSERT (cleartext->length == ciphertext->length);
+  length = cleartext->length;
+
+  ASSERT (key->length == AES128_KEY_SIZE);
+  ASSERT (iiv->length == AES_BLOCK_SIZE);
+
+  data = xalloc(length);
+  aes128_set_encrypt_key(&ctx.ctx, key->data);
+  CBC_SET_IV(&ctx, iiv->data);
+
+  nettle_cbc_aes128_encrypt(&ctx,
+			    length, data, cleartext->data);
+
+  if (!MEMEQ(length, data, ciphertext->data))
+    {
+      fprintf(stderr, "CBC encrypt failed:\nInput:");
+      tstring_print_hex(cleartext);
+      fprintf(stderr, "\nOutput: ");
+      print_hex(length, data);
+      fprintf(stderr, "\nExpected:");
+      tstring_print_hex(ciphertext);
+      fprintf(stderr, "\n");
+      FAIL();
+    }
+  free(data);
+}
+
 /* Test with more data and inplace decryption, to check that the
  * cbc_decrypt buffering works. */
 #define CBC_BULK_DATA 0x2710 /* 10000 */
@@ -161,6 +198,17 @@ test_main(void)
 		   "b2eb05e2c39be9fcda6c19078c6a9d1b"),
 	      SHEX("000102030405060708090a0b0c0d0e0f"));
 
+  test_cbc_aes128(SHEX("2b7e151628aed2a6abf7158809cf4f3c"),
+		  SHEX("6bc1bee22e409f96e93d7e117393172a"
+		       "ae2d8a571e03ac9c9eb76fac45af8e51"
+		       "30c81c46a35ce411e5fbc1191a0a52ef"
+		       "f69f2445df4f9b17ad2b417be66c3710"),
+		  SHEX("7649abac8119b246cee98e9b12e9197d"
+		       "5086cb9b507219ee95db113a917678b2"
+		       "73bed6b8e3c1743b7116e69e22229516"
+		       "3ff1caa1681fac09120eca307586e1a7"),
+		  SHEX("000102030405060708090a0b0c0d0e0f"));
+
   test_cbc_bulk();
 }
diff --git a/x86_64/aesni/aes128-decrypt.asm b/x86_64/aesni/aes128-decrypt.asm
new file mode 100644
index 00000000..79111e47
--- /dev/null
+++ b/x86_64/aesni/aes128-decrypt.asm
@@ -0,0 +1,136 @@
+C x86_64/aesni/aes128-decrypt.asm
+
+ifelse(`
+   Copyright (C) 2015, 2018, 2021 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+')
+
+C Input argument
+define(`CTX', `%rdi')
+define(`LENGTH',`%rsi')
+define(`DST', `%rdx')
+define(`SRC', `%rcx')
+
+define(`KEY0', `%xmm0')
+define(`KEY1', `%xmm1')
+define(`KEY2', `%xmm2')
+define(`KEY3', `%xmm3')
+define(`KEY4', `%xmm4')
+define(`KEY5', `%xmm5')
+define(`KEY6', `%xmm6')
+define(`KEY7', `%xmm7')
+define(`KEY8', `%xmm8')
+define(`KEY9', `%xmm9')
+define(`KEY10', `%xmm10')
+define(`X', `%xmm11')
+define(`Y', `%xmm12')
+
+	.file "aes128-decrypt.asm"
+
+	C nettle_aes128_decrypt(const struct aes128_ctx *ctx,
+	C                       size_t length, uint8_t *dst,
+	C                       const uint8_t *src);
+
+	.text
+	ALIGN(16)
+PROLOGUE(nettle_aes128_decrypt)
+	W64_ENTRY(4, 13)
+	shr	$4, LENGTH
+	test	LENGTH, LENGTH
+	jz	.Lend
+
+	movups	(CTX), KEY0
+	movups	16(CTX), KEY1
+	movups	32(CTX), KEY2
+	movups	48(CTX), KEY3
+	movups	64(CTX), KEY4
+	movups	80(CTX), KEY5
+	movups	96(CTX), KEY6
+	movups	112(CTX), KEY7
+	movups	128(CTX), KEY8
+	movups	144(CTX), KEY9
+	movups	160(CTX), KEY10
+	shr	LENGTH
+	jnc	.Lblock_loop
+
+	movups	(SRC), X
+	pxor	KEY0, X
+	aesdec	KEY1, X
+	aesdec	KEY2, X
+	aesdec	KEY3, X
+	aesdec	KEY4, X
+	aesdec	KEY5, X
+	aesdec	KEY6, X
+	aesdec	KEY7, X
+	aesdec	KEY8, X
+	aesdec	KEY9, X
+	aesdeclast KEY10, X
+
+	movups	X, (DST)
+	add	$16, SRC
+	add	$16, DST
+	test	LENGTH, LENGTH
+	jz	.Lend
+
+.Lblock_loop:
+	movups	(SRC), X
+	movups	16(SRC), Y
+	pxor	KEY0, X
+	pxor	KEY0, Y
+	aesdec	KEY1, X
+	aesdec	KEY1, Y
+	aesdec	KEY2, X
+	aesdec	KEY2, Y
+	aesdec	KEY3, X
+	aesdec	KEY3, Y
+	aesdec	KEY4, X
+	aesdec	KEY4, Y
+	aesdec	KEY5, X
+	aesdec	KEY5, Y
+	aesdec	KEY6, X
+	aesdec	KEY6, Y
+	aesdec	KEY7, X
+	aesdec	KEY7, Y
+	aesdec	KEY8, X
+	aesdec	KEY8, Y
+	aesdec	KEY9, X
+	aesdec	KEY9, Y
+	aesdeclast KEY10, X
+	aesdeclast KEY10, Y
+
+	movups	X, (DST)
+	movups	Y, 16(DST)
+	add	$32, SRC
+	add	$32, DST
+	dec	LENGTH
+	jnz	.Lblock_loop
+
+.Lend:
+	W64_EXIT(4, 13)
+	ret
+EPILOGUE(nettle_aes128_decrypt)
diff --git a/x86_64/aesni/aes128-encrypt.asm b/x86_64/aesni/aes128-encrypt.asm
new file mode 100644
index 00000000..8e7ebe78
--- /dev/null
+++ b/x86_64/aesni/aes128-encrypt.asm
@@ -0,0 +1,136 @@
+C x86_64/aesni/aes128-encrypt.asm
+
+ifelse(`
+   Copyright (C) 2015, 2018, 2021 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+')
+
+C Input argument
+define(`CTX', `%rdi')
+define(`LENGTH',`%rsi')
+define(`DST', `%rdx')
+define(`SRC', `%rcx')
+
+define(`KEY0', `%xmm0')
+define(`KEY1', `%xmm1')
+define(`KEY2', `%xmm2')
+define(`KEY3', `%xmm3')
+define(`KEY4', `%xmm4')
+define(`KEY5', `%xmm5')
+define(`KEY6', `%xmm6')
+define(`KEY7', `%xmm7')
+define(`KEY8', `%xmm8')
+define(`KEY9', `%xmm9')
+define(`KEY10', `%xmm10')
+define(`X', `%xmm11')
+define(`Y', `%xmm12')
+
+	.file "aes128-encrypt.asm"
+
+	C nettle_aes128_encrypt(const struct aes128_ctx *ctx,
+	C                       size_t length, uint8_t *dst,
+	C                       const uint8_t *src);
+
+	.text
+	ALIGN(16)
+PROLOGUE(nettle_aes128_encrypt)
+	W64_ENTRY(4, 13)
+	shr	$4, LENGTH
+	test	LENGTH, LENGTH
+	jz	.Lend
+
+	movups	(CTX), KEY0
+	movups	16(CTX), KEY1
+	movups	32(CTX), KEY2
+	movups	48(CTX), KEY3
+	movups	64(CTX), KEY4
+	movups	80(CTX), KEY5
+	movups	96(CTX), KEY6
+	movups	112(CTX), KEY7
+	movups	128(CTX), KEY8
+	movups	144(CTX), KEY9
+	movups	160(CTX), KEY10
+	shr	LENGTH
+	jnc	.Lblock_loop
+
+	movups	(SRC), X
+	pxor	KEY0, X
+	aesenc	KEY1, X
+	aesenc	KEY2, X
+	aesenc	KEY3, X
+	aesenc	KEY4, X
+	aesenc	KEY5, X
+	aesenc	KEY6, X
+	aesenc	KEY7, X
+	aesenc	KEY8, X
+	aesenc	KEY9, X
+	aesenclast KEY10, X
+
+	movups	X, (DST)
+	add	$16, SRC
+	add	$16, DST
+	test	LENGTH, LENGTH
+	jz	.Lend
+
+.Lblock_loop:
+	movups	(SRC), X
+	movups	16(SRC), Y
+	pxor	KEY0, X
+	pxor	KEY0, Y
+	aesenc	KEY1, X
+	aesenc	KEY1, Y
+	aesenc	KEY2, X
+	aesenc	KEY2, Y
+	aesenc	KEY3, X
+	aesenc	KEY3, Y
+	aesenc	KEY4, X
+	aesenc	KEY4, Y
+	aesenc	KEY5, X
+	aesenc	KEY5, Y
+	aesenc	KEY6, X
+	aesenc	KEY6, Y
+	aesenc	KEY7, X
+	aesenc	KEY7, Y
+	aesenc	KEY8, X
+	aesenc	KEY8, Y
+	aesenc	KEY9, X
+	aesenc	KEY9, Y
+	aesenclast KEY10, X
+	aesenclast KEY10, Y
+
+	movups	X, (DST)
+	movups	Y, 16(DST)
+	add	$32, SRC
+	add	$32, DST
+	dec	LENGTH
+	jnz	.Lblock_loop
+
+.Lend:
+	W64_EXIT(4, 13)
+	ret
+EPILOGUE(nettle_aes128_encrypt)
diff --git a/x86_64/aesni/cbc-aes128-encrypt.asm b/x86_64/aesni/cbc-aes128-encrypt.asm
new file mode 100644
index 00000000..04c6c6b0
--- /dev/null
+++ b/x86_64/aesni/cbc-aes128-encrypt.asm
@@ -0,0 +1,108 @@
+C x86_64/aesni/cbc-aes128-encrypt.asm
+
+ifelse(`
+   Copyright (C) 2015, 2018, 2021 Niels Möller
+
+   This file is part of GNU Nettle.
+
+   GNU Nettle is free software: you can redistribute it and/or
+   modify it under the terms of either:
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at your
+       option) any later version.
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at your
+       option) any later version.
+
+   or both in parallel, as here.
+
+   GNU Nettle is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see http://www.gnu.org/licenses/.
+')
+
+C Input argument
+define(`CTX', `%rdi')
+define(`LENGTH',`%rsi')
+define(`DST', `%rdx')
+define(`SRC', `%rcx')
+
+define(`KEY0', `%xmm0')
+define(`KEY1', `%xmm1')
+define(`KEY2', `%xmm2')
+define(`KEY3', `%xmm3')
+define(`KEY4', `%xmm4')
+define(`KEY5', `%xmm5')
+define(`KEY6', `%xmm6')
+define(`KEY7', `%xmm7')
+define(`KEY8', `%xmm8')
+define(`KEY9', `%xmm9')
+define(`KEY10', `%xmm10')
+define(`X', `%xmm11')
+define(`BLOCK', `%xmm12')
+
+	.file "cbc-aes128-encrypt.asm"
+
+	C nettle_cbc_aes128_encrypt(struct cbc_aes128_ctx *ctx,
+	C                           size_t length, uint8_t *dst,
+	C                           const uint8_t *src);
+
+	.text
+	ALIGN(16)
+PROLOGUE(nettle_cbc_aes128_encrypt)
+	W64_ENTRY(4, 13)
+	shr	$4, LENGTH
+	test	LENGTH, LENGTH
+	jz	.Lend
+
+	movups	(CTX), KEY0
+	movups	16(CTX), KEY1
+	movups	32(CTX), KEY2
+	movups	48(CTX), KEY3
+	movups	64(CTX), KEY4
+	movups	80(CTX), KEY5
+	movups	96(CTX), KEY6
+	movups	112(CTX), KEY7
+	movups	128(CTX), KEY8
+	movups	144(CTX), KEY9
+	movups	160(CTX), KEY10
+	movups	176(CTX), X	C Load IV
+
+.Lblock_loop:
+	movups	(SRC), BLOCK	C Cleartext block
+	pxor	BLOCK, X
+	pxor	KEY0, X
+	aesenc	KEY1, X
+	aesenc	KEY2, X
+	aesenc	KEY3, X
+	aesenc	KEY4, X
+	aesenc	KEY5, X
+	aesenc	KEY6, X
+	aesenc	KEY7, X
+	aesenc	KEY8, X
+	aesenc	KEY9, X
+	aesenclast KEY10, X
+
+	movups	X, (DST)
+	add	$16, SRC
+	add	$16, DST
+
+	dec	LENGTH
+	jnz	.Lblock_loop
+
+	C Save IV
+	movups	X, 176(CTX)
+
+.Lend:
+	W64_EXIT(4, 13)
+	ret
+EPILOGUE(nettle_cbc_aes128_encrypt)
On Thu, Apr 1, 2021 at 5:21 PM Niels Möller nisse@lysator.liu.se wrote:
nisse@lysator.liu.se (Niels Möller) writes:
(iii) I've considered doing it earlier, to make it easier to implement aes without a round loop (like for all current versions of aes-encrypt-internal.*). E.g., on x86_64, for aes128 we could load all subkeys into registers and still have registers left to do two or more blocks in parallel, but then we'd need to override aes128_encrypt separately from the other aes*_encrypt.
I've given this a try, see experimental patch below. It adds an x86_64/aesni/aes128-encrypt.asm, with a 2-way loop. It gives a very modest speedup, 5%, when I benchmark on my laptop (which is now a pretty fast machine, AMD Ryzen 5). I've also added a cbc-aes128-encrypt.asm. That gives a more significant speedup, almost 60%. I think the main reason for the speedup is that we avoid reloading subkeys between blocks.
If we want to go this way, I wonder how to do it without an explosion of files and functions. For s390x, it seems each function will be very small, but not so for most other archs. There are at least three modes that are similar to cbc encrypt in that they have to process blocks sequentially, with no parallelism: CBC encrypt, CMAC, and XTS (there may be more). It's not so nice if we need (modes × ciphers) number of assembly files, with lots of duplication.
I can think of a core function for AES-CBC mode, cbc_aes_encrypt, on top of which the cbc_aes128_encrypt, cbc_aes192_encrypt, and cbc_aes256_encrypt functions are provided. We could then optimize cbc_aes_encrypt in assembly, taking care of the rounds parameter in the implementation. I still prefer duplicating files and functions for AES modes with different round counts over going with this approach, since I can't think of any other solution.
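A rough C sketch of the shared-core idea described above, to make the trade-off concrete. Everything here is hypothetical: the sketch_ names, the flat subkey layout and the explicit iv argument are assumptions for illustration, and sketch_round is a placeholder that is not AES. Only the round counts (10, 12, 14) are the real AES values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 16

/* Placeholder for one cipher round. This is NOT AES, just a stand-in
   so that the sketch compiles and runs. */
static void
sketch_round(uint8_t *x, const uint8_t *subkey)
{
  size_t i;
  for (i = 0; i < SKETCH_BLOCK_SIZE; i++)
    x[i] = (uint8_t) ((x[i] ^ subkey[i]) + 1u);
}

/* Hypothetical shared core: the only function that would need an
   assembly version. subkeys holds (rounds + 1) * 16 bytes, and iv is
   updated in place so that calls can be chained. */
void
sketch_cbc_aes_encrypt(unsigned rounds, const uint8_t *subkeys,
		       uint8_t *iv, size_t length,
		       uint8_t *dst, const uint8_t *src)
{
  assert(length % SKETCH_BLOCK_SIZE == 0);
  for (; length > 0; length -= SKETCH_BLOCK_SIZE,
	 src += SKETCH_BLOCK_SIZE, dst += SKETCH_BLOCK_SIZE)
    {
      size_t i;
      unsigned r;
      for (i = 0; i < SKETCH_BLOCK_SIZE; i++)
	iv[i] ^= src[i];		/* CBC chaining step */
      for (r = 0; r <= rounds; r++)	/* round loop driven by a parameter */
	sketch_round(iv, subkeys + r * SKETCH_BLOCK_SIZE);
      memcpy(dst, iv, SKETCH_BLOCK_SIZE); /* ciphertext is the next IV */
    }
}

/* Thin wrappers that only fix the round count for each key size. */
void
sketch_cbc_aes128_encrypt(const uint8_t *subkeys, uint8_t *iv,
			  size_t length, uint8_t *dst, const uint8_t *src)
{
  sketch_cbc_aes_encrypt(10, subkeys, iv, length, dst, src);
}

void
sketch_cbc_aes192_encrypt(const uint8_t *subkeys, uint8_t *iv,
			  size_t length, uint8_t *dst, const uint8_t *src)
{
  sketch_cbc_aes_encrypt(12, subkeys, iv, length, dst, src);
}

void
sketch_cbc_aes256_encrypt(const uint8_t *subkeys, uint8_t *iv,
			  size_t length, uint8_t *dst, const uint8_t *src)
{
  sketch_cbc_aes_encrypt(14, subkeys, iv, length, dst, src);
}
```

The cost of this layout is the round loop in the core: an assembly version could not simply keep a fixed set of subkeys in registers, which is where the per-key-size files get their speedup.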
nisse@lysator.liu.se (Niels Möller) writes:
I've also added a cbc-aes128-encrypt.asm. That gives a more significant speedup, almost 60%. I think the main reason for the speedup is that we avoid reloading subkeys between blocks.
I've continued this path, see branch aes-cbc. The aes128 variant is at
https://git.lysator.liu.se/nettle/nettle/-/blob/aes-cbc/x86_64/aesni/cbc-aes...
Benchmark results are positive but a bit puzzling. On my laptop (AMD Ryzen 5) I get
aes128 ECB encrypt 5450.18
This is the latest version, doing two blocks per iteration.
aes128 CBC encrypt 547.34
The general CBC mode written in C, with one call to aes128_encrypt per block. 10(!) times slower than ECB.
cbc_aes128 encrypt 865.11
The new assembly function. Almost 60% speedup over the old code, which is nice, and large enough that it seems motivated to have the new function. But still 6 times slower than ECB. I'm not sure why. Let's look a bit closer at cycle numbers.
Not sure I get accurate cycle numbers (it's a bit tricky with variable features and turbo modes and whatnot), but it looks like ECB mode is 6 cycles per block, which would be consistent with issue of two aesenc instructions per cycle. While the CBC mode is 37 cycles per block, almost 4 cycles per aesenc.
This could be explained if (i) latency of aesenc is 3-4 cycles, and (ii) the processor's out-of-order machinery results in as many as 7-8 blocks processed in parallel when executing the ECB loop, i.e., instruction issue for 3-4 iterations through the loop before the results of the first iteration are ready.
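As a cross-check of the arithmetic in the figures quoted above, here is a small C sketch. It assumes that AES-128 costs ten aesenc/aesenclast instructions per block, as in the patch; the function names are made up for this illustration, and the numbers passed in are simply the ones quoted in this thread.

```c
#include <assert.h>

/* One aesenc/aesenclast instruction per AES-128 round (9 + 1). */
enum { SKETCH_AES128_ROUNDS = 10 };

/* Average cycles spent per round instruction, given cycles per block. */
double
cycles_per_round(double cycles_per_block)
{
  assert(cycles_per_block > 0);
  return cycles_per_block / SKETCH_AES128_ROUNDS;
}

/* Average number of round instructions completed per cycle. */
double
rounds_per_cycle(double cycles_per_block)
{
  assert(cycles_per_block > 0);
  return SKETCH_AES128_ROUNDS / cycles_per_block;
}

/* Ratio between two throughput figures from the benchmark output. */
double
throughput_ratio(double faster, double slower)
{
  assert(slower > 0);
  return faster / slower;
}
```

With the quoted numbers, cycles_per_round(37.0) is 3.7 for CBC, and rounds_per_cycle(6.0) is about 1.67 for ECB. throughput_ratio(5450.18, 865.11) is about 6.3, throughput_ratio(5450.18, 547.34) is just under 10, and throughput_ratio(865.11, 547.34) is about 1.58, which matches the "6 times", "10 times" and "almost 60%" figures.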
The interface for the new function is
  struct cbc_aes128_ctx CBC_CTX(struct aes128_ctx, AES_BLOCK_SIZE);
  void
  cbc_aes128_encrypt(struct cbc_aes128_ctx *ctx, size_t length,
                     uint8_t *dst, const uint8_t *src);
I'm not that fond of the struct cbc_aes128_ctx though, which includes both (constant) subkeys and iv. So I'm considering changing that to
void cbc_aes128_encrypt(const struct aes128_ctx *ctx, uint8_t *iv, size_t length, uint8_t *dst, const uint8_t *src);
I.e., similar to cbc_encrypt, but without the arguments nettle_cipher_func *f, size_t block_size.
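To illustrate what the second interface buys, here is a minimal runnable sketch in which the key context is const and the IV is a separate caller-owned buffer, so a single key context can serve several independent CBC streams. The sketch_ names and struct are hypothetical, and the per-block transform is a trivial placeholder, not AES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK 16

/* Stands in for struct aes128_ctx: constant after key setup. */
struct sketch_key_ctx
{
  uint8_t key[SKETCH_BLOCK];
};

/* Same argument order as the proposed interface: const key context,
   then the IV, which is read and updated in place. */
void
sketch_cbc_encrypt_iv(const struct sketch_key_ctx *ctx, uint8_t *iv,
		      size_t length, uint8_t *dst, const uint8_t *src)
{
  size_t off, i;
  assert(length % SKETCH_BLOCK == 0);
  for (off = 0; off < length; off += SKETCH_BLOCK)
    for (i = 0; i < SKETCH_BLOCK; i++)
      {
	/* Placeholder "cipher": xor with the key and add one. NOT AES. */
	iv[i] = (uint8_t) ((iv[i] ^ src[off + i] ^ ctx->key[i]) + 1u);
	dst[off + i] = iv[i];	/* ciphertext block is the next IV */
      }
}
```

A caller would keep one sketch_key_ctx per key and one 16-byte IV buffer per stream, and the const qualifier documents that encrypting never modifies the subkeys.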
Regards, /Niels
On Mon, Sep 13, 2021 at 5:08 PM Niels Möller nisse@lysator.liu.se wrote:
nisse@lysator.liu.se (Niels Möller) writes:
This could be explained if (i) latency of aesenc is 3-4 cycles, and (ii) the processor's out-of-order machinery results in as many as 7-8 blocks processed in parallel when executing the ECB loop, i.e., instruction issue for 3-4 iterations through the loop before the results of the first iteration are ready.
I ran the tests on the Intel Comet Lake architecture and I can't think of another explanation; it seems x86_64 processors issue multiple blocks simultaneously even without hand-written unrolling of the block loop. Also, Intel processors, or at least Comet Lake, implement this machinery more effectively than your test processor (AMD Ryzen 5), so you don't even need 2-way interleaving in the AES-ECB implementation or a separate AES-CBC implementation. I got the same benchmark speeds for ECB and CBC modes in all cases, with CBC mode always 6 times slower than ECB mode.
regards, Mamone
Maamoun TK maamoun.tk@googlemail.com writes:
  only:
    variables:
      - $S390X_SSH_IP_ADDRESS
      - $S390X_SSH_PRIVATE_KEY
      - $S390X_SSH_CI_DIRECTORY
What does this mean? Ah, it excludes the job if these variables aren't set?
Yes, this is what it does according to the GitLab CI docs https://docs.gitlab.com/ee/ci/yaml/#onlyexcept-basic. Otherwise, fresh forks would have a job that always fails.
Hmm, the docs aren't quite clear, but it doesn't seem to work as is. I accidentally set the new S390X_ACCOUNT variable to "protected", and then the job was started but with $S390X_ACCOUNT expanding to the empty string, and it failed. Perhaps it needs to be written as
- $FOO != ""
instead?
Regards, /Niels
On Sat, Mar 27, 2021 at 9:37 AM Niels Möller nisse@lysator.liu.se wrote:
Hmm, the docs aren't quite clear, but it doesn't seem to work as is. I accidentally set the new S390X_ACCOUNT variable to "protected", and then the job was started but with $S390X_ACCOUNT expanding to the empty string, and it failed. Perhaps it needs to be written as
- $FOO != ""
instead?
This doc https://docs.gitlab.com/ee/ci/variables/#syntax-of-cicd-variable-expressions is clearer. According to it, we check for variable presence correctly: the expression checks that the variable is defined and non-empty. I think it's some sort of bug. However, you can try $S390X_ACCOUNT != null, which just checks whether the variable is defined.
regards, Mamone
nettle-bugs@lists.lysator.liu.se