I made merge request !26 (https://git.lysator.liu.se/nettle/nettle/-/merge_requests/26) that optimizes the GHASH algorithm for the s390x architecture. I've attached a benchmark to the merge request description that shows the improvement of the GHASH accelerator over the C implementation.
I've also made two patches: one adds fat build support for AES and GHASH on s390x, and the other optimizes the memxor function using the 'xc' (XOR storage-to-storage) instruction:
https://git.lysator.liu.se/mamonet/nettle/-/tree/s390x-fat
https://git.lysator.liu.se/mamonet/nettle/-/tree/s390x-memxor
I'll make merge requests for both patches after the current one is merged, since they need to be rebased on top of it.
regards, Mamone
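For readers unfamiliar with the memxor idea, here is a minimal C sketch of what such a routine computes. It is not the code from the s390x-memxor branch: the name memxor_chunked is hypothetical, and the 256-byte chunking only mirrors the fact that a single xc instruction covers at most 256 bytes, so the inner loop marks where one xc would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A single xc instruction can cover at most 256 bytes. */
enum { XC_MAX_LEN = 256 };

/* Hypothetical reference routine: XOR n bytes of src into dst,
   walking the buffers in chunks of at most XC_MAX_LEN bytes. */
static void *
memxor_chunked(void *dst_in, const void *src_in, size_t n)
{
  uint8_t *dst = dst_in;
  const uint8_t *src = src_in;

  while (n > 0)
    {
      size_t chunk = n < XC_MAX_LEN ? n : XC_MAX_LEN;
      size_t i;

      /* In an assembly version, one xc would replace this loop. */
      for (i = 0; i < chunk; i++)
        dst[i] ^= src[i];

      dst += chunk;
      src += chunk;
      n -= chunk;
    }
  return dst_in;
}
```

Calling memxor_chunked(a, b, 600) XORs 600 bytes of b into a in three passes (256, 256 and 88 bytes).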
Maamoun TK maamoun.tk@googlemail.com writes:
I made a merge request !26 https://git.lysator.liu.se/nettle/nettle/-/merge_requests/26 that optimizes the GHASH algorithm for S390x architecture.
Nice! I've added a few comments in the mr.
Regards, /Niels
I've replied to your comments in the MR.
Thank you, Mamone
On Wed, Jun 30, 2021 at 10:10 PM Niels Möller nisse@lysator.liu.se wrote:
Maamoun TK maamoun.tk@googlemail.com writes:
I made a merge request !26 https://git.lysator.liu.se/nettle/nettle/-/merge_requests/26 that optimizes the GHASH algorithm for S390x architecture.
Nice! I've added a few comments in the mr.
Regards, /Niels
-- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance.
I've added a new comment that wipes the hash subkey from the stack once the GHASH operation has completed, as it's good practice to do so. I also added a disassembly snippet in the comment section that shows why 160 bytes need to be reserved before committing a dynamic stack allocation.
regards, Mamone
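The wiping described above can be illustrated in C, although the actual change is in s390x assembly. This is a sketch under assumptions: wipe and checksum_with_scratch are hypothetical names, and writing through a volatile pointer is a commonly used idiom to discourage the compiler from dropping the stores rather than a guarantee of the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Overwrite n bytes with zeros through a volatile pointer, so the
   stores are less likely to be optimized away as dead writes. */
static void
wipe(void *p, size_t n)
{
  volatile uint8_t *v = p;
  while (n-- > 0)
    *v++ = 0;
}

/* Hypothetical routine that works on a temporary copy of a 16-byte
   key. The scratch buffer stands in for the stack area that holds the
   copy; it is a parameter here only so a caller can inspect it. */
static uint8_t
checksum_with_scratch(const uint8_t key[16], uint8_t scratch[16])
{
  uint8_t sum = 0;
  size_t i;

  memcpy(scratch, key, 16);
  for (i = 0; i < 16; i++)
    sum ^= scratch[i];

  /* Clear the temporary copy before returning. */
  wipe(scratch, 16);
  return sum;
}
```

Where the platform offers a dedicated call such as explicit_bzero, that is usually preferable to a hand-written loop.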
On Fri, Jul 2, 2021 at 11:59 PM Maamoun TK maamoun.tk@googlemail.com wrote:
I've added a new comment that wipes hash subkey from stack once GHASH operation completed as it's a good practice to do so
*commit
I'm thinking it's also worth wiping the authentication tag and the leftover bytes of input data from the stack. Leaving the output authentication tag on the stack is never a good idea, and when processing AAD the input data is in the clear, so leaving leftover bytes on the stack may reveal potentially secret data. I've pushed another commit that wipes the whole parameter block (authentication tag and hash subkey) and the leftover bytes of input data.
regards, Mamone
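A rough C picture of the stack data discussed here, assuming the layout given in the message (authentication tag followed by hash subkey, 16 bytes each) plus a zero-padded copy of the trailing partial block. The struct and function names are hypothetical and the merged assembly may organize this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GHASH_BLOCK 16

/* Assumed layout of the parameter block kept on the stack. */
struct ghash_param_block
{
  uint8_t tag[GHASH_BLOCK];     /* running authentication tag */
  uint8_t subkey[GHASH_BLOCK];  /* hash subkey */
};

/* Zero n bytes through a volatile pointer. */
static void
wipe(void *p, size_t n)
{
  volatile uint8_t *v = p;
  while (n-- > 0)
    *v++ = 0;
}

/* Copy the trailing partial block of data into tail, zero-padded
   to a full block. */
static void
pad_tail(uint8_t tail[GHASH_BLOCK], const uint8_t *data, size_t length)
{
  size_t left = length % GHASH_BLOCK;

  memset(tail, 0, GHASH_BLOCK);
  memcpy(tail, data + (length - left), left);
}

/* Clear everything sensitive that was staged on the stack. */
static void
wipe_ghash_stack(struct ghash_param_block *pb, uint8_t tail[GHASH_BLOCK])
{
  wipe(pb, sizeof(*pb));
  wipe(tail, GHASH_BLOCK);
}
```

With 20 bytes of input, pad_tail stages the last 4 bytes followed by 12 zero bytes; wipe_ghash_stack then clears both the parameter block and that staged tail.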
Hi Niels,
Any update on this patch? I think we have reached the merging stage of this patch if there are no further queries.
regards, Mamone
Maamoun TK maamoun.tk@googlemail.com writes:
Any update on this patch? I think we have reached the merging stage of this patch if there are no further queries.
Merged, thanks!
I'm thinking it's also worth wiping the authentication tag and the leftover bytes of input data from the stack. Leaving the output authentication tag on the stack is never a good idea, and when processing AAD the input data is in the clear, so leaving leftover bytes on the stack may reveal potentially secret data. I've pushed another commit that wipes the whole parameter block (authentication tag and hash subkey) and the leftover bytes of input data.
Other nettle functions don't do that, it's generally assumed that the running program is trustworthy, and that the operating system protects the data from non-trustworthy processes. I think using encrypted swap (using an ephemeral key destroyed on shutdown) is a good idea.
To me, it makes some sense for nettle to wipe the copy of the key (since the application might wipe the context struct and expect no copies to remain), but probably overkill for the other data. But it shouldn't hurt either.
Regards, /Niels
The s390x GHASH implementation needs to copy the key and the input tail data to the stack. I just made the function wipe that data from the stack once the operation is completed; I don't wipe anything from the input buffer or the cipher context. My concern is that if the program terminates, the operating system will deallocate the program's stack without clearing its contents, so the leftover data will remain somewhere in RAM, where it could be handed out by a memory allocation or dumped by other programs.
regards, Mamone
Maamoun TK maamoun.tk@googlemail.com writes:
My concern is that if the program terminates, the operating system will deallocate the program's stack without clearing its contents, so the leftover data will remain somewhere in RAM, where it could be handed out by a memory allocation or dumped by other programs.
I think the kernel is responsible for clearing that memory before handing it out to a new process. If it didn't, that would be a huge security problem. I'm fairly sure operating systems do this correctly. (And I would be a bit curious to know of any exceptions, maybe some embedded or ancient systems don't do it?)
Regards, /Niels
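The point about the kernel clearing memory can be checked from user space on Linux and similar systems, where pages obtained with an anonymous mmap are documented to be zero-filled. A small sketch, with fresh_pages_are_zero as a hypothetical helper; it only observes the behaviour for one mapping and says nothing about embedded or older systems.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map n bytes of fresh anonymous memory and check that every byte
   reads as zero. Returns 1 if all zero, 0 if not, -1 if mmap fails. */
static int
fresh_pages_are_zero(size_t n)
{
  unsigned char *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  size_t i;
  int ok = 1;

  if (p == MAP_FAILED)
    return -1;

  for (i = 0; i < n; i++)
    if (p[i] != 0)
      {
        ok = 0;
        break;
      }

  munmap(p, n);
  return ok;
}
```

This covers memory handed out by the kernel; memory recycled inside the same process by malloc is a separate matter and is not cleared.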
You are right, modern operating systems are supposed to provide this, but accessing another program's memory is pretty easy nowadays. I think it's good practice to clean up after the cipher functions where it makes sense and whenever possible.
On another topic, I've optimized the SHA-512 algorithm for the arm64 architecture, but it turned out that none of the CFarm machines support the SHA-512 crypto extension, so I can't do any performance or correctness testing for now. Do you know of any CFarm alternative that supports the SHA-512 and SHA3 extensions for arm64?
regards, Mamone
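Whether a given arm64 Linux machine exposes these extensions can be checked at run time through the ELF hardware capability bits. The sketch below assumes Linux with glibc; the function names are hypothetical, and on other platforms, or with headers that lack the HWCAP_SHA512 and HWCAP_SHA3 macros, it simply reports that it cannot tell.

```c
#include <assert.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#include <asm/hwcap.h>
#endif

/* Returns 1 if the kernel reports the SHA-512 extension, 0 if it
   does not, and -1 if this platform gives no way to tell here. */
static int
have_arm64_sha512(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SHA512)
  return (getauxval(AT_HWCAP) & HWCAP_SHA512) != 0;
#else
  return -1;
#endif
}

/* Same check for the SHA3 extension. */
static int
have_arm64_sha3(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SHA3)
  return (getauxval(AT_HWCAP) & HWCAP_SHA3) != 0;
#else
  return -1;
#endif
}
```

Running this on a candidate machine before requesting access would show quickly whether the optimized code paths can be exercised there.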
There is a new AArch64 system in the GCC Compile Farm that has not been installed yet. That system might provide the SHA-512 support. It will have an Ampere eMAG processor supporting ARMv8.
Thanks, David
Thanks for the info. I think we will have to wait until the new system is set up.
regards, Mamone
To clarify, the new AArch64 machine in the GCC Compile Farm doesn't support the SHA-512 and SHA3 extensions.
Until we figure out a way to test the optimized SHA-512 and SHA3 cores, we can proceed with the optimized implementation of AES for arm64. After that I'll implement optimizations for ChaCha20 and Poly1305, which I plan to release together with the corresponding optimizations for the s390x architecture using the supported vector facility.
There are also two s390x patches, for fat build support and the memxor optimization. It would be great to process them so I can start pushing patches with SHA optimizations for the s390x architecture.
regards, Mamone
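For context on what a fat build does, here is a generic C sketch of run-time dispatch: the library carries several implementations of a routine and picks one through a function pointer once the CPU features are known. This is not nettle's actual fat setup code; all names are hypothetical, and xor_block_accel is an ordinary C stand-in for what would be an assembly routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void xor_block_func(uint8_t dst[16], const uint8_t src[16]);

/* Portable implementation: one byte at a time. */
static void
xor_block_c(uint8_t dst[16], const uint8_t src[16])
{
  size_t i;
  for (i = 0; i < 16; i++)
    dst[i] ^= src[i];
}

/* Stand-in for an accelerated implementation: two 64-bit words. */
static void
xor_block_accel(uint8_t dst[16], const uint8_t src[16])
{
  uint64_t d[2], s[2];

  memcpy(d, dst, 16);
  memcpy(s, src, 16);
  d[0] ^= s[0];
  d[1] ^= s[1];
  memcpy(dst, d, 16);
}

/* Dispatch pointer; defaults to the portable version. */
static xor_block_func *xor_block_impl = xor_block_c;

/* Select the implementation once, from a detected feature flag. */
static void
fat_init(int have_accel)
{
  xor_block_impl = have_accel ? xor_block_accel : xor_block_c;
}
```

Callers always go through xor_block_impl, so the same binary runs on machines with and without the accelerated path.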
Maamoun TK maamoun.tk@googlemail.com writes:
You are right, modern operating systems are supposed to provide this, but accessing another program's memory is pretty easy nowadays. I think it's good practice to clean up after the cipher functions where it makes sense and whenever possible.
I think it's futile to try to do that thoroughly; e.g., code generated by the compiler will not clear each stack frame on return (and I'm not even aware of any compiler option to generate code like that). We have to trust the operating system (where as usual, "trust" can also be read as "depend on").
For the specific case of key material, it might make sense to go to a little extra effort to not leave copies in memory, but other nettle code doesn't do that.
On another topic, I've optimized the SHA-512 algorithm for the arm64 architecture, but it turned out that none of the CFarm machines support the SHA-512 crypto extension, so I can't do any performance or correctness testing for now. Do you know of any CFarm alternative that supports the SHA-512 and SHA3 extensions for arm64?
Can you do correctness tests on qemu? (I've been using a cross-compiler and qemu-user to test other ARM code, and that's also what the CI tests do.)
I have access to the systems listed on https://gmplib.org/devel/testsystems; are any of those applicable? The arm64 machines available include one Cortex-A73 and one Apple M1.
Regards, /Niels
nettle-bugs@lists.lysator.liu.se