Hello, the attached patches have been used to successfully enable and test Intel CET support in an Intel emulator on SDV hardware.
The patches are minimally intrusive and enable to use a future countermeasure that is very useful as it makes ROP attacks very hard to carry out.
GCC already has all the needed support to create CET hardened code, however the hand-coded assembly needs to be changed to conform. Without these changes all the binaries that load nettle will otherwise have CET disabled, as it is an all-or-nothing at the binary level and missing ENDBRANCH instruction cause the program to terminate on indirect jump/call instructions.
The second patch is used to make the system happy when hardening flags are enabled in gcc, as it generates the appropriate section information that tells the linker all is good.
Unfortunately I do not have actual tests for this feature (one of the reasons why it is behind a configure flag even though it is safe to add the code on any x86 hardware) because the only real way to test this is to run on hardware or emulators that cause the segfaults on errors. But we can add a simple test later once hardware becomes available.
Finally while looking at the assembly I noticed that some functions have a PROLOGUE() defined but not an EPILOGUE() macro defined in their .asm files. It is unclear to me if this is an error or intentional so didn't touch those, it doesn't affect functionality for this patch anyway.
HTH, Simo.
Simo Sorce simo@redhat.com writes:
the attached patches have been used to successfully enable and test Intel CET support in an Intel emulator on SDV hardware.
Thanks.
GCC already has all the needed support to create CET hardened code, however the hand-coded assembly needs to be changed to conform. Without these changes all the binaries that load nettle will otherwise have CET disabled, as it is an all-or-nothing at the binary level and missing ENDBRANCH instruction cause the program to terminate on indirect jump/call instructions.
For the .note.gnu.property thing (which is per object file, right?), I think it's better to do it in the same place as ASM_MARK_NOEXEC_STACK. That's substituted in config.m4.in, and added at the end of each asm file using m4 divert.
The endbr instructions in the prologue are in the right place, as far as I can tell. And I'd be very tempted to rename the macro CET_ENDBR to COME_FROM, see https://en.wikipedia.org/wiki/COMEFROM ;-)
The second patch is used to make the system happy when hardening flags are enabled in gcc, as it generates the appropriate section information that tells the linker all is good.
I'd like to understand what's missing. Maybe we can fix it more explicitly?
Finally while looking at the assembly I noticed that some functions have a PROLOGUE() defined but not an EPILOGUE() macro defined in their .asm files. It is unclear to me if this is an error or intentional so didn't touch those, it doesn't affect functionality for this patch anyway.
I think it's an error. Except in the files in arm/fat with lines like
dnl PROLOGUE(_nettle_chacha_core) picked up by configure
I find three offending files, using
$ diff <(git grep -c '^ *EPILOGUE') <(git grep -c '^ *PROLOGUE') 49c49 < x86_64/poly1305-internal.asm:2 ---
x86_64/poly1305-internal.asm:3
51a52,53
x86_64/serpent-decrypt.asm:1 x86_64/serpent-encrypt.asm:1
have you seen any others?
Some minor comments below.
From de1b9bfeb4f8ad9a6bf8608c4b8c727dba315982 Mon Sep 17 00:00:00 2001 From: Simo Sorce simo@redhat.com Date: Tue, 23 Apr 2019 18:03:35 -0400 Subject: [PATCH 1/2] Add Intel CET protection support
In upcoming processors Intel will make available Control-Flow Enforcement Technology, which is comprised of two hardware countermeasures against ROP based attacks.
Please spell out ROP. It would be good with a link to further information in some reasonable place in the code.
The first is called Shadow Stack and checks that return from function calls are not tampered with by keeping a shadow stack that cannot be modified by aplications. This measure requires no code changes (except for code that intentionally modifes the return pointer on the stack).
Fix spelling of "aplication", "modifes"
The second is called Indirect Branch Tracking and is used to insure only targets of indirect jumps are actually jumped to. This requires modification of code to insert a special instruction that identifies a valid indirect jump target. When enforcement is turned on, if an indirect jump does not end on this special instruction the cpu raises an exception. These instructions are noops on older CPU models so it is safe to use them in all x86(_64) code.
To enable these protections gcc also inroduces a new GNU property note
and "inroduces"
+dnl GNU properties section to enable CET protections +define(<GNU_CET_SECTION>, +<ifelse(CET_PROTECTION,yes, +<.pushsection .note.gnu.property,"a"
As I said, prefer to dot his globally with an m4 divert in config.m4.in.
+ALIGN(8) +.long 1f - 0f +.long 4f - 1f +.long 5 +0: +.string "GNU" +1: +ALIGN(8) +.long 0xc0000002 +.long 3f - 2f +2: +.long 0x03 +3: +ALIGN(8) +4:
Are there no symbolic names for this note? Since we require assembler to suport endbr instructions, can we require that it know about these notes as well? What does it look like in gcc output?
diff --git a/x86_64/aes-decrypt-internal.asm b/x86_64/aes-decrypt-internal.asm index 43f2f394..5db39868 100644 --- a/x86_64/aes-decrypt-internal.asm +++ b/x86_64/aes-decrypt-internal.asm @@ -62,7 +62,7 @@ C work. define(<TMP>,<%rbp>)
.file "aes-decrypt-internal.asm"
- C _aes_decrypt(unsigned rounds, const uint32_t *keys, C const struct aes_table *T, C size_t length, uint8_t *dst,
@@ -70,6 +70,7 @@ define(<TMP>,<%rbp>) .text ALIGN(16) PROLOGUE(_nettle_aes_decrypt)
Please trim unrelated whitespace changes from the patch (I known it's not 100% consistent, but if we ever want to do something like M-x whitespace-cleanup on all files, that should be in a separate commit. On new changes, I usually run git diff --check to catch odd whitespace use).
Regards, /Niels
On Fri, 2019-04-26 at 10:45 +0200, Niels Möller wrote:
Simo Sorce simo@redhat.com writes:
the attached patches have been used to successfully enable and test Intel CET support in an Intel emulator on SDV hardware.
Thanks.
GCC already has all the needed support to create CET hardened code, however the hand-coded assembly needs to be changed to conform. Without these changes all the binaries that load nettle will otherwise have CET disabled, as it is an all-or-nothing at the binary level and missing ENDBRANCH instruction cause the program to terminate on indirect jump/call instructions.
For the .note.gnu.property thing (which is per object file, right?), I think it's better to do it in the same place as ASM_MARK_NOEXEC_STACK. That's substituted in config.m4.in, and added at the end of each asm file using m4 divert.
So the reason I did not do that an instead explicitly added a statement in the individual .asm files is that you cannot just add the note on any random file. The note states that the file uses CET instructions for all indirect jump targets. So, it just so happens now that no indirect jumps (except for the target of the initial call into the functions) are used in our handcrafted assembly, but that is not a given in the future.
So want I mean to say is that I took an explicit macro to mean the file was actually analyzed and made CET compliant.
If we embed the macro magically instead this "inspection" part will be missing and we may end up marking assembly files that do nbot in fact comply with CET and will make the program segfault (that's how the exception is surfaced) when run.
I understand this is not a high bar, and I will fold the segment note in if you think that is what we should do, but I wanted to make you aware of why I did not do the same as what we do with the stack note.
The endbr instructions in the prologue are in the right place, as far as I can tell. And I'd be very tempted to rename the macro CET_ENDBR to COME_FROM, see https://en.wikipedia.org/wiki/COMEFROM ;-)
That is funny, I actually used CET_ENDBR to make it easier to find for others grepping as binutils also uses a _CET_ENDBR macro, it sounded more consistent to use something with both "CET" and ENDBR in it, easier to find for someone looking and figuring out hwat it is about.
The second patch is used to make the system happy when hardening flags are enabled in gcc, as it generates the appropriate section information that tells the linker all is good.
I'd like to understand what's missing. Maybe we can fix it more explicitly?
I do not think we can easily fix it manually, it is mostly other section notes that the gcc compiler adds when it fortifies C code. But those notes do not really apply to handcrafted assembly. So this flag is basically just saying to the compiler that it should add whatever is appropriate (which may change depending on compiler flags) because our code is good as is. It silence some checking tools mostly, so I do not think it is worth wasting too much time trying to figure out what notes should be added, the compiler is happy to do it for us.
Finally while looking at the assembly I noticed that some functions have a PROLOGUE() defined but not an EPILOGUE() macro defined in their .asm files. It is unclear to me if this is an error or intentional so didn't touch those, it doesn't affect functionality for this patch anyway.
I think it's an error. Except in the files in arm/fat with lines like
dnl PROLOGUE(_nettle_chacha_core) picked up by configure
I find three offending files, using
$ diff <(git grep -c '^ *EPILOGUE') <(git grep -c '^ *PROLOGUE') 49c49
< x86_64/poly1305-internal.asm:2
x86_64/poly1305-internal.asm:3
51a52,53
x86_64/serpent-decrypt.asm:1 x86_64/serpent-encrypt.asm:1
have you seen any others?
No, those are pretty much the places where I noticed it. Would you want an additional patch that adds those EPILOGUES ?
Some minor comments below.
From de1b9bfeb4f8ad9a6bf8608c4b8c727dba315982 Mon Sep 17 00:00:00 2001 From: Simo Sorce simo@redhat.com Date: Tue, 23 Apr 2019 18:03:35 -0400 Subject: [PATCH 1/2] Add Intel CET protection support
In upcoming processors Intel will make available Control-Flow Enforcement Technology, which is comprised of two hardware countermeasures against ROP based attacks.
Please spell out ROP. It would be good with a link to further information in some reasonable place in the code.
Should I add a link in asm.m4 to this ?
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-en... (I do no have a better link)
The first is called Shadow Stack and checks that return from function calls are not tampered with by keeping a shadow stack that cannot be modified by aplications. This measure requires no code changes (except for code that intentionally modifes the return pointer on the stack).
Fix spelling of "aplication", "modifes"
The second is called Indirect Branch Tracking and is used to insure only targets of indirect jumps are actually jumped to. This requires modification of code to insert a special instruction that identifies a valid indirect jump target. When enforcement is turned on, if an indirect jump does not end on this special instruction the cpu raises an exception. These instructions are noops on older CPU models so it is safe to use them in all x86(_64) code.
To enable these protections gcc also inroduces a new GNU property note
and "inroduces"
+dnl GNU properties section to enable CET protections +define(<GNU_CET_SECTION>, +<ifelse(CET_PROTECTION,yes, +<.pushsection .note.gnu.property,"a"
As I said, prefer to dot his globally with an m4 divert in config.m4.in.
See above explanation and let me know if that changes your opinion, otherwise I will do this.
+ALIGN(8) +.long 1f - 0f +.long 4f - 1f +.long 5 +0: +.string "GNU" +1: +ALIGN(8) +.long 0xc0000002 +.long 3f - 2f +2: +.long 0x03 +3: +ALIGN(8) +4:
Are there no symbolic names for this note? Since we require assembler to suport endbr instructions, can we require that it know about these notes as well? What does it look like in gcc output?
There are symbolic names to compose the 0x03 value and for the 0xc0000002 values, the rest are just label arithmetic.
I will change in next submission.
diff --git a/x86_64/aes-decrypt-internal.asm b/x86_64/aes-decrypt-internal.asm index 43f2f394..5db39868 100644 --- a/x86_64/aes-decrypt-internal.asm +++ b/x86_64/aes-decrypt-internal.asm @@ -62,7 +62,7 @@ C work. define(<TMP>,<%rbp>)
.file "aes-decrypt-internal.asm"
- C _aes_decrypt(unsigned rounds, const uint32_t *keys, C const struct aes_table *T, C size_t length, uint8_t *dst,
@@ -70,6 +70,7 @@ define(<TMP>,<%rbp>) .text ALIGN(16) PROLOGUE(_nettle_aes_decrypt)
Please trim unrelated whitespace changes from the patch (I known it's not 100% consistent, but if we ever want to do something like M-x whitespace-cleanup on all files, that should be in a separate commit. On new changes, I usually run git diff --check to catch odd whitespace use).
Ok, this was not intentional, I will remove undesired changes.
Simo.
Ok attached find new patches, they address all concerns except for adding the CET_SECTION macro automagically to all asm files. I also added a patch to deal with the missing epilogues.
Simo.
Simo Sorce simo@redhat.com writes:
they address all concerns except for adding the CET_SECTION macro automagically to all asm files.
Thanks. I commented on that issue in my other mail.
I also added a patch to deal with the missing epilogues.
Applied now.
BTW are there any git experts here? I often apply complete patches by running cd hack/nettle/ && git am directly on the text/x-patch attachment (the | command on the attachment in Gnus' *Article* buffer). But all of git am, git apply, git am -3, git apply -3 failed on this Patch #3, most likely because line numbers depended on the earlier patches in the series, which I didn't want to apply right away. Old-fashioned patch -p1 could apply the patch, with "fuzz 2".
So are there any other options to make the git patching tools a bit more tolerant or fuzzy?
Regards, /Niels
Simo Sorce simo@redhat.com writes:
Ok attached find new patches, they address all concerns except for adding the CET_SECTION macro automagically to all asm files.
Ah, one more thing:
+define(<GNU_CET_SECTION>, +<ifelse(CET_PROTECTION,yes, +<.pushsection .note.gnu.property,"a"
How portable is .pushsection? If we ensure that notes are last, plain .section should be enough, I think.
--- a/x86_64/sha3-permute.asm +++ b/x86_64/sha3-permute.asm @@ -107,6 +107,7 @@ define(<ROTL64>, <
C sha3_permute(struct sha3_state *ctx) .text +GNU_CET_SECTION() ALIGN(16) PROLOGUE(nettle_sha3_permute) W64_ENTRY(1, 16)
This placement between .text and the prologue depends on .pushsection / .popsection. I think it should be moved last, just like in the other files, either explicitly or by means of a divert in some of the included m4 files.
Regards, /Niels
Oh sorry I did not see this email and the previous before sending my new patches.
About git, generally git am wants to be applied in order, but I think there may be a fuzz option to git too, never really investigated as I usually apply all patches to a branch (or pull that branch directly from PR branches) and then git cherry-pick if I want a specific patch on master.
On Sat, 2019-04-27 at 09:29 +0200, Niels Möller wrote:
Simo Sorce simo@redhat.com writes:
Ok attached find new patches, they address all concerns except for adding the CET_SECTION macro automagically to all asm files.
Ah, one more thing:
+define(<GNU_CET_SECTION>, +<ifelse(CET_PROTECTION,yes, +<.pushsection .note.gnu.property,"a"
How portable is .pushsection? If we ensure that notes are last, plain .section should be enough, I think.
No it needs to be .pushsection, when I was using just .section the alignment of the property was incorrect (16 instead of 8) and glibc gurus told me to use .pushsection to properly deal with that. If you have/want CET you also have a modern enough GCC and Assembler that support -fcf-protection and .pushsection so I do not think it is a problem, if you are not using NGU As, you won't be using --enable-cet- protection either
--- a/x86_64/sha3-permute.asm +++ b/x86_64/sha3-permute.asm @@ -107,6 +107,7 @@ define(<ROTL64>, <
C sha3_permute(struct sha3_state *ctx) .text +GNU_CET_SECTION() ALIGN(16) PROLOGUE(nettle_sha3_permute) W64_ENTRY(1, 16)
This placement between .text and the prologue depends on .pushsection / .popsection. I think it should be moved last, just like in the other files, either explicitly or by means of a divert in some of the included m4 files.
Yes I had already caught this mistake, it is fixed in the patch series I sent a few minutes ago.
Simo Sorce simo@redhat.com writes:
I understand this is not a high bar, and I will fold the segment note in if you think that is what we should do, but I wanted to make you aware of why I did not do the same as what we do with the stack note.
I think we should aim to make all files "cet-compliant" if we do it att all. After all, the common case is to have a libnettle.so, and then any single object file missing the annotation will make the linker disable the feature, if I've understood it correctly. I agree it should be disabled by default until we're more confident in test coverage.
BTW, do you know how that works with late loading using dlopen? One could have a process running in CET-mode, which dynamically loads an so-file with code lacking endbr instructions and corresponding annotation.
If we think about it as an arch-specific thing, which I guess we should, then maybe the m4 divert should be in x86_64/machine.m4 and x86/machine.m4, not asm.m4.
That is funny, I actually used CET_ENDBR to make it easier to find for others grepping as binutils also uses a _CET_ENDBR macro, it sounded more consistent
I agree your name is better for readability, even if less amusing.
I'd like to understand what's missing. Maybe we can fix it more explicitly?
I do not think we can easily fix it manually, it is mostly other section notes that the gcc compiler adds when it fortifies C code. But those notes do not really apply to handcrafted assembly.
[...]
So this flag is basically just saying to the compiler that it should add whatever is appropriate (which may change depending on compiler flags) because our code is good as is.
Since the command line flag is passed with -Wa, it tells the *assembler* to add notes. Which ones? Is it based only on the command line, say, $(CFLAGS) contains -fharden-foo makes the assembler produces a "foo" note. Or is it based on what's actually in the assembly input file?
Is there a risk that it automatically generates a note promising something about the assembly code which we don't actually fulfill?
Is there any documentation? I see that it is mentioned in the binutils-2.31 release announcement, but I've not found it mentioned in the Gas manual.
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-en... (I do no have a better link)
Looks like reasonable doc. (Closest on wikpedia seems to be https://en.wikipedia.org/wiki/Control-flow_integrity).
See above explanation and let me know if that changes your opinion, otherwise I will do this.
+ALIGN(8) +.long 1f - 0f +.long 4f - 1f +.long 5 +0: +.string "GNU" +1: +ALIGN(8) +.long 0xc0000002 +.long 3f - 2f +2: +.long 0x03 +3: +ALIGN(8) +4:
Are there no symbolic names for this note? Since we require assembler to suport endbr instructions, can we require that it know about these notes as well? What does it look like in gcc output?
There are symbolic names to compose the 0x03 value and for the 0xc0000002 values, the rest are just label arithmetic.
I will change in next submission.
I see, I was hoping for something more similar to
.section .note.GNU-stack,"",TYPE_PROGBITS
I'm still curious as to what it looks like in gcc output.
Regards, /Niels
On Fri, 2019-04-26 at 22:31 +0200, Niels Möller wrote:
Simo Sorce simo@redhat.com writes:
I understand this is not a high bar, and I will fold the segment note in if you think that is what we should do, but I wanted to make you aware of why I did not do the same as what we do with the stack note.
I think we should aim to make all files "cet-compliant" if we do it att all. After all, the common case is to have a libnettle.so, and then any single object file missing the annotation will make the linker disable the feature, if I've understood it correctly. I agree it should be disabled by default until we're more confident in test coverage.
Yes we should have all files CET compliant, the idea of having a distinct macro was to make it easy to catch submissions of new files that lack it.
BTW, do you know how that works with late loading using dlopen? One could have a process running in CET-mode, which dynamically loads an so-file with code lacking endbr instructions and corresponding annotation.
Yes I had the same question so I asked to our glibc gurus. The only protection that is problematic is IBT as it requires insertion of new instructions, so there is a mode where you can mark specific pages so that the CPU will ignore missing endbr instruction exclusively for executable code on those pages. So you can load an entire library via dlopen() on specially marked pages and code running on those will have no protection while the rest of the code will do. It is an all-or nothing at the library level at that point. It is not clear to me if this is fully supported today in glibc yet.
I guess in theory this could be done for individual libraries brought in at execution time by the dynamic linker, but I think that is not done and instead the whole binary is either enabled or disabled currently.
If we think about it as an arch-specific thing, which I guess we should, then maybe the m4 divert should be in x86_64/machine.m4 and x86/machine.m4, not asm.m4.
Makes sense, I can move it, do you still want it to be automagically added to all assembly files?
That is funny, I actually used CET_ENDBR to make it easier to find for others grepping as binutils also uses a _CET_ENDBR macro, it sounded more consistent
I agree your name is better for readability, even if less amusing.
I'd like to understand what's missing. Maybe we can fix it more explicitly?
I do not think we can easily fix it manually, it is mostly other section notes that the gcc compiler adds when it fortifies C code. But those notes do not really apply to handcrafted assembly.
[...]
So this flag is basically just saying to the compiler that it should add whatever is appropriate (which may change depending on compiler flags) because our code is good as is.
Since the command line flag is passed with -Wa, it tells the *assembler* to add notes.
True.
Which ones? Is it based only on the command line, say, $(CFLAGS) contains -fharden-foo makes the assembler produces a "foo" note. Or is it based on what's actually in the assembly input file?
It is not based on what is in the assembly input file afaik. It generates "GNU Build Attribute" notes.
readelf show a single additional Owner attribute named GA$<version>3a1 with Data size always set to 0x10 and applied to each section of the library corresponding to assembly files that cover the memory range occupied by said library.
I have no more information than that at the moment, but I can ask if you want me to dig through it.
Is there a risk that it automatically generates a note promising something about the assembly code which we don't actually fulfill?
I was told that's not the case, I think it just sets bare notes that basically do not assert anything specific, it just makes the tools that check for those notes happy.
Is there any documentation? I see that it is mentioned in the binutils-2.31 release announcement, but I've not found it mentioned in the Gas manual.
Couldn't find anything either. Only some fedora wiki change proposal page that mentions them.
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-en... (I do no have a better link)
Looks like reasonable doc. (Closest on wikpedia seems to be https://en.wikipedia.org/wiki/Control-flow_integrity).
Yeah I found this page too on wikipedia but is is it too generic (it also referenced the link above)
See above explanation and let me know if that changes your opinion, otherwise I will do this.
+ALIGN(8) +.long 1f - 0f +.long 4f - 1f +.long 5 +0: +.string "GNU" +1: +ALIGN(8) +.long 0xc0000002 +.long 3f - 2f +2: +.long 0x03 +3: +ALIGN(8) +4:
Are there no symbolic names for this note? Since we require assembler to suport endbr instructions, can we require that it know about these notes as well? What does it look like in gcc output?
There are symbolic names to compose the 0x03 value and for the 0xc0000002 values, the rest are just label arithmetic.
I will change in next submission.
I see, I was hoping for something more similar to
.section .note.GNU-stack,"",TYPE_PROGBITS
Nope, no such thing.
I'm still curious as to what it looks like in gcc output.
Exactly like the above.
You can see it appended in any .s file generated by gcc when you compile with CFLAGS including -fcf-protection
Attached find a new patch that replaces the first one and moves the section definition to be machine specific
The second is a WIP, to also move the new instructions and make the section atumagic. It makes the configure.ac clearer but requires to add macros to each machine.m4 file. Given it is a amchine specific technology I did not think it made sense to add CET_ENDBR name in non intel architectures, and allows us to use CODEFROM :-)
The other patches are unchanged, but I changed the order to put them first so they can be applied right away if you want.
nettle-bugs@lists.lysator.liu.se