Hello and best wishes for new year, I was debugging an invalid memory access in gnutls and realized that the issue is in memxor3 of nettle (2.7.x branch). I have not yet a fix for that, but the attached patch modifies the memxor-test.c to reproduce the issue.
regards, Nikos
$ valgrind ./memxor-test [...] ==26108== Invalid read of size 8 ==26108== at 0x4E817BD: memxor3 (memxor.s:137) ==26108== by 0x4029D8: test_main (memxor-test.c:157) ==26108== by 0x4024C6: main (testutils.c:204) ==26108== Address 0x5094498 is 104 bytes inside a block of size 111 alloc'd ==26108== at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==26108== by 0x402909: test_main (memxor-test.c:152) ==26108== by 0x4024C6: main (testutils.c:204) ==26108== ==26108==
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
I was debugging an invalid memory access in gnutls and realized that the issue is in memxor3 of nettle (2.7.x branch). I have not yet a fix for that, but the attached patch modifies the memxor-test.c to reproduce the issue.
Hmm. If you have, e.g., a 14 byte block starting at address 0x1001, it usually harmless to do read that data as a two aligned reads at address 0x1000 and 0x1008, and then ignore the data outside of the area. Valgrind usually doesn't warn about that.
Now memxor on x86_64 is a bit special, since it tries to align the *writes*, but it does full-word *unaligned* reads. In this case, reading outside of the input area may cause serious problems, e.g, crossing a page boundary.
So this might be a fairly serious bug in the memxor assembly code.
void test_main(void) {
- uint8_t dst_buf[MAX_SIZE];
- uint8_t *c_buf;
- uint8_t *d_buf; const uint8_t *a = H("ecc8737f 38f2f9e8 86b9d84c 42a9c7ef" "27a50860 49c6be97 c5cc6c35 3981b367" "f8b4397b 951e3b2f 35749fe1 25884fa6"
@@ -144,4 +148,15 @@ test_main(void) for (align_b = 0; align_b < ALIGN_SIZE; align_b++) test_memxor3 (a, b, c, size[i], align_dst, align_a, align_b); }
- c_buf = malloc(111);
- d_buf = malloc(111);
- memset(c_buf, 1, 111);
- memset(d_buf, 3, 111);
- memxor3 (dst_buf+13, c_buf+96, d_buf, 15);
- ASSERT(dst_buf[14] == 2);
- free(c_buf);
- free(d_buf);
}
What result do you get with this test? Does it fail the assert, or does it exit successfuly when running without valgrind, but generate warnings/errors with valgrind?
The memxor-test.c in the repo tries to run memxor and memxor3 with all combinations of alignments and a selection of different sizes. And I get no warnings from
make check EMULATOR='$(VALGRIND)' TS_ALL=memxor-test
As far as I see, those tests *should* include the same size and alignments as in your test, so I wonder what's going on here.
Regards, /Niels
On Thu, 2013-12-26 at 22:00 +0100, Niels Möller wrote:
What result do you get with this test? Does it fail the assert, or does it exit successfuly when running without valgrind, but generate warnings/errors with valgrind?
It exits successfully but prints the valgrind warning (shown in the previous message).
The memxor-test.c in the repo tries to run memxor and memxor3 with all combinations of alignments and a selection of different sizes. And I get no warnings from
Do you get a warning after applying the patch? The alignment in the patch is exactly the same combination that causes the issue in the gnutls test cases, and seems to be dst:13, buf1:0, buf2:0.
In the memxor tests, do you put the buffers at a zone exactly at the ending position of the buffer, or there is some slack size at the end? In the latter case you'll not see any read past the boundary (though I see that you catch any write).
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
It exits successfully but prints the valgrind warning (shown in the previous message).
Then I know what to expect.
Do you get a warning after applying the patch?
Haven't had time to try it yet.
I just had a look at x86_64/memxor.asm. As far as I see, it's written to not read anything outside of the input buffers. Initial and final portions are done with loops doing one byte at a time. It's not clear how it could read beyond the input area without also clobbering bytes outside of the destination area. Some debugging needed.
In the memxor tests, do you put the buffers at a zone exactly at the ending position of the buffer,
No. That's probably why valgrind doesn't warn about it. I just allocate a buffer of 16 + MAX_SIZE bytes, and try with starting address buf, buf+1, ..., buf + 15.
Regards, /Niels
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
I was debugging an invalid memory access in gnutls and realized that the issue is in memxor3 of nettle (2.7.x branch).
I have now tried your new test case. I can reproduce the valgrind warning. But after stepping through the memxor3 function, I think it's harmless. This is what happens:
We get an memxor3 call with size 15. We have
dst area: 0x7fffffffe91d, ending before 0x7fffffffe92c src1 area: 0x6333d0, ending before 0x6333df src2 area: 0x6333f0, ending before 0x6333ff
Since the end of the destination area, 0x7fffffffe92c, is unaligned, the initial byte loop processes the last 4 bytes of the areas, and reduces the length accordingly, to 11. The remaining work stores to the destination area ending at the aligned address 0x7fffffffe928.
Next, we enter the special case that the src operands have the same alignment, but different from the destination. Pointers and shift counts are set up, and the code then makes aligned reads at
0x6333d0, 0x6333d8 (src1) 0x6333f0, 0x6333f8 (src2)
The reads at 0x6333d8 and 0x6333f8 include one byte beyond the end of the input areas, and one of them triggers a valgrind warning.
The result of these reads are xored together, then shifted and ored together, so that the last five bytes read are ignored (i.e., the byte beyond the end of input, as well as the 4 bytes already processed). And the result is then stored at the aligned address 0x7fffffffe920.
Then the code exits the main loop (actually, it never entered, the above operations were done by the loop setup for odd word count, since the main loop uses two-way unrolling).
And then there's a final byte loop, processing the first three bytes of the areas (doing some unnecessary rereads of the input bytes at addresses 0x6333d[0-2] and 0x6333f[0-2]).
My conclusions are:
1. There's no bug here.
2. We should use the --partial-loads-ok=yes valgrind option. (The manual says "Note that code that behaves in this way is in violation of the the ISO C/C++ standards, and should be considered broken.", but those standards clearly don't apply to assembly code).
3. memxor.c might also use "partial loads" in a way which violates C standards. I don't think that's a problem on any real system, and, e.g, glibc memcmp does similar tricks.
Regards, /Niels
On 01/04/2014 09:54 PM, Niels Möller wrote:
0x6333d0, 0x6333d8 (src1) 0x6333f0, 0x6333f8 (src2) The reads at 0x6333d8 and 0x6333f8 include one byte beyond the end of the input areas, and one of them triggers a valgrind warning.
[...]
- There's no bug here.
I couldn't deduce that from your description. Why are these reads legal?
- We should use the --partial-loads-ok=yes valgrind option. (The manual says "Note that code that behaves in this way is in violation of the the ISO C/C++ standards, and should be considered broken.", but those standards clearly don't apply to assembly code).
- memxor.c might also use "partial loads" in a way which violates C standards. I don't think that's a problem on any real system, and, e.g, glibc memcmp does similar tricks.
Irrespective of the C standard, why do you think that accessing this byte outside the buffer boundary is valid? I guess you rely that pages will be of an aligned size anyway?
regards, Nikos
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
I couldn't deduce that from your description. Why are these reads legal?
Because they are read at a word aligned address.
Irrespective of the C standard, why do you think that accessing this byte outside the buffer boundary is valid? I guess you rely that pages will be of an aligned size anyway?
I expect that every byte of memory which is accessible at all is accessible using an aligned read access of a full word. I view byte-sized loads in the instruction set as mostly syntactic sugar for word-sized loads and masking.
If you are aware of any problems on real systems (say, using memxor with some memory mapped i/o register area as a source), I'd like to hear the details.
I see no problem with the x86_64 assembly code. Maybe it's a problem with the supposedly portable C implementation. There's also other code depending on endianness (and WORDS_BIGENDIAN), which strictly speaking is not valid C. But I don't intend to fix such things unless it breaks things on some otherwise supported platform.
If needed, we can add a configure test and a simple and 100% kosher byte-by-byte memxor loop for affected systems.
Regards, /Niels
On Sun, Jan 5, 2014 at 11:13 PM, Niels Möller nisse@lysator.liu.se wrote:
Nikos Mavrogiannopoulos nmav@gnutls.org writes:
I couldn't deduce that from your description. Why are these reads legal?
Because they are read at a word aligned address.
Irrespective of the C standard, why do you think that accessing this byte outside the buffer boundary is valid? I guess you rely that pages will be of an aligned size anyway?
I expect that every byte of memory which is accessible at all is accessible using an aligned read access of a full word. I view byte-sized loads in the instruction set as mostly syntactic sugar for word-sized loads and masking.
I understand your argumentation, but is there some fact on that? Is that part of the x86 or (e.g. the arm) specification? Moreover should nettle install a valgrind suppression file? That way projects that are affected will not fail any valgrind tests.
regards, Nikos
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Aloha!
Nikos Mavrogiannopoulos wrote:
Moreover should nettle install a valgrind suppression file? That way projects that are affected will not fail any valgrind tests.
IMHO - No!
One can provide a Valgrind specfile, but not install it if there is any risk of affecting users running Valgrind on their applications without their knowledge.
The really important thing is to document that the Valgrind specfile exist, what it does and why. - -- Med vänlig hälsning, Yours
Joachim Strömbergson - Alltid i harmonisk svängning. ======================================================================== Joachim Strömbergson Secworks AB joachim@secworks.se ========================================================================
nisse@lysator.liu.se (Niels Möller) writes:
I expect that every byte of memory which is accessible at all is accessible using an aligned read access of a full word. I view byte-sized loads in the instruction set as mostly syntactic sugar for word-sized loads and masking.
After a bit more thinking, I still believe this should be safe on the architectures where we have assembly code. But I think it should be avoided in the supposedly portable C code. I don't think it's very urgent to fix, though.
Regards, /Niels
nettle-bugs@lists.lysator.liu.se