I've generated a release candidate, available at
http://www.lysator.liu.se/~nisse/archive/nettle-2.7rc1.tar.gz
Testing appreciated.
Regards, /Niels
On Sun, 21 Apr 2013, Niels Möller wrote:
> I've generated a release candidate, available at
> http://www.lysator.liu.se/~nisse/archive/nettle-2.7rc1.tar.gz
> Testing appreciated.
With the patch I just sent, nettle 2.7 builds in the "tricky" cross build envs I normally build nettle in.
// Martin
Martin Storsjö martin@martin.st writes:
> With the patch I just sent, nettle 2.7 builds in the "tricky" cross build envs I normally build nettle in.
Thanks!
BTW, is there any interest for some kind of release party in Stockholm? Possibly on Friday.
Regards, /Niels
On Tue, 23 Apr 2013, Niels Möller wrote:
> Martin Storsjö martin@martin.st writes:
>> With the patch I just sent, nettle 2.7 builds in the "tricky" cross build envs I normally build nettle in.
> Thanks!
> BTW, is there any interest for some kind of release party in Stockholm? Possibly on Friday.
I'm not in the area, and wouldn't be able to make it this week anyway.
I built and tested the rc on native Windows as well - in 32-bit mode, all tests succeed, while in 64-bit mode, quite a few of them seem to fail - the failing tests are:
salsa20 sha224 sha256 sha384 sha512 hmac umac yarrow pbkdf2 rsa dsa dsa-keygen
Haven't checked yet whether the same failures can be reproduced with wine64.
// Martin
On Tue, 23 Apr 2013, Martin Storsjö wrote:
> I built and tested the rc on native windows as well - in 32 bit mode, all tests succeed, while in 64 bit mode, quite a few of them seem to fail - the failing tests are:
> salsa20 sha224 sha256 sha384 sha512 hmac umac yarrow pbkdf2 rsa dsa dsa-keygen
> Haven't checked yet whether the same failures can be reproduced with wine64.
The exact same tests fail with wine64 when built and run from linux as well (with gmp 5.1.1 if that's of any relevance).
// Martin
On Tue, 23 Apr 2013, Martin Storsjö wrote:
> On Tue, 23 Apr 2013, Martin Storsjö wrote:
>> I built and tested the rc on native windows as well - in 32 bit mode, all tests succeed, while in 64 bit mode, quite a few of them seem to fail - the failing tests are:
>> salsa20 sha224 sha256 sha384 sha512 hmac umac yarrow pbkdf2 rsa dsa dsa-keygen
>> Haven't checked yet whether the same failures can be reproduced with wine64.
> The exact same tests fail with wine64 when built and run from linux as well (with gmp 5.1.1 if that's of any relevance).
With the final two patches I've just sent, all of these pass, both on wine64 and on real windows.
// Martin
Martin Storsjö martin@martin.st writes:
> With the final two patches I've just sent, all of these pass, both on wine64 and on real windows.
Many thanks for tracking these w64-related bugs down. Do you have a debugging environment for w64, or did you find the bugs by reading the (preprocessed) assembly code?
Looking at W64_ENTRY, do you remember why the stack allocation for saved xmm registers is done as

  sub	[$]eval(8 + 16*($2 - 6)), %rsp

Subtracting the extra 8 bytes seems useless; at least I don't see anything stored at that location. I wonder if maybe that was an incomplete attempt at getting a 16-byte aligned pointer for the xmm stores?
Regards, /Niels
On Tue, 23 Apr 2013, Niels Möller wrote:
> Martin Storsjö martin@martin.st writes:
>> With the final two patches I've just sent, all of these pass, both on wine64 and on real windows.
> Many thanks for tracking these w64-related bugs down. Do you have a debugging environment for w64, or did you find the bugs by reading the (preprocessed) assembly code?
I have access to a real windows environment as well, where I used gdb to help me track them down, which really helped a lot this time. I've never taken the time to figure out how to use winedbg properly...
> Looking at W64_ENTRY, do you remember why the stack allocation for saved xmm registers is done as
>
>   sub	[$]eval(8 + 16*($2 - 6)), %rsp
>
> Subtracting the extra 8 bytes seems useless; at least I don't see anything stored at that location. I wonder if maybe that was an incomplete attempt at getting a 16-byte aligned pointer for the xmm stores?
Hmm, yes, I think that might have been the case. So since we can't rely on that being aligned anyway, we could just as well skip the 8 byte offset.
// Martin
Martin Storsjö martin@martin.st writes:
> Hmm, yes, I think that might have been the case. So since we can't rely on that being aligned anyway, we could just as well skip the 8 byte offset.
If it works now, I don't think we should touch this code further before release.
For later optimization (if it really makes a difference to performance if we use aligned or unaligned loads and stores here? I don't know), one could keep the 8 byte extra allocation, then do something like
  lea	8(%rsp), %r10
  and	$-16, %r10
(%r10 should always be free for scratch use at both entry and exit, right?). Then %r10 will be 16 byte aligned, and hold either %rsp or %rsp + 8. And we can then do fully aligned loads and stores of the xmm registers via offsets from %r10.
Regards, /Niels
On Tue, 23 Apr 2013, Niels Möller wrote:
> Martin Storsjö martin@martin.st writes:
>> Hmm, yes, I think that might have been the case. So since we can't rely on that being aligned anyway, we could just as well skip the 8 byte offset.
> If it works now, I don't think we should touch this code further before release.
Yes, that's probably wisest.
> For later optimization (if it really makes a difference to performance if we use aligned or unaligned loads and stores here? I don't know), one could keep the 8 byte extra allocation, then do something like
>
>   lea	8(%rsp), %r10
>   and	$-16, %r10
>
> (%r10 should always be free for scratch use at both entry and exit, right?). Then %r10 will be 16 byte aligned, and hold either %rsp or %rsp + 8. And we can then do fully aligned loads and stores of the xmm registers via offsets from %r10.
That would probably work. I don't know these things well enough to say whether there's any serious performance to be gained by doing this, compared to the inconvenience of wasting one register.
// Martin
nettle-bugs@lists.lysator.liu.se