I've generated a release candidate, available at
http://www.lysator.liu.se/~nisse/archive/nettle-2.7rc1.tar.gz
Testing appreciated.
Regards, /Niels
On Sun, 21 Apr 2013, Niels Möller wrote:
> I've generated a release candidate, available at
> http://www.lysator.liu.se/~nisse/archive/nettle-2.7rc1.tar.gz
> Testing appreciated.
With the patch I just sent, nettle 2.7 builds in the "tricky" cross build envs I normally build nettle in.
// Martin
Martin Storsjö martin@martin.st writes:
> With the patch I just sent, nettle 2.7 builds in the "tricky" cross build envs I normally build nettle in.
Thanks!
BTW, is there any interest for some kind of release party in Stockholm? Possibly on Friday.
Regards, /Niels
On Tue, 23 Apr 2013, Niels Möller wrote:
> Martin Storsjö martin@martin.st writes:
>> With the patch I just sent, nettle 2.7 builds in the "tricky" cross build envs I normally build nettle in.
> Thanks!
> BTW, is there any interest for some kind of release party in Stockholm? Possibly on Friday.
I'm not in the area, and wouldn't be able to make it this week anyway.
I built and tested the rc on native Windows as well - in 32-bit mode, all tests succeed, while in 64-bit mode, quite a few of them seem to fail - the failing tests are:
salsa20 sha224 sha256 sha384 sha512 hmac umac yarrow pbkdf2 rsa dsa dsa-keygen
Haven't checked yet whether the same failures can be reproduced with wine64.
// Martin
On Tue, 23 Apr 2013, Martin Storsjö wrote:
> I built and tested the rc on native windows as well - in 32 bit mode, all tests succeed, while in 64 bit mode, quite a few of them seem to fail - the failing tests are:
> salsa20 sha224 sha256 sha384 sha512 hmac umac yarrow pbkdf2 rsa dsa dsa-keygen
> Haven't checked yet whether the same failures can be reproduced with wine64.
The exact same tests fail with wine64 when built and run from linux as well (with gmp 5.1.1 if that's of any relevance).
// Martin
On Tue, 23 Apr 2013, Martin Storsjö wrote:
> On Tue, 23 Apr 2013, Martin Storsjö wrote:
>> I built and tested the rc on native windows as well - in 32 bit mode, all tests succeed, while in 64 bit mode, quite a few of them seem to fail - the failing tests are:
>> salsa20 sha224 sha256 sha384 sha512 hmac umac yarrow pbkdf2 rsa dsa dsa-keygen
>> Haven't checked yet whether the same failures can be reproduced with wine64.
> The exact same tests fail with wine64 when built and run from linux as well (with gmp 5.1.1 if that's of any relevance).
With the final two patches I've just sent, all of these pass, both on wine64 and on real windows.
// Martin
Martin Storsjö martin@martin.st writes:
> With the final two patches I've just sent, all of these pass, both on wine64 and on real windows.
Many thanks for tracking these w64-related bugs down. Do you have a debugging environment for w64, or did you find the bugs by reading the (preprocessed) assembly code?
Looking at W64_ENTRY, do you remember why the stack allocation for saved xmm registers is done as

  sub	[$]eval(8 + 16*($2 - 6)), %rsp

Subtracting the extra 8 bytes seems useless; at least I don't see anything stored at that location. I wonder if maybe that was an incomplete attempt at getting a 16-byte aligned pointer for the xmm stores?
Regards, /Niels
On Tue, 23 Apr 2013, Niels Möller wrote:
> Martin Storsjö martin@martin.st writes:
>> With the final two patches I've just sent, all of these pass, both on wine64 and on real windows.
> Many thanks for tracking these w64-related bugs down. Do you have a debugging environment for w64, or did you find the bugs by reading the (preprocessed) assembly code?
I have access to a real windows environment as well, where I used gdb to help me track them down, which really helped a lot this time. I've never taken the time to figure out how to use winedbg properly...
> Looking at W64_ENTRY, do you remember why the stack allocation for saved xmm registers is done as
>
>   sub	[$]eval(8 + 16*($2 - 6)), %rsp
>
> Subtracting the extra 8 bytes seems useless; at least I don't see anything stored at that location. I wonder if maybe that was an incomplete attempt at getting a 16-byte aligned pointer for the xmm stores?
Hmm, yes, I think that might have been the case. So since we can't rely on that being aligned anyway, we could just as well skip the 8 byte offset.
// Martin
Martin Storsjö martin@martin.st writes:
> Hmm, yes, I think that might have been the case. So since we can't rely on that being aligned anyway, we could just as well skip the 8 byte offset.
If it works now, I don't think we should touch this code further before release.
For later optimization (if it really makes a difference to performance if we use aligned or unaligned loads and stores here? I don't know), one could keep the 8 byte extra allocation, then do something like
  lea	8(%rsp), %r10
  and	$-16, %r10
(%r10 should always be free for scratch use at both entry and exit, right?). Then %r10 will be 16 byte aligned, and hold either %rsp or %rsp + 8. And we can then do fully aligned loads and stores of the xmm registers via offsets from %r10.
Regards, /Niels
On Tue, 23 Apr 2013, Niels Möller wrote:
> Martin Storsjö martin@martin.st writes:
>> Hmm, yes, I think that might have been the case. So since we can't rely on that being aligned anyway, we could just as well skip the 8 byte offset.
> If it works now, I don't think we should touch this code further before release.
Yes, that's probably wisest.
> For later optimization (if it really makes a difference to performance if we use aligned or unaligned loads and stores here? I don't know), one could keep the 8 byte extra allocation, then do something like
>
>   lea	8(%rsp), %r10
>   and	$-16, %r10
>
> (%r10 should always be free for scratch use at both entry and exit, right?). Then %r10 will be 16 byte aligned, and hold either %rsp or %rsp + 8. And we can then do fully aligned loads and stores of the xmm registers via offsets from %r10.
That would probably work. I don't know these things well enough to say whether there's any serious performance to be gained by doing this, compared to the inconvenience of wasting one register.
// Martin
nettle-bugs@lists.lysator.liu.se