I PUSHED THE BUTTON.
Now the _Crypto module isn't used (though still built). That means that you'll need Nettle installed to use any parts of Pike that use Crypto.
But you will shortly fix this by checking in Nettle?
/ Peter Bortas
Previous text:
2004-02-03 00:44: Subject: Nettle
I PUSHED THE BUTTON.
Now the _Crypto module isn't used (though still built). That means that you'll need Nettle installed to use any parts of Pike that use Crypto.
/ Martin Nilsson (saturator)
No, I will fix residual problems elsewhere in the lib tree. I'll leave difficult things like configure and stuff to whoever knows how libraries should be bundled. There was talk about getting GMP in as well, right?
/ Martin Nilsson (saturator)
Previous text:
2004-02-03 00:45: Subject: Nettle
But you will shortly fix this by checking in Nettle?
/ Peter Bortas
Well, not in CVS I suppose, but yes.
/ Peter Bortas
Previous text:
2004-02-03 00:50: Subject: Nettle
No, I will fix residual problems elsewhere in the lib tree. I'll leave difficult things like configure and stuff to whoever knows how libraries should be bundled. There was talk about getting GMP in as well, right?
/ Martin Nilsson (saturator)
Then this leads to a question: How do I get Nettle? There wasn't any Gentoo package for it, unlike gmp.
/ Mirar
Previous text:
2004-02-03 00:50: Subject: Nettle
No, I will fix residual problems elsewhere in the lib tree. I'll leave difficult things like configure and stuff to whoever knows how libraries should be bundled. There was talk about getting GMP in as well, right?
/ Martin Nilsson (saturator)
http://www.lysator.liu.se/~nisse/nettle/
/ Martin Nilsson (saturator)
Previous text:
2004-02-03 08:28: Subject: Nettle
Then this leads to a question: How do I get Nettle? There wasn't any Gentoo package for it, unlike gmp.
/ Mirar
ftp://ftp.lysator.liu.se/pub/security/lsh/nettle-1.8.tar.gz
There's a fairly minimal home page at http://www.lysator.liu.se/~nisse/nettle.
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-03 08:28: Subject: Nettle
Then this leads to a question: How do I get Nettle? There wasn't any Gentoo package for it, unlike gmp.
/ Mirar
libnettle1 in Debian.
/ Peter Bortas
Previous text:
2004-02-03 10:50: Subject: Nettle
ftp://ftp.lysator.liu.se/pub/security/lsh/nettle-1.8.tar.gz
There's a fairly minimal home page at http://www.lysator.liu.se/~nisse/nettle.
/ Niels Möller (vässar rödpennan)
I built it for OS X and installed it so Pikefarm can find it. Anyway, while looking at the code I optimized the RC4 function for better performance :-)
    void
    arcfour_crypt(struct arcfour_ctx *ctx, unsigned length,
                  uint8_t *dst, const uint8_t *src)
    {
      register uint8_t i, j;
      register int si, sj;

      i = ctx->i;
      j = ctx->j;
      while(length--)
        {
          i++; i &= 0xff;
          si = ctx->S[i];
          j += si; j &= 0xff;
          sj = ctx->S[i] = ctx->S[j];
          ctx->S[j] = si;
          *dst++ = *src++ ^ ctx->S[ (si + sj) & 0xff ];
        }
      ctx->i = i;
      ctx->j = j;
    }

This improved performance from ~25 MB/s to ~39 MB/s on my G4/500 (gcc 3.3), and from ~14 MB/s to ~17 MB/s on an x86/600 (gcc 2.95). Feel free to verify and incorporate it. (arcfour_stream() should be modified in the same way, of course.)
/ Jonas Walldén
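Editor's note: the loop above can be checked against a published RC4 test vector. The following is only a minimal sketch under stated assumptions: `struct rc4_ctx`, `rc4_set_key` and `rc4_crypt` are hypothetical stand-ins written for this note, not Nettle's own definitions, and the key schedule is the textbook RC4 one rather than code taken from Nettle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the context used in the post; only the
   S, i and j fields that the posted loop touches are assumed. */
struct rc4_ctx {
    uint8_t S[256];
    uint8_t i, j;
};

/* Textbook RC4 key schedule (an assumption, not copied from Nettle). */
void rc4_set_key(struct rc4_ctx *ctx, const uint8_t *key, size_t keylen)
{
    unsigned k;
    uint8_t j = 0;

    for (k = 0; k < 256; k++)
        ctx->S[k] = (uint8_t)k;
    for (k = 0; k < 256; k++) {
        uint8_t t = ctx->S[k];
        j = (uint8_t)(j + t + key[k % keylen]);
        ctx->S[k] = ctx->S[j];
        ctx->S[j] = t;
    }
    ctx->i = 0;
    ctx->j = 0;
}

/* Same structure as the posted loop: S[i] and S[j] are each loaded once
   into locals, and the output index is computed from those locals
   instead of re-reading the table after the swap. */
void rc4_crypt(struct rc4_ctx *ctx, size_t length,
               uint8_t *dst, const uint8_t *src)
{
    uint8_t i = ctx->i, j = ctx->j;

    while (length--) {
        unsigned si, sj;
        i++;                          /* uint8_t wraps modulo 256 */
        si = ctx->S[i];
        j = (uint8_t)(j + si);
        sj = ctx->S[i] = ctx->S[j];
        ctx->S[j] = (uint8_t)si;
        *dst++ = *src++ ^ ctx->S[(si + sj) & 0xff];
    }
    ctx->i = i;
    ctx->j = j;
}
```

With the key "Key" and the input "Plaintext", the widely published RC4 vector gives the ciphertext BB F3 16 E8 D9 40 AF 0A D3, which makes a quick regression check for any rewrite of the inner loop.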
Previous text:
2004-02-03 10:50: Subject: Nettle
ftp://ftp.lysator.liu.se/pub/security/lsh/nettle-1.8.tar.gz
There's a fairly minimal home page at http://www.lysator.liu.se/~nisse/nettle.
/ Niels Möller (vässar rödpennan)
Nice, I'll try that. Have you looked at the assembler output? It would be interesting to know if there's any room for improvement by handtuning the assembler code.
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-05 10:02: Subject: Nettle
I built it for OS X and installed it so Pikefarm can find it. Anyway, while looking at the code I optimized the RC4 function for better performance :-)
    void
    arcfour_crypt(struct arcfour_ctx *ctx, unsigned length,
                  uint8_t *dst, const uint8_t *src)
    {
      register uint8_t i, j;
      register int si, sj;

      i = ctx->i;
      j = ctx->j;
      while(length--)
        {
          i++; i &= 0xff;
          si = ctx->S[i];
          j += si; j &= 0xff;
          sj = ctx->S[i] = ctx->S[j];
          ctx->S[j] = si;
          *dst++ = *src++ ^ ctx->S[ (si + sj) & 0xff ];
        }
      ctx->i = i;
      ctx->j = j;
    }

This improved performance from ~25 MB/s to ~39 MB/s on my G4/500 (gcc 3.3), and from ~14 MB/s to ~17 MB/s on an x86/600 (gcc 2.95). Feel free to verify and incorporate it. (arcfour_stream() should be modified in the same way, of course.)
/ Jonas Walldén
I used Apple's Shark tool, which displays load stalls, cycle counts, loop alignment and more. It warned about loading and storing from the same memory address within a single PPC970 (G5) "bundle", and that caused me to reconsider the memory I/O in the loop.
This was with standard compiler flags, so I didn't get any loop unrolling etc., and there's probably plenty of headroom for further tuning. Another thing that may be beneficial is to special-case aligned memory buffers and read and write 32-bit chunks at a time instead of XORing individual bytes. That also applies to functions like memxor().
Ideally, since there are trade-offs for different CPUs even within the same family (in case of PPC: 604, G3, G4 (AltiVec), G5 (64-bit) etc) the library should pick the best implementation at run-time and not when compiled. For Pike it's not feasible to require a G4 or G5 but we still want to use vectorized code when possible.
/ Jonas Walldén
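Editor's note: the aligned-buffer idea from the post above could look roughly like the sketch below. It is an illustration under assumptions, not Nettle's memxor(): the name `xor_bytes` is hypothetical, and the word accesses go through memcpy (which compilers typically lower to plain 32-bit loads and stores) to stay clear of aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: XOR n bytes of src into dst.
   Takes a 32-bit fast path only when both pointers are word-aligned,
   and falls back to byte-wise XOR otherwise and for the tail. */
void xor_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    if ((uintptr_t)dst % sizeof(uint32_t) == 0 &&
        (uintptr_t)src % sizeof(uint32_t) == 0) {
        while (n >= sizeof(uint32_t)) {
            uint32_t d, s;
            memcpy(&d, dst, sizeof d);   /* one aligned word load */
            memcpy(&s, src, sizeof s);
            d ^= s;
            memcpy(dst, &d, sizeof d);   /* one aligned word store */
            dst += sizeof d;
            src += sizeof s;
            n -= sizeof d;
        }
    }
    /* Unaligned buffers, and the last 0-3 bytes of aligned ones. */
    while (n--)
        *dst++ ^= *src++;
}
```

Both paths produce identical results, so the alignment test only decides how many memory operations are issued. Whether the fast path pays off on a given CPU would have to be measured.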
Previous text:
2004-02-05 13:05: Subject: Nettle
Nice, I'll try that. Have you looked at the assembler output? It would be interesting to know if there's any room for improvement by handtuning the assembler code.
/ Niels Möller (vässar rödpennan)
On my laptop (Intel P4), I get an increase from 45 MB/s to 66 MB/s.
Does it matter whether si and sj are ints or uint8_t? I get no speed difference.
The inner loop gets compiled into (Intel, gcc-3.3, -O2):

    .L28:
            incb    -13(%ebp)
            decl    %ebx
            movzbl  -13(%ebp), %edx
            movzbl  (%edx,%edi), %ecx
            addb    %cl, -14(%ebp)
            movzbl  -14(%ebp), %eax
            movzbl  (%eax,%edi), %eax
            movb    %al, (%edx,%edi)
            addb    %cl, %al
            movl    16(%ebp), %edx
            movzbl  %al, %eax
            movzbl  (%eax,%edi), %eax
            xorb    (%esi), %al
            incl    %esi
            movb    %al, (%edx)
            incl    %edx
            cmpl    $-1, %ebx
            movl    %edx, 16(%ebp)
            jne     .L28

It seems it can't fit all variables into registers, hence the save and restore operations via %ebp.
I wonder if my intel books will ever arrive.
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-05 10:02: Subject: Nettle
I built it for OS X and installed it so Pikefarm can find it. Anyway, while looking at the code I optimized the RC4 function for better performance :-)
    void
    arcfour_crypt(struct arcfour_ctx *ctx, unsigned length,
                  uint8_t *dst, const uint8_t *src)
    {
      register uint8_t i, j;
      register int si, sj;

      i = ctx->i;
      j = ctx->j;
      while(length--)
        {
          i++; i &= 0xff;
          si = ctx->S[i];
          j += si; j &= 0xff;
          sj = ctx->S[i] = ctx->S[j];
          ctx->S[j] = si;
          *dst++ = *src++ ^ ctx->S[ (si + sj) & 0xff ];
        }
      ctx->i = i;
      ctx->j = j;
    }

This improved performance from ~25 MB/s to ~39 MB/s on my G4/500 (gcc 3.3), and from ~14 MB/s to ~17 MB/s on an x86/600 (gcc 2.95). Feel free to verify and incorporate it. (arcfour_stream() should be modified in the same way, of course.)
/ Jonas Walldén
It would be possible to use MMX. If nothing else you then have more registers. However, it can only access memory on even 8-byte boundaries.
/ Per Hedbor ()
Previous text:
2004-02-05 14:21: Subject: Nettle
On my laptop (Intel P4), I get an increase from 45 MB/s to 66 MB/s.
Does it matter whether si and sj are ints or uint8_t? I get no speed difference.
The inner loop gets compiled into (Intel, gcc-3.3, -O2):

    .L28:
            incb    -13(%ebp)
            decl    %ebx
            movzbl  -13(%ebp), %edx
            movzbl  (%edx,%edi), %ecx
            addb    %cl, -14(%ebp)
            movzbl  -14(%ebp), %eax
            movzbl  (%eax,%edi), %eax
            movb    %al, (%edx,%edi)
            addb    %cl, %al
            movl    16(%ebp), %edx
            movzbl  %al, %eax
            movzbl  (%eax,%edi), %eax
            xorb    (%esi), %al
            incl    %esi
            movb    %al, (%edx)
            incl    %edx
            cmpl    $-1, %ebx
            movl    %edx, 16(%ebp)
            jne     .L28

It seems it can't fit all variables into registers, hence the save and restore operations via %ebp.
I wonder if my intel books will ever arrive.
/ Niels Möller (vässar rödpennan)
If I do any assembler at all, I think I'd prefer a generic x86, at least for a start.
What could you win by using MMX instructions? One could generate some k (k = 16 or so) bytes of the keystream, put them into a register, and then XOR them all into the source stream at once, assuming source and destination are properly aligned. But most instructions are used for generating the keystream, and it seems non-trivial to get any parallelization there.
(My understanding of mmx, sse, vis etc is quite vague).
/ Niels Möller (vässar rödpennan)
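Editor's note: the "generate k bytes of keystream, then XOR them in one go" idea can be sketched in portable C as below. This is an illustration under assumptions, not code from Nettle: `struct rc4_ctx`, `rc4_set_key`, `rc4_keystream` and `rc4_crypt_blocked` are hypothetical names, the key schedule is the textbook RC4 one, and the plain byte loop over each block only marks the spot where a wide (for example MMX) XOR would go.

```c
#include <stddef.h>
#include <stdint.h>

#define KS_BLOCK 16   /* the "k = 16 or so" from the post */

/* Hypothetical stand-in context, not Nettle's definition. */
struct rc4_ctx {
    uint8_t S[256];
    uint8_t i, j;
};

/* Textbook RC4 key schedule (an assumption for this sketch). */
void rc4_set_key(struct rc4_ctx *ctx, const uint8_t *key, size_t keylen)
{
    unsigned k;
    uint8_t j = 0;

    for (k = 0; k < 256; k++)
        ctx->S[k] = (uint8_t)k;
    for (k = 0; k < 256; k++) {
        uint8_t t = ctx->S[k];
        j = (uint8_t)(j + t + key[k % keylen]);
        ctx->S[k] = ctx->S[j];
        ctx->S[j] = t;
    }
    ctx->i = 0;
    ctx->j = 0;
}

/* Step 1: produce n keystream bytes. This part stays strictly serial,
   because every step depends on the table state left by the previous one. */
void rc4_keystream(struct rc4_ctx *ctx, uint8_t *ks, size_t n)
{
    uint8_t i = ctx->i, j = ctx->j;
    size_t k;

    for (k = 0; k < n; k++) {
        uint8_t si, sj;
        i++;
        si = ctx->S[i];
        j = (uint8_t)(j + si);
        sj = ctx->S[j];
        ctx->S[i] = sj;
        ctx->S[j] = si;
        ks[k] = ctx->S[(uint8_t)(si + sj)];
    }
    ctx->i = i;
    ctx->j = j;
}

/* Step 2: XOR a whole block of keystream into the data at once. */
void rc4_crypt_blocked(struct rc4_ctx *ctx, size_t length,
                       uint8_t *dst, const uint8_t *src)
{
    uint8_t ks[KS_BLOCK];

    while (length > 0) {
        size_t n = length < KS_BLOCK ? length : KS_BLOCK;
        size_t k;

        rc4_keystream(ctx, ks, n);
        for (k = 0; k < n; k++)      /* a vector XOR would replace this loop */
            dst[k] = src[k] ^ ks[k];
        dst += n;
        src += n;
        length -= n;
    }
}
```

The split makes the point from the post visible: only the short XOR loop is a candidate for wide instructions, while the keystream generation, where most of the work happens, remains one dependent chain.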
Previous text:
2004-02-05 14:32: Subject: Nettle
It would be possible to use MMX. If nothing else you then have more registers. However, it can only access memory on even 8-byte boundaries.
/ Per Hedbor ()
The major advantage in this case would be the extra registers, I think.
/ Per Hedbor ()
Previous text:
2004-02-05 14:39: Subject: Nettle
If I do any assembler at all, I think I'd prefer a generic x86, at least for a start.
What could you win by using MMX instructions? One could generate some k (k = 16 or so) bytes of the keystream, put them into a register, and then XOR them all into the source stream at once, assuming source and destination are properly aligned. But most instructions are used for generating the keystream, and it seems non-trivial to get any parallelization there.
(My understanding of mmx, sse, vis etc is quite vague).
/ Niels Möller (vässar rödpennan)
Nice to see you get a substantial speed-up too.
/ Jonas Walldén
Previous text:
2004-02-05 14:21: Subject: Nettle
On my laptop (Intel P4), I get an increase from 45 MB/s to 66 MB/s.
Does it matter whether si and sj are ints or uint8_t? I get no speed difference.
The inner loop gets compiled into (Intel, gcc-3.3, -O2):

    .L28:
            incb    -13(%ebp)
            decl    %ebx
            movzbl  -13(%ebp), %edx
            movzbl  (%edx,%edi), %ecx
            addb    %cl, -14(%ebp)
            movzbl  -14(%ebp), %eax
            movzbl  (%eax,%edi), %eax
            movb    %al, (%edx,%edi)
            addb    %cl, %al
            movl    16(%ebp), %edx
            movzbl  %al, %eax
            movzbl  (%eax,%edi), %eax
            xorb    (%esi), %al
            incl    %esi
            movb    %al, (%edx)
            incl    %edx
            cmpl    $-1, %ebx
            movl    %edx, 16(%ebp)
            jne     .L28

It seems it can't fit all variables into registers, hence the save and restore operations via %ebp.
I wonder if my intel books will ever arrive.
/ Niels Möller (vässar rödpennan)
Any nice test program so I don't have to write my own?
/ Mirar
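Editor's note: no test program is posted in the thread. A minimal sketch of how MB/s figures like the ones above could be measured is given below. Everything in it is an assumption made for illustration: the names `crypt_fn`, `xor_with_0x5a` and `throughput_mb_per_s` are hypothetical, the workload is a trivial stand-in rather than RC4, "MB" is taken to mean 10^6 bytes, and CPU time from clock() is used as the time base.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for the routine under test. */
typedef void (*crypt_fn)(uint8_t *dst, const uint8_t *src, size_t len);

/* Trivial stand-in workload; a real test would plug in the cipher here. */
void xor_with_0x5a(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t k;
    for (k = 0; k < len; k++)
        dst[k] = src[k] ^ 0x5a;
}

/* Calls fn repeatedly for at least ~0.1 s of CPU time and returns the
   throughput in MB/s. Returns 0.0 if clock() is unavailable. */
double throughput_mb_per_s(crypt_fn fn, uint8_t *dst,
                           const uint8_t *src, size_t len)
{
    const clock_t min_ticks = CLOCKS_PER_SEC / 10;
    clock_t start = clock();
    clock_t now;
    double bytes = 0.0;

    if (start == (clock_t)-1)
        return 0.0;

    do {
        fn(dst, src, len);
        bytes += (double)len;
        now = clock();
    } while (now - start < min_ticks);

    return (bytes / 1e6) / ((double)(now - start) / CLOCKS_PER_SEC);
}
```

Running for a fixed minimum time rather than a fixed number of rounds keeps the measurement meaningful on both slow and fast machines. The absolute numbers will still depend on buffer size, compiler flags and cache effects.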
Previous text:
2004-02-05 14:21: Subject: Nettle
On my laptop (Intel P4), I get an increase from 45 MB/s to 66 MB/s.
Does it matter whether si and sj are ints or uint8_t? I get no speed difference.
The inner loop gets compiled into (Intel, gcc-3.3, -O2):

    .L28:
            incb    -13(%ebp)
            decl    %ebx
            movzbl  -13(%ebp), %edx
            movzbl  (%edx,%edi), %ecx
            addb    %cl, -14(%ebp)
            movzbl  -14(%ebp), %eax
            movzbl  (%eax,%edi), %eax
            movb    %al, (%edx,%edi)
            addb    %cl, %al
            movl    16(%ebp), %edx
            movzbl  %al, %eax
            movzbl  (%eax,%edi), %eax
            xorb    (%esi), %al
            incl    %esi
            movb    %al, (%edx)
            incl    %edx
            cmpl    $-1, %ebx
            movl    %edx, 16(%ebp)
            jne     .L28

It seems it can't fit all variables into registers, hence the save and restore operations via %ebp.
I wonder if my intel books will ever arrive.
/ Niels Möller (vässar rödpennan)
Now I'm confused. I do have the 45 and 66 MB/s figures in my *shell* buffer, but I can't reproduce the 66 MB/s figure. Perhaps that was with the buggy version of the code? Anyway, x86 performance for the C version doesn't matter that much anymore.
/ Niels Möller (vässar rödpennan)
Previous text:
2004-02-05 14:21: Subject: Nettle
On my laptop (Intel P4), I get an increase from 45 MB/s to 66 MB/s.
Does it matter whether si and sj are ints or uint8_t? I get no speed difference.
The inner loop gets compiled into (Intel, gcc-3.3, -O2):

    .L28:
            incb    -13(%ebp)
            decl    %ebx
            movzbl  -13(%ebp), %edx
            movzbl  (%edx,%edi), %ecx
            addb    %cl, -14(%ebp)
            movzbl  -14(%ebp), %eax
            movzbl  (%eax,%edi), %eax
            movb    %al, (%edx,%edi)
            addb    %cl, %al
            movl    16(%ebp), %edx
            movzbl  %al, %eax
            movzbl  (%eax,%edi), %eax
            xorb    (%esi), %al
            incl    %esi
            movb    %al, (%edx)
            incl    %edx
            cmpl    $-1, %ebx
            movl    %edx, 16(%ebp)
            jne     .L28

It seems it can't fit all variables into registers, hence the save and restore operations via %ebp.
I wonder if my intel books will ever arrive.
/ Niels Möller (vässar rödpennan)
On Tue, Feb 03, 2004 at 12:45:05AM +0100, Martin Nilsson (saturator) @ Pike (-) developers forum wrote:
Now the _Crypto module isn't used (though still built). That means that you'll need Nettle installed to use any parts of Pike that use Crypto.
Seems that I missed all this fun... So I have a question - why is it required?
I mean, instead of something more standard, which is present almost everywhere (like openssl)?
Regards, /Al
Because we would like something small, fast and bug-free. I honestly believe that "What is the best product" is more interesting than "What is the most common product". You don't choose Pike because it is common.
/ Martin Nilsson (saturator)
Previous text:
2004-02-26 15:17: Subject: Re: Nettle
On Tue, Feb 03, 2004 at 12:45:05AM +0100, Martin Nilsson (saturator) @ Pike (-) developers forum wrote:
Now the _Crypto module isn't used (though still built). That means that you'll need Nettle installed to use any parts of Pike that use Crypto.
Seems that I missed all this fun... So I have a question - why is it required?
I mean, instead of something more standard, which is present almost everywhere (like openssl)?
Regards, /Al
/ Brevbäraren
On Thu, Feb 26, 2004 at 03:45:13PM +0100, Martin Nilsson (saturator) @ Pike (-) developers forum wrote:
Because we would like something small, fast and bug-free.
Well, I am not objecting to the existence of Nettle, but I wonder why it is not included in the Pike source distribution (if it is small) :)
is the most common product". You don't choose Pike because it is common.
I chose it because it is unique. But if I need a lot of prerequisites to build it (to use _basic_ functionality - and crypto stuff _is_ basic) - it makes me uncomfortable, and I have to manually bundle everything that is needed for distribution and use.
It was discussed several times, but I still don't understand why packages like Nettle (which provide fundamentals) are "external" (i.e. must be installed separately), while a lot of unneeded (and uncommon) stuff (like DVB) is included in distribution...
BTW, Ssleay (old) stuff is included, but I doubt that anyone uses it... or ever used :)
Regards, /Al
Well, I am not objecting to the existence of Nettle, but I wonder why it is not included in the Pike source distribution (if it is small) :)
It is included in all Pike 7.5 source distributions.
/ Martin Nilsson (saturator)
Previous text:
2004-02-26 16:06: Subject: Re: Nettle
On Thu, Feb 26, 2004 at 03:45:13PM +0100, Martin Nilsson (saturator) @ Pike (-) developers forum wrote:
Because we would like something small, fast and bug-free.
Well, I am not objecting to the existence of Nettle, but I wonder why it is not included in the Pike source distribution (if it is small) :)
is the most common product". You don't choose Pike because it is common.
I chose it because it is unique. But if I need a lot of prerequisites to build it (to use _basic_ functionality - and crypto stuff _is_ basic) - it makes me uncomfortable, and I have to manually bundle everything that is needed for distribution and use.
It was discussed several times, but I still don't understand why packages like Nettle (which provide fundamentals) are "external" (i.e. must be installed separately), while a lot of unneeded (and uncommon) stuff (like DVB) is included in distribution...
BTW, Ssleay (old) stuff is included, but I doubt that anyone uses it... or ever used :)
Regards, /Al
/ Brevbäraren
Well, I am not objecting to the existence of Nettle, but I wonder why it is not included in the Pike source distribution (if it is small) :)
It will be included in the stable distribution.
/ Peter Bortas (Kein paket!)
Previous text:
2004-02-26 16:06: Subject: Re: Nettle
On Thu, Feb 26, 2004 at 03:45:13PM +0100, Martin Nilsson (saturator) @ Pike (-) developers forum wrote:
Because we would like something small, fast and bug-free.
Well, I am not objecting to the existence of Nettle, but I wonder why it is not included in the Pike source distribution (if it is small) :)
is the most common product". You don't choose Pike because it is common.
I chose it because it is unique. But if I need a lot of prerequisites to build it (to use _basic_ functionality - and crypto stuff _is_ basic) - it makes me uncomfortable, and I have to manually bundle everything that is needed for distribution and use.
It was discussed several times, but I still don't understand why packages like Nettle (which provide fundamentals) are "external" (i.e. must be installed separately), while a lot of unneeded (and uncommon) stuff (like DVB) is included in distribution...
BTW, Ssleay (old) stuff is included, but I doubt that anyone uses it... or ever used :)
Regards, /Al
/ Brevbäraren
Well, I am not objecting to the existence of Nettle, but I wonder why it is not included in the Pike source distribution (if it is small) :)
It is, or will be, or whatever. Just as gmp. Note here that a cvs checkout is not considered a distribution.
It was discussed several times, but I still don't understand why packages like Nettle (which provide fundamentals) are "external" (i.e. must be installed separately), while a lot of unneeded (and uncommon) stuff (like DVB) is included in distribution...
DVB? It's not included in my Pike... But then again, I check out from CVS.
None of these xenofarm builders have DVB support: http://www.mirar.org/configinfo.html
/ Mirar
Previous text:
2004-02-26 16:06: Subject: Re: Nettle
On Thu, Feb 26, 2004 at 03:45:13PM +0100, Martin Nilsson (saturator) @ Pike (-) developers forum wrote:
Because we would like something small, fast and bug-free.
Well, I am not objecting to the existence of Nettle, but I wonder why it is not included in the Pike source distribution (if it is small) :)
is the most common product". You don't choose Pike because it is common.
I chose it because it is unique. But if I need a lot of prerequisites to build it (to use _basic_ functionality - and crypto stuff _is_ basic) - it makes me uncomfortable, and I have to manually bundle everything that is needed for distribution and use.
It was discussed several times, but I still don't understand why packages like Nettle (which provide fundamentals) are "external" (i.e. must be installed separately), while a lot of unneeded (and uncommon) stuff (like DVB) is included in distribution...
BTW, Ssleay (old) stuff is included, but I doubt that anyone uses it... or ever used :)
Regards, /Al
/ Brevbäraren
On Thu, Feb 26, 2004 at 04:30:04PM +0100, Mirar @ Pike developers forum wrote:
DVB? It's not included in my Pike... But then again, I check out from CVS.
Hmm? In Pike 7.5 CVS (from 2004-02-25) it is included. Or what do you mean by "your Pike"? :)
I could also mention the SDL stuff (no doubt someone needs it, but I guess it is quite uncommon). Or Protocols/LysKOM... Or... Well... There is a lot of ancient or rarely used stuff in modules/ :)
None of these xenofarm builders have DVB support:
So why is it included in the distribution, then? :)
Regards, /Al
SDL is commonly used.
/ Per Hedbor ()
Previous text:
2004-02-26 16:52: Subject: Re: Nettle
On Thu, Feb 26, 2004 at 04:30:04PM +0100, Mirar @ Pike developers forum wrote:
DVB? It's not included in my Pike... But then again, I check out from CVS.
Hmm? In Pike 7.5 CVS (from 2004-02-25) it is included. Or what do you mean by "your Pike"? :)
I could also mention the SDL stuff (no doubt someone needs it, but I guess it is quite uncommon). Or Protocols/LysKOM... Or... Well... There is a lot of ancient or rarely used stuff in modules/ :)
None of these xenofarm builders have DVB support:
So why is it included in the distribution, then? :)
Regards, /Al
/ Brevbäraren
None of these xenofarm builders have DVB support:
So why is it included in the distribution, then? :)
It isn't? Only the *glue* to DVB (libdvb?) is. Precisely like the glue to nettle, and the glue to gmp, and the glue to libpanda, and the glue to jpeglib, and the glue to GTK, and the glue to...
If you check config.info, most of the modules reported there need an external library.
SDL is commonly used.
By whom? And where?
Mostly anyone who plays with GL or multimedia. For instance everyone who runs AIDO.
/ Mirar
Previous text:
2004-02-26 17:13: Subject: Re: Nettle
On Thu, Feb 26, 2004 at 04:55:02PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
SDL is commonly used.
By whom? And where?
Regards, /Al
/ Brevbäraren
Martin Nilsson (saturator) @ Pike (-) developers forum wrote:
Because we would like something small, fast and bug-free. I honestly believe that "What is the best product" is more interesting than "What is the most common product". You don't choose Pike because it is common.
There is also a licence problem, I think. OpenSSL has an Apache-style licence and Pike is LGPL/Mozilla/GPL.
About OpenSSL in Pike, there is a Pexts for it so that you can use it in Pike. Now I don't agree that Pike SSL must be qualified as "the best product" or even better than OpenSSL; it's not usable in a production environment (at least in Pike 7.2): it breaks with IE, is slow and doesn't support a lot of things. Honestly, I would prefer to have an OpenSSL-based server than nothing.
Besides, I don't think it's good to spend too much time reinventing the wheel and maintaining it (for the SSL/TLS protocol part). There are good libraries out there (Mozilla NSS or GnuTLS, for example) which could probably be used, and that would make Pike SSL faster, with more features, and would require less maintenance in the long run IMHO. For the user, Mozilla NSS will always be a better product than Nettle/Pike SSL unless you spend 6 months on Pike SSL (adding TLS, hardware acceleration, docs, extensive tests, a smart session cache, ...).
Just my 0.02€.
/ David Gourdelier
On Thu, Feb 26, 2004 at 04:23:44PM +0100, David Gourdelier wrote:
There is also a licence problem, I think. OpenSSL has an Apache-style licence and Pike is LGPL/Mozilla/GPL.
There is no problem - while it might be difficult to bundle OpenSSL with Pike, OTOH, OpenSSL is installed virtually everywhere.
About OpenSSL in Pike, there is a Pexts for it so that you can use it in Pike.
I tried it once. Since then I never ever will touch Pexts again... The idea was good, but the implementation... I guess nothing changed since then...
Regards, /Al
SSL is another issue. We are (or at least were) discussing crypto libraries. And 7.2 was released over three years ago, so it's not really fair to use it in any comparison (though I'm not saying that SSL has no problems).
/ Martin Nilsson (saturator)
Previous text:
2004-02-26 16:25: Subject: Re: Nettle
Martin Nilsson (saturator) @ Pike (-) developers forum wrote:
Because we would like something small, fast and bug-free. I honestly believe that "What is the best product" is more interesting than "What is the most common product". You don't choose Pike because it is common.
There is also a licence problem, I think. OpenSSL has an Apache-style licence and Pike is LGPL/Mozilla/GPL.
About OpenSSL in Pike, there is a Pexts for it so that you can use it in Pike. Now I don't agree that Pike SSL must be qualified as "the best product" or even better than OpenSSL; it's not usable in a production environment (at least in Pike 7.2): it breaks with IE, is slow and doesn't support a lot of things. Honestly, I would prefer to have an OpenSSL-based server than nothing.
Besides, I don't think it's good to spend too much time reinventing the wheel and maintaining it (for the SSL/TLS protocol part). There are good libraries out there (Mozilla NSS or GnuTLS, for example) which could probably be used, and that would make Pike SSL faster, with more features, and would require less maintenance in the long run IMHO. For the user, Mozilla NSS will always be a better product than Nettle/Pike SSL unless you spend 6 months on Pike SSL (adding TLS, hardware acceleration, docs, extensive tests, a smart session cache, ...).
Just my 0.02€.
/ David Gourdelier
/ Brevbäraren
pike-devel@lists.lysator.liu.se