hi,
someone posted this url: http://trific.ath.cx/resources/python/optimization/ in roxenchat with the comment that pike should have similar comments to help people optimize their code.
to me this page reads much more like a list of reasons not to use pike.
could someone please confirm that pike doesn't have issues like these?
of course some optimization suggestions pointing out what things are in fact slow would be nice too.
greetings, martin.
There are of course dos and don'ts in Pike as well, but we try to address them in the optimizer when we find them. Some tips are general and always applicable, though:
- Improve the algorithm. One cannot emphasize the importance of good algorithms enough.
- Trust nobody. What was a really clever trick in one version of Pike may turn out to be slower than the naive way of doing it in a later Pike release (because the naive way has been optimized...).
- Avoid exceptions. Exceptions are to be used in exceptional circumstances. In a way it is good that there isn't a good exception system in Pike, since it makes people write code that can more easily be analyzed by humans and optimizers.
- Cache results. Always possible, if you have more memory than time (see the sketch below).
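To make the last tip concrete, here is a minimal memoization sketch in Pike (the function and cache names are made up for the example):

// Cache of already computed results. It grows without bound, so this
// only pays off when memory is cheaper than recomputation.
mapping(int:int) fib_cache = ([]);

int fib(int n)
{
  if (n < 2) return n;
  // zero_type() tells a missing entry apart from a cached 0.
  if (!zero_type(fib_cache[n])) return fib_cache[n];
  return fib_cache[n] = fib(n - 1) + fib(n - 2);
}

With the cache, fib(100) runs in linear time instead of exploding exponentially.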
/ Martin Nilsson (Åskblod)
Previous text:
2003-01-28 20:09: Subject: reasons why pike is better than python?
hi,
someone posted this url: http://trific.ath.cx/resources/python/optimization/ in roxenchat with the comment that pike should have similar comments to help people optimize their code.
to me this page much more reads like a list of reasons not to use pike.
could someone please assert that pike doesn't have issues like these?
of course some optimization suggestions pointing out what things are in fact slow would be nice too.
greetings, martin.
/ Brevbäraren
On Tue, Jan 28, 2003 at 08:25:03PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
There are of course dos and donts in Pike as well, but we try to adress them in the optimizer when we find them.
details? what about appending elements to an array vs a mapping? the latter is supposed to be faster (or so some people claim, anyway)
regexps are very slow, better use sscanf()
any other?
greetings, martin.
No, appending elements to arrays is fast in 7.4. No, Pike regexps are faster than e.g. python regexps according to the language shootout.
Generally speaking, try to push iterations from Pike code down to the lower levels. E.g. in C code you often have for loops and other iterators, while in Pike you should try to use the map, filter, sum etc. functions. That said, iteration has become much faster in 7.4 than before.
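For illustration, the same sum written as an explicit Pike-level loop and with the iteration pushed down into map() and `+ (a made-up example; the exact win depends on the Pike version and the workload):

array(int) nums = enumerate(100000);

// Explicit loop: every iteration runs in the Pike interpreter.
int total1 = 0;
foreach (nums, int n)
  total1 += n * n;

// map() and `+ do the looping in C instead.
int total2 = `+(0, @map(nums, lambda(int n) { return n * n; }));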
If you are concatenating large numbers of strings it may be faster to use String.Buffer to avoid the shared-string performance penalty. The overhead of using an object, however, makes it slower for fewer than 11 concatenations according to my measurements. I am working on a plan to have this kind of optimization done automatically.
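A sketch of the two approaches (the break-even around 11 concatenations is from the measurement above and will of course vary):

array(string) words = "a few words to be glued together" / " ";

// Plain concatenation: every += builds a new shared string.
string s1 = "";
foreach (words, string w)
  s1 += w;

// String.Buffer: appends go into one growing buffer, and a single
// shared string is built by get() at the end.
String.Buffer buf = String.Buffer();
foreach (words, string w)
  buf->add(w);
string s2 = buf->get();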
/ Martin Nilsson (Åskblod)
Previous text:
2003-01-29 00:42: Subject: Re: reasons why pike is better than python?
On Tue, Jan 28, 2003 at 08:25:03PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
There are of course dos and donts in Pike as well, but we try to adress them in the optimizer when we find them.
details? what about appending elements to an array vs a mapping? the latter is supposed to be faster (some people claim anyways)
regexps are very slow, better use sscanf()
any other?
greetings, martin.
/ Brevbäraren
On Wed, Jan 29, 2003 at 12:55:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
No, Pike regexps are faster than e.g. python regexps according to the langauge shootout.
then python's regexps are slow too, but i wasn't really comparing to other languages, i am comparing to alternatives within pike.
given two ways to solve a problem in pike, which way is faster? that is what i am after...
greetings, martin.
On Wed, Jan 29, 2003 at 12:55:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
No, appending elements to arrays are fast in 7.4.
It depends. If it is combined with element removal, especially like:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
No, Pike regexps are faster than e.g. python regexps according to the langauge shootout.
As Martin said already - then Python regexps are slow too. And everyone knows an alternative which is faster and more powerful, but it is still not in Pike :)
PS: To Martin: could you please change the subject to something like "Reasons why apples are better than oranges?" :)) I think that every language has its advantages and disadvantages, which depend on the task, so... :))
Regards, /Al
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
And mappings would help here, or what is your point? That inappropriate use of a datatype is slower than using the right one (ADT.Queue)?
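For illustration, the pattern being discussed next to the queue (a sketch; the put()/get() method names on ADT.Queue are assumed from memory, so check the modref):

// The slow pattern for big arrays: every "pop" copies the whole array.
array a = ({});
a += ({ "job" });               // push
mixed v = a[0]; a = a[1..];     // pop: O(n) per element

// ADT.Queue keeps internal head/tail bookkeeping instead of
// readjusting the array on every pop.
ADT.Queue q = ADT.Queue();
q->put("job");
mixed w = q->get();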
As Martin said already - then Python regexps are slow too. And everyone knows alternative which is faster and powerful, but it is still not in Pike :)
But that is wrong. Running regexp tests written purely in C using the PCRE library is 13% slower than the same tests run in Perl. Pike is then 3% slower than C and Python is then 62% slower than Pike. Then follows Java, which is 22% slower than Python. To call Pike regexps slow is simply not true. And to think that the performance is going to improve without doing something drastic, like writing a new regexp engine from scratch, is to be unrealistic IMHO.
/ Martin Nilsson (Åskblod)
Previous text:
2003-01-29 01:31: Subject: Re: reasons why pike is better than python?
On Wed, Jan 29, 2003 at 12:55:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
No, appending elements to arrays are fast in 7.4.
It depends. If it is combined with elemnts removal, epsecially like:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
No, Pike regexps are faster than e.g. python regexps according to the langauge shootout.
As Martin said already - then Python regexps are slow too. And everyone knows alternative which is faster and powerful, but it is still not in Pike :)
PS: 2 Martin: could you please change the subject to something like: "Reasons why apples are better than oranges?" :)) I think that every language has its advantages and disavantages, which are dependent on task, so... :))
Regards, /Al
/ Brevbäraren
On Wed, Jan 29, 2003 at 02:10:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
And mappings would help here, or what is your point? That inappropriate use of a datatype is slower than using the right one (ADT.Queue)?
ADT.Queue is using arrays to implement queues anyway, so where is the advantage?
But that is wrong. Running regexp tests written purely in C using the PCRE library is 13% slower than the same tests run in Perl.
??? Just tested:
aldem@fort:/old/home/aldem/src/pcre-3.9$ time ./perltest testdata/testinput1 > /dev/null
real    0m2.194s    user    0m2.190s    sys    0m0.010s

aldem@fort:/old/home/aldem/src/pcre-3.9$ time ./pcretest testdata/testinput1 > /dev/null
real    0m0.065s    user    0m0.040s    sys    0m0.020s
So where is Perl's advantage? I did similar benchmarks a long time ago too, and the results were very similar (this test was with PCRE 3.9 and Perl 5.6.1).
Java, which is 22% slower than Python. To call Pike regexps slow is simply not true.
For some particular tasks - maybe, but they are less powerful. And, again, PCRE gives us Perl compatibility, which may simplify porting from Perl (and anyway PCRE is in use in PHP/Apache etc. - a lot of places).
I may add similar test functionality using my PCRE module for Pike and publish the results, if someone is interested...
And to think that the performance is going to improve without doing something drastic, like writing a new regexp engine from scratch, is to be unrealistic IMHO.
Please, please, do some benchmarks first :)
Regards, /Al
In the last episode (Jan 29), Alexander Demenshin said:
On Wed, Jan 29, 2003 at 02:10:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
And mappings would help here, or what is your point? That inappropriate use of a datatype is slower than using the right one (ADT.Queue)?
ADT.Queue is using arrays to implemet queues anyway, where is the advantage?
ADT.Queue doesn't adjust the internal array on pop, and only adjusts on push if something has been popped (and even then only after 100 pushes). If your queue is fixed size, you should implement it as an array with head and tail pointers that wrap when they go off the end.
and how would you implement that in pike?
greetings, martin.
It already is implemented, go ahead and look.
Basically, in pseudocode, without error checks:
int head, tail; array circular_queue;
mixed pop() { wrap( tail+1 ); return circular_queue[tail]; }
void push(mixed v) { wrap( head+1 ); circular_queue[head]=v; }
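A slightly fuller, runnable variant of the same idea (made-up names, fixed capacity, no overflow or underflow checks; this is not the actual ADT code):

class CircularQueue
{
  private array buf;
  private int head, tail, used;

  void create(int size) { buf = allocate(size); }

  void push(mixed v)
  {
    buf[head] = v;
    head = (head + 1) % sizeof(buf);   // wrap when we run off the end
    used++;
  }

  mixed pop()
  {
    mixed v = buf[tail];
    tail = (tail + 1) % sizeof(buf);
    used--;
    return v;
  }

  int size() { return used; }
}

Created as CircularQueue(100), both push() and pop() then touch a single array slot and never copy the buffer.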
/ Per Hedbor ()
Previous text:
2003-01-29 11:35: Subject: Re: PCRE again (Was: reasons why pike is better than python?)
and how would you implement that in pike?
greeting, martin.
/ Brevbäraren
Please, please, do some benchmarks first :)
So you are saying that I just made up the figures I presented. That's quite an insult.
/ Martin Nilsson (Åskblod)
Previous text:
2003-01-29 04:38: Subject: Re: PCRE again (Was: reasons why pike is better than python?)
On Wed, Jan 29, 2003 at 02:10:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
And mappings would help here, or what is your point? That inappropriate use of a datatype is slower than using the right one (ADT.Queue)?
ADT.Queue is using arrays to implemet queues anyway, where is the advantage?
But that is wrong. Running regexp tests written purely in C using the PCRE library is 13% slower than the same tests run in Perl.
??? Just tested:
aldem@fort:/old/home/aldem/src/pcre-3.9$ time ./perltest testdata/testinput1 > /dev/null
real 0m2.194s user 0m2.190s sys 0m0.010s aldem@fort:/old/home/aldem/src/pcre-3.9$ time ./pcretest testdata/testinput1 > /dev/null
real 0m0.065s user 0m0.040s sys 0m0.020s
So where is the Perl's advantage? I did similar benchmars long time ago too, results were very similar (this test was with PCRE 3.9 and Perl 5.6.1).
Java, which is 22% slower than Python. To call Pike regexps slow is simply not true.
For some particular tasks - may be, but they are less powerful. And, again, PCRE gives us Perl compatibily, which may simplify porting from Perl (and anyway PCRE is in use in PHP/Apache etc - a lot of places).
I may add similar tests functionality using my PCRE module for Pike and publish the results, if someone is interested...
And to think that the performance is going to improve without doing something drastic, like writing a new regexp engine from scratch, is to be unrealistic IMHO.
Please, please, do some benchmarks first :)
Regards, /Al
/ Brevbäraren
On Wed, Jan 29, 2003 at 02:10:03PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
So you are saying that I just made up the figures I presented. That's quite an insult.
As you can see, your figures are a bit... hmmm... different from the benchmarking I just did. But I've no idea where you got your figures; it might be that you took them from somewhere, or from quite a long time ago, etc...
As for me, before I say that something is faster or slower I'll make a benchmark personally, or I won't say it...
No offense :)
Regards, /Al
On Wed, Jan 29, 2003 at 02:32:24PM +0100, Alexander Demenshin wrote:
On Wed, Jan 29, 2003 at 02:10:03PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
But I've no idea where you got your figures, it might happen that you took those somewhere, or quite long time ago, etc...
ok, how about we just stop making stupid assumptions and everybody simply shows the code used to create the numbers, then everybody else can verify them, and we can get some conclusive results.
greetings, martin.
My figures are from the language shootout. One year old though.
/ Martin Nilsson (Åskblod)
Previous text:
2003-01-29 14:33: Subject: Re: PCRE again (Was: reasons why pike is better than python?)
On Wed, Jan 29, 2003 at 02:10:03PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
So you are saying that I just made up the figures I presented. That's quite an insult.
As you can see, your figures are a bit... hmmm... different to benchmarking I just did. But I've no idea where you got your figures, it might happen that you took those somewhere, or quite long time ago, etc...
As of me, before I say that someting is faster or slower I'll make a benchmark personally, or I won't say that...
No offense :)
Regards, /Al
/ Brevbäraren
On Wed, Jan 29, 2003 at 02:35:02PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
My figures are from the language shootout. One year old though.
This one? http://www.bagley.org/~doug/shootout/bench/regexmatch/
The tests are a bit unrepresentative - a single pattern for a very specific data set. For this particular case the results are correct, sure. But on average and for more complex patterns... no way.
Again, PCRE offers far more functionality and matching rules; some of them may not be implemented in Pike's RE but are very useful.
Regards, /Al
On Wed, Jan 29, 2003 at 02:35:02PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
My figures are from the language shootout. One year old though.
Here are the results of my benchmark against a real data set with relatively complex (and real-life) patterns (Pike v7.4.10):
=== SNIP ===
aldem@fort:~/pike/Squid$ time pike -DUSE_PCRE -M.. re-bench.pike
98.86% (eta 0m00s) 1293134/ 16100
Total of 1307519 lines, 16403 rejects
real    1m26.133s    user    1m23.430s    sys    0m2.330s

aldem@fort:~/pike/Squid$ time pike -M.. re-bench.pike
99.82% (eta 0m00s) 1305263/ 16335
Total of 1307519 lines, 16403 rejects
real    2m43.648s    user    2m38.840s    sys    0m3.040s
=== SNIP ===
As you can see, PCRE is almost two times faster. I ran this over my Squid log file, so the data is pretty real, and the volume is also good enough for testing.
The source: http://aldem.net/pike/re-bench.pike - actually this is stripped out of my squid log analyzer and simplified - only the REs are benchmarked.
I used my own PCRE module which I wrote a long time ago; in case someone is interested I'll publish it (though it has no autoconf support and is a bit "raw" - I haven't touched it for a long time).
Regards, /Al
Excellent! Now, what are the problems if we simply replaced the current regexp engine with PCRE? What problems do we face in the Pike API?
/ Martin Nilsson (Åskblod)
Previous text:
2003-01-29 16:53: Subject: Real-life PCRE vs Regexp benchmarking
On Wed, Jan 29, 2003 at 02:35:02PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
My figures are from the language shootout. One year old though.
Here is the results of my benchmark against real data set with relatively complex (and real-life) patterns (Pike v7.4.10):
=== SNIP === aldem@fort:~/pike/Squid$ time pike -DUSE_PCRE -M.. re-bench.pike 98.86% (eta 0m00s) 1293134/ 16100 Total of 1307519 lines, 16403 rejects
real 1m26.133s user 1m23.430s sys 0m2.330s
aldem@fort:~/pike/Squid$ time pike -M.. re-bench.pike 99.82% (eta 0m00s) 1305263/ 16335 Total of 1307519 lines, 16403 rejects
real 2m43.648s user 2m38.840s sys 0m3.040s === SNIP ===
As you could see, the PCRE is almost two times faster. I run this over my Squid log file, so data are pretty real, and volume is also good enough for testing.
The source: http://aldem.net/pike/re-bench.pike - actually this is stripped out from my squid log analyzer and simplified - only REs are benchmarked.
I used my own PCRE module which I wrote long time ago, in case if someone is interested I'll publish it (though, it has no autoconf support and a bit "raw" - I didn't touch it for long time already).
Regards, /Al
/ Brevbäraren
On Wed, Jan 29, 2003 at 05:15:02PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
Excellent! Now, what are the problems if we simply replaced the current regexp engine with PCRE? What problems do we face in the Pike API?
Actually, it should not be a problem - but I haven't studied Pike's Regexp syntax deeply, so it _might_ be that some incompatibility issues will arise.
At least, a lot of features in PCRE are not present in Regexp - so one cannot use PCRE and then easily switch to Regexp.
I don't know about the module from PExts, but my own API was modelled after the Regexp API (with one exception which currently makes it incompatible - split() returns the complete match as element 0, the first subpattern as element 1, etc.; this could be easily fixed though).
Regards, /Al
The pexts module is API compatible with the current Pike regexp module as well. Caudium uses it automatically if available, and if not it uses the standard Pike module.
/ David Hedbor
Previous text:
2003-01-29 17:30: Subject: Re: Real-life PCRE vs Regexp benchmarking
On Wed, Jan 29, 2003 at 05:15:02PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
Excellent! Now, what are the problems if we simply replaced the current regexp engine with PCRE? What problems do we face in the Pike API?
Actually, it should not be a problem - but I didn't studied Pike's Regexp syntax deeply so it _might_ be that some incomatibility issues will arise.
At least, a lot of features in PCRE are not present in Regexp - so one cannot use PCRE and then easily switch to Regexp.
I don't know about module from PExts, but my own API was modelled after Regexp API (only one exception which currently makes it incompatile - split() retuns the complete match as element 0, first subpattern - element 1 etc, this could be easily fixed though).
Regards, /Al
/ Brevbäraren
The regexp engines in Perl and Pike are derived from the same original source that Henry Spencer wrote back in -86. So given that PCRE should be Perl compatible, there's a good chance that they are compatible.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-29 17:30: Subject: Re: Real-life PCRE vs Regexp benchmarking
On Wed, Jan 29, 2003 at 05:15:02PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
Excellent! Now, what are the problems if we simply replaced the current regexp engine with PCRE? What problems do we face in the Pike API?
Actually, it should not be a problem - but I didn't studied Pike's Regexp syntax deeply so it _might_ be that some incomatibility issues will arise.
At least, a lot of features in PCRE are not present in Regexp - so one cannot use PCRE and then easily switch to Regexp.
I don't know about module from PExts, but my own API was modelled after Regexp API (only one exception which currently makes it incompatile - split() retuns the complete match as element 0, first subpattern - element 1 etc, this could be easily fixed though).
Regards, /Al
/ Brevbäraren
If we intend this to be a permanent solution, the fact that PCRE probably doesn't handle wide strings is a problem.
If we intend it as just a temporary speedup, we should be wary of allowing a wider set of regexp syntax, since we would have to allow the same set when a permanent solution is in place in order to have backwards compatibility.
Other than that I can't think of any problems. After all, the API is pretty simplistic as it is.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2003-01-29 17:10: Subject: Real-life PCRE vs Regexp benchmarking
Excellent! Now, what are the problems if we simply replaced the current regexp engine with PCRE? What problems do we face in the Pike API?
/ Martin Nilsson (Åskblod)
I'm assuming somebody has already verified that it's OK license-wise to distribute PCRE with Pike.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2003-01-29 17:26: Subject: Real-life PCRE vs Regexp benchmarking
If we intend this to be a permanent solution, the fact that PCRE probably doesn't handle wide strings is a problem.
If we intend it as just temporary speedup, we should be wary about allowing a wider set of regexp syntax, since we would have to allow the same set when a permanent solution is fixed in order to have backwards compatibility.
Other than that I can't think of any problems. After all, the API is pretty simlistic as it is.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
On Wed, Jan 29, 2003 at 05:30:04PM +0100, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
I'm assuming somebody has already verified that it's OK license-wise to distribute PCRE with Pike.
From PCRE license:
4. If PCRE is embedded in any software that is released under the GNU General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL), then the terms of that licence shall supersede any condition above with which it is incompatible.
So this is not a problem.
Regards, /Al
1, 2 and 3 are still vital then. MPL.
/ Peter Bortas
Previous text:
2003-01-29 17:35: Subject: Re: Real-life PCRE vs Regexp benchmarking
On Wed, Jan 29, 2003 at 05:30:04PM +0100, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
I'm assuming somebody has already verified that it's OK license-wise to distribute PCRE with Pike.
From PCRE license:
- If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL), then the terms of that licence shall supersede any condition above with which it is incompatible.
So this is not a problem.
Regards, /Al
/ Brevbäraren
On Wed, Jan 29, 2003 at 05:40:07PM +0100, Peter Bortas @ Pike developers forum wrote:
1, 2 and 3 is still vital then. MPL.
Sorry?
- If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL), then the terms of that licence shall supersede any condition above with
=============================================================
which it is incompatible.
The GPL/LGPL shall supersede, or did I get it wrong? Pike is distributed under GPL/LGPL, as far as I know.
Regards, /Al
MPL/GPL/LGPL, note MPL.
/ Per Hedbor ()
Previous text:
2003-01-29 17:46: Subject: Re: Real-life PCRE vs Regexp benchmarking
On Wed, Jan 29, 2003 at 05:40:07PM +0100, Peter Bortas @ Pike developers forum wrote:
1, 2 and 3 is still vital then. MPL.
Sorry?
- If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL), then the terms of that licence shall supersede any condition above with
=============================================================
which it is incompatible.
The GPL/LGPL shall supersede, or did I get it wrong? Pike is distributed under GPL/LGPL, as I know.
Regards, /Al
/ Brevbäraren
On Wed, Jan 29, 2003 at 05:55:02PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
MPL/GPL/LGPL, note MPL.
OK. Anyway it is not necessary to distribute it with Pike (bundled). And I think that the Author of PCRE might make changes if we ask.
Regards, /Al
OK. Anyway it is not necessary to distribute it with Pike (bundled). And I think that the Author of PCRE might make changes if we ask.
It is actually more or less necessary; I don't think it would be a good idea to have one pike version that can have totally different regexp engines.
And if the licenses are compatible, it's not even necessary to change the license.
And, also, I really do think that it's best to have int32 as the basic type, perhaps compiled like the lexer, with different versions for size_shift 0, 1 and 2. Then you don't lose any performance, and you get wide-string support.
/ Per Hedbor ()
Previous text:
2003-01-29 18:04: Subject: Re: Real-life PCRE vs Regexp benchmarking
On Wed, Jan 29, 2003 at 05:55:02PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
MPL/GPL/LGPL, note MPL.
OK. Anyway it is not necessary to distribute it with Pike (bundled). And I think that the Author of PCRE might make changes if we ask.
Regards, /Al
/ Brevbäraren
Converting to and from UTF-8 is anything but light-weight; also, accessing and matching in UTF-8 strings is not O(1).
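Roughly the extra work every wide-string match would have to pay with a UTF-8-only engine (a sketch; the matching step itself is left out, since no such Pike binding exists yet):

// A wide string: one character above 0xff forces a size_shift > 0.
string wide = "prefix " + sprintf("%c", 0x20ac) + " suffix";

string narrow = string_to_utf8(wide);   // O(n) encode, allocates a new string
// ... a byte-oriented (UTF-8) engine would match against narrow here ...
string back = utf8_to_string(narrow);   // O(n) decode for any result strings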
/ Per Hedbor ()
Previous text:
2003-01-29 18:09: Subject: Re: Real-life PCRE vs Regexp benchmarking
OK. Anyway it is not necessary to distribute it with Pike (bundled). And I think that the Author of PCRE might make changes if we ask.
It is actually more or less nessesary, I don't think it would be a good idea to have one pike version that can have totally different regexp engines.
And if the licenses are compatible, it's not even nessesary to change the license.
And, also, I really do think that it's best to have int32 as the basic type, perhaps compiled like the lexer, though, with different versions for size_shift 0, 1 and 2. Then you don't loose any performance, and get wide-string support.
/ Per Hedbor ()
On Wed, Jan 29, 2003 at 06:10:04PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
And, also, I really do think that it's best to have int32 as the basic type, perhaps compiled like the lexer, though, with different versions for size_shift 0, 1 and 2. Then you don't loose any performance, and get wide-string support.
We don't have wide-string support in the current Regexp anyway; it is not even binary-aware.
And I believe that wide-string support should be a compile-time option - it would be overkill to have an RE engine with int32 as the basic char type handling normal strings (we would have to convert normal strings to wide strings first - this would degrade performance and increase memory usage).
Regards, /Al
And I believe that wide-string support should be a compile-time option - it would be overkill to have an RE engine with int32 as the basic char type handling normal strings (we would have to convert normal strings to wide strings first - this would degrade performance and increase memory usage).
No. Read my proposal again. No need to convert anything. There is a slight binary size overhead, but probably not all that much.
And all strings in pike are wide-strings.
If it's easy to add support for it, why not do it?
/ Per Hedbor ()
Previous text:
2003-01-29 18:42: Subject: Re: Real-life PCRE vs Regexp benchmarking
On Wed, Jan 29, 2003 at 06:10:04PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
And, also, I really do think that it's best to have int32 as the basic type, perhaps compiled like the lexer, though, with different versions for size_shift 0, 1 and 2. Then you don't loose any performance, and get wide-string support.
We don't have wide-string support in current Regexp anyway, it is even not binary-aware.
And I believe that wide-string support should be compile-time option - it would be an overkill to have RE engine with int32 as basic char type, which will be used to handle normal strings (we have to convert normal strings to wide-strings first then - this will degrade performance and increase memory usage).
Regards, /Al
/ Brevbäraren
On Wednesday, 29 Jan 2003, at 17:45 Europe/Paris, Alexander Demenshin wrote:
On Wed, Jan 29, 2003 at 05:40:07PM +0100, Peter Bortas @ Pike developers forum wrote:
1, 2 and 3 is still vital then. MPL.
Sorry?
- If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL), then the terms of that licence shall supersede any condition above with
=============================================================
which it is incompatible.
The GPL/LGPL shall supersede, or did I get it wrong? Pike is distributed under GPL/LGPL, as I know.
Are these licensing issues *really* a problem for adding some code that can run with PCRE?
They can be for distributing binaries, but for source distribution, I don't think so...
And after all, if it annoys us to distribute pike with PCRE, then --without-pcre will fix that...
/Xavier
-- Xavier Beaudouin - Unix System Administrator & Projects Leader. Please visit http://caudium.net/, home of the Caudium & Camas projects.
Something that can't be put in binary dists is not an option. I haven't heard anyone say that that's the case yet though.
/ Peter Bortas
Previous text:
2003-01-29 19:29: Subject: Re: Real-life PCRE vs Regexp benchmarking
On Wednesday, 29 Jan 2003, at 17:45 Europe/Paris, Alexander Demenshin wrote:
On Wed, Jan 29, 2003 at 05:40:07PM +0100, Peter Bortas @ Pike developers forum wrote:
1, 2 and 3 is still vital then. MPL.
Sorry?
- If PCRE is embedded in any software that is released under the GNU
General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL), then the terms of that licence shall supersede any condition above with
=============================================================
which it is incompatible.
The GPL/LGPL shall supersede, or did I get it wrong? Pike is distributed under GPL/LGPL, as I know.
Does this licensing issues are *really* a problem to add some code that can run with PCRE ?
It can for distributing binaries, but about sources distribution, I don't think so....
And after all if it anoy us to distribute pike with PCRE distribution, then --without-pcre will fix that...
/Xavier
-- Xavier Beaudouin - Unix System Administrator & Projects Leader. Please visit http://caudium.net/, home of Caudium & Camas projects O ascii ribbon campaign against html email |\ and Microsoft attachments
/ Brevbäraren
On Wed, Jan 29, 2003 at 05:30:04PM +0100, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
If we intend this to be a permanent solution, the fact that PCRE probably doesn't handle wide strings is a problem.
It has some support for UTF-8, and basically it is 8bit clean. This part is not well tested, though.
Regards, /Al
And what happens with the performance if all strings are converted to UTF-8 and then back again?
I think a better alternative might be to attempt to change 'char' to 'int32' in the PCRE-code.
/ Per Hedbor ()
Previous text:
2003-01-29 17:36: Subject: Re: Real-life PCRE vs Regexp benchmarking
On Wed, Jan 29, 2003 at 05:30:04PM +0100, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
If we intend this to be a permanent solution, the fact that PCRE probably doesn't handle wide strings is a problem.
It has some support for UTF-8, and basically it is 8bit clean. This part is not well tested, though.
Regards, /Al
/ Brevbäraren
On Wed, Jan 29, 2003 at 05:40:06PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
I think a better alternative might be to attempt to change 'char' to 'int32' in the PCRE-code.
This will not improve performance either :)
But here is what I found in the Pike docs (http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/Regexp/match....):
"The current implementation (Pike 7.3.51) doesn't support searching in strings containing the NUL character or any wide character."
I guess this is still true for the latest Pike. So we lose nothing.
Actually, we can do UTF-8 and binary searching with PCRE - but not with Regexp.
Regards, /Al
I think a better alternative might be to attempt to change 'char' to 'int32' in the PCRE-code.
This will not improve performance either :)
Actually, it might. Accessing int32s is typically faster than accessing bytes on modern architectures. Data buses aren't 8bit anymore...
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2003-01-29 17:54: Subject: Re: Real-life PCRE vs Regexp benchmarking
On Wed, Jan 29, 2003 at 05:40:06PM +0100, Per Hedbor () @ Pike (-) developers forum wrote:
I think a better alternative might be to attempt to change 'char' to 'int32' in the PCRE-code.
This will not improve performance either :)
But what I found in Pike docs (http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/Regexp/match....):
"The current implementation (Pike 7.3.51) doesn't support searching in strings containing the NUL character or any wide character."
I guess this is still true for latest Pike. So we lose nothing.
Actually, we can do UTF-8 and binray searching with PCRE - while not with Regexp.
Regards, /Al
/ Brevbäraren
Actually, it might. Accessing int32s is typically faster than accessing bytes on modern architectures. Data buses aren't 8bit anymore...
The Pentium 4 has a lot of ingenious hardware to ensure that accessing four 8-bit values (in sequence) is just as fast as accessing one 32-bit value.
However, this is probably not true for most other architectures.
/ Per Hedbor ()
Previous text:
2003-01-29 18:23: Subject: Re: Real-life PCRE vs Regexp benchmarking
I think a better alternative might be to attempt to change 'char' to 'int32' in the PCRE-code.
This will not improve performance either :)
Actually, it might. Accessing int32s is typically faster than accessing bytes on modern architectures. Data buses aren't 8bit anymore...
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
If the permanent solution turns out to be my project (which is still far from production quality), I don't think the syntax or the feature set will be a problem, since Perl regexps are among the goals I aim for.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-29 17:26: Subject: Real-life PCRE vs Regexp benchmarking
If we intend this to be a permanent solution, the fact that PCRE probably doesn't handle wide strings is a problem.
If we intend it as just temporary speedup, we should be wary about allowing a wider set of regexp syntax, since we would have to allow the same set when a permanent solution is fixed in order to have backwards compatibility.
Other than that I can't think of any problems. After all, the API is pretty simlistic as it is.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
I would find it very interesting to get a progress report on your regexp work. The last time I heard something was perhaps a year ago. You then claimed to have "all but one or two" of the problems solved in the theoretical model. Is the design phase over, and are you now in the prototype phase? When do you expect to get your Ph.D. for your work?
/ Martin Nilsson (Åskblod)
Previous text:
2003-01-30 01:32: Subject: Real-life PCRE vs Regexp benchmarking
If the permanent solution would be my project (which still is far from production quality), I don't think the syntax or the feature set will be a problem since Perl regexps are among goals I aim for.
/ Martin Stjernholm, Roxen IS
I've stopped coding on the prototype and am trying to write a report about it, something that is proceeding rather slowly. Since that statement I've solved about three or four of the two design problems I guess I meant then, and I still have two left. But in spite of the slow progress I'm confident that the approach is a good one and that there will be a decent result eventually.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 01:43: Subject: Real-life PCRE vs Regexp benchmarking
I would find it very interesting to get a progress report of your regexp work. Last time I heard something was perhaps a year ago. You then claimed to have "all but one or two" of the problems solved in the theoretical model. Is the design phase over and you are now in the prototype phase? When do you expect to get your Ph.D. for your work?
/ Martin Nilsson (Åskblod)
Side note: Larry Wall has been heard to speculate seriously about revising the Perl regexp syntax for Perl version 6, partly to make it more readable, and partly to give it more expressive power.
/ Leif Stensson, Lysator
Previous text:
2003-01-30 01:32: Subject: Real-life PCRE vs Regexp benchmarking
If the permanent solution would be my project (which still is far from production quality), I don't think the syntax or the feature set will be a problem since Perl regexps are among goals I aim for.
/ Martin Stjernholm, Roxen IS
Wouldn't readability sort of defeat the whole purpose of perl? :-)
/ Johan Sundström (a hugging punishment!)
Previous text:
2003-01-30 15:22: Subject: Real-life PCRE vs Regexp benchmarking
Side note: Larry Wall has been heard to speculate seriously about revising the Perl regexp syntax for Perl version 6, partly to make it more readable, and partly to give it more expressive power.
/ Leif Stensson, Lysator
lol! thanks, you are getting the whole office here laughing :-)
greetings, martin.
I was inspired by an old favourite Erik Naggum quote - "It's not that perl programmers are idiots, it's that the language rewards idiotic behavior in a way that no other language or tool has ever done." ;-)
/ Johan Sundström (a hugging punishment!)
Previous text:
2003-01-30 15:29: Subject: Re: Real-life PCRE vs Regexp benchmarking
lol! thanks, you are getting the whole office here laughing :-)
greetings, martin.
/ Brevbäraren
*adds to cookie-collection*
/ Peter Lundqvist (disjunkt)
Previous text:
2003-01-30 15:51: Subject: Re: Real-life PCRE vs Regexp benchmarking
I was inspired by an old favourite Erik Naggum quote - "It's not that perl programmers are idiots, it's that the language rewards idiotic behavior in a way that no other language or tool has ever done." ;-)
/ Johan Sundström (a hugging punishment!)
In the last episode (Jan 30), Peter Lundqvist (disjunkt) @ Pike (-) developers forum said:
Johan Sundström:
I was inspired by an old favourite Erik Naggum quote - "It's not that perl programmers are idiots, it's that the language rewards idiotic behavior in a way that no other language or tool has ever done." ;-)
*adds to cookie-collection*
A little googling found the source post:
http://groups.google.com/groups?as_umsgid=3163193555464012@naggum.no
Lots of truth in there :)
Not as long as it is still fairly concise.
/ Leif Stensson, Lysator
Previous text:
2003-01-30 15:24: Subject: Real-life PCRE vs Regexp benchmarking
Wouldn't readability sort of defeat the whole purpose of perl? :-)
/ Johan Sundström (a hugging punishment!)
I may add similar tests functionality using my PCRE module for Pike and publish the results, if someone is interested...
Uh. You have your own PCRE Pike module? Gah! :-) It's been in PEXts for years. Pike really needs a good module repository for sure.
/ David Hedbor
Previous text:
2003-01-29 04:38: Subject: Re: PCRE again (Was: reasons why pike is better than python?)
On Wed, Jan 29, 2003 at 02:10:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
And mappings would help here, or what is your point? That inappropriate use of a datatype is slower than using the right one (ADT.Queue)?
ADT.Queue is using arrays to implemet queues anyway, where is the advantage?
But that is wrong. Running regexp tests written purely in C using the PCRE library is 13% slower than the same tests run in Perl.
??? Just tested:
aldem@fort:/old/home/aldem/src/pcre-3.9$ time ./perltest testdata/testinput1 > /dev/null
real 0m2.194s user 0m2.190s sys 0m0.010s aldem@fort:/old/home/aldem/src/pcre-3.9$ time ./pcretest testdata/testinput1 > /dev/null
real 0m0.065s user 0m0.040s sys 0m0.020s
So where is the Perl's advantage? I did similar benchmars long time ago too, results were very similar (this test was with PCRE 3.9 and Perl 5.6.1).
Java, which is 22% slower than Python. To call Pike regexps slow is simply not true.
For some particular tasks - may be, but they are less powerful. And, again, PCRE gives us Perl compatibily, which may simplify porting from Perl (and anyway PCRE is in use in PHP/Apache etc - a lot of places).
I may add similar tests functionality using my PCRE module for Pike and publish the results, if someone is interested...
And to think that the performance is going to improve without doing something drastic, like writing a new regexp engine from scratch, is to be unrealistic IMHO.
Please, please, do some benchmarks first :)
Regards, /Al
/ Brevbäraren
On Wed, Jan 29, 2003 at 08:15:10PM +0100, David Hedbor @ Pike developers forum wrote:
Uh. You have your own PCRE Pike module? Gah! :-) It's been in PEXts for years. Pike really needs a good module repository for sure.
When I first wrote it there was no Caudium or PExts yet :)
Regards, /Al
Dude, that's a long time ago. :-P
/ David Hedbor
Previous text:
2003-01-29 21:09: Subject: Re: PCRE again (Was: reasons why pike is better than python?)
On Wed, Jan 29, 2003 at 08:15:10PM +0100, David Hedbor @ Pike developers forum wrote:
Uh. You have your own PCRE Pike module? Gah! :-) It's been in PEXts for years. Pike really needs a good module repository for sure.
When I wrote it first time there was no Caudium or PExts yet :)
Regards, /Al
/ Brevbäraren
ADT.Queue is using arrays to implemet queues anyway, where is the advantage?
You mean: "ADT.Queue is using arrays to implement queues anyway, so I'll fix a better implementation." ;)
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-29 04:38: Subject: Re: PCRE again (Was: reasons why pike is better than python?)
On Wed, Jan 29, 2003 at 02:10:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
And mappings would help here, or what is your point? That inappropriate use of a datatype is slower than using the right one (ADT.Queue)?
ADT.Queue is using arrays to implemet queues anyway, where is the advantage?
But that is wrong. Running regexp tests written purely in C using the PCRE library is 13% slower than the same tests run in Perl.
??? Just tested:
aldem@fort:/old/home/aldem/src/pcre-3.9$ time ./perltest testdata/testinput1 > /dev/null
real 0m2.194s user 0m2.190s sys 0m0.010s aldem@fort:/old/home/aldem/src/pcre-3.9$ time ./pcretest testdata/testinput1 > /dev/null
real 0m0.065s user 0m0.040s sys 0m0.020s
So where is the Perl's advantage? I did similar benchmars long time ago too, results were very similar (this test was with PCRE 3.9 and Perl 5.6.1).
Java, which is 22% slower than Python. To call Pike regexps slow is simply not true.
For some particular tasks - may be, but they are less powerful. And, again, PCRE gives us Perl compatibily, which may simplify porting from Perl (and anyway PCRE is in use in PHP/Apache etc - a lot of places).
I may add similar tests functionality using my PCRE module for Pike and publish the results, if someone is interested...
And to think that the performance is going to improve without doing something drastic, like writing a new regexp engine from scratch, is to be unrealistic IMHO.
Please, please, do some benchmarks first :)
Regards, /Al
/ Brevbäraren
On Thu, Jan 30, 2003 at 02:05:03AM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
You mean: "ADT.Queue is using arrays to implement queues anyway, so I'll fix a better implementation." ;)
Well, I am slowly thinking about some hybrid between lists and arrays, so I can use list->next(item)/list->prev(item) and indexes as well.
Actually, I need a data type which allows push/pop/shift (both FIFO & LIFO), destructive modification (by object reference, by index or by index range) and access to the data by index. This data type should be quick enough when pushing/popping/shifting or iterating, and it should be possible to get the item which is next/previous to a specified one.
Methods that I want at least (I hope the purpose is obvious):
mixed push(mixed item, mixed|int|void pos)
mixed pop(int|void count, int|void pos)
mixed peek(int|void pos)
void delete(mixed|int start, mixed|int|void end)
mixed `[](mixed|int pos)
mixed next(mixed pos)
mixed prev(mixed pos)
There is no equivalent already implemented (in Pike), so I have no choice but to implement it myself (of course in C - Pike is soooo slow) :)
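For reference, a naive Pike-level sketch of roughly that interface on top of a plain array (hypothetical; it has exactly the O(n) costs complained about above, so it only illustrates the API, not the intended C implementation):

class HybridList
{
  private array data = ({});

  void  push(mixed item)  { data += ({ item }); }
  mixed pop()             { mixed v = data[sizeof(data) - 1]; data = data[..sizeof(data) - 2]; return v; }
  mixed shift()           { mixed v = data[0]; data = data[1..]; return v; }
  mixed peek(int pos)     { return data[pos]; }
  mixed `[](int pos)      { return data[pos]; }
  void  delete(int start, int end) { data = data[..start - 1] + data[end + 1..]; }
  int   _sizeof()         { return sizeof(data); }
}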
Regards, /Al
Do so and it might be added to Pike. Some more container types in Pike wouldn't be such a bad thing. The main issue is that these classes - from, say, Pike modules written in C - wouldn't be as easy to handle as the builtins, not to mention usage in "standard" methods (search / has_value for example).
Unless, of course, they are generalized like STL and Java (in that they don't require specific types, just specific characteristics such as "random seekable", "iterable" and such).
/ David Hedbor
Previous text:
2003-01-30 03:25: Subject: Re: hybrid of array & list
On Thu, Jan 30, 2003 at 02:05:03AM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
You mean: "ADT.Queue is using arrays to implement queues anyway, so I'll fix a better implementation." ;)
Well, I am slowly thinking about some hybride between lists and arrays, so I can use list->next(item)/list->prev(item) and indexes as well.
Actually, I need data type which will allow push/pop/shift (both FIFO & LIFO), destructive modification (by object reference, by index or index range) and access to the data by index. This data type should be quick enough when pushing/popping/shifting or iterating, and it should be possible to get an item which is next/previous to the specified.
Methods that I want at least (I hope purpose is obvious):
mixed push(mixed item, mixed|int|void pos) mixed pop(int|void count, int|void pos) mixed peek(int|void pos) void delete(mixed|int start, mixed|int|void end) mixed `[](mixed|int pos) mixed next(mixed pos) mixed prev(mixed pos)
There is no equivalent implemented already (in Pike), so I've no choice but to implement it myself (of course in C - Pike is soooo slow) :)
Regards, /Al
/ Brevbäraren
I'm currently working on an extension of the ADT module that will result in at least a circular array and a linked list, as well as guidelines for extending the ADT module (which can be found at http://w1.313.telia.com/~u31318241/CommonAPI.pdf). I can easily add those methods to the circular array if you want.
/ Peta, jo det är jag
Previous text:
2003-01-30 03:25: Subject: Re: hybrid of array & list
On Thu, Jan 30, 2003 at 02:05:03AM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
You mean: "ADT.Queue is using arrays to implement queues anyway, so I'll fix a better implementation." ;)
Well, I am slowly thinking about some hybride between lists and arrays, so I can use list->next(item)/list->prev(item) and indexes as well.
Actually, I need data type which will allow push/pop/shift (both FIFO & LIFO), destructive modification (by object reference, by index or index range) and access to the data by index. This data type should be quick enough when pushing/popping/shifting or iterating, and it should be possible to get an item which is next/previous to the specified.
Methods that I want at least (I hope purpose is obvious):
mixed push(mixed item, mixed|int|void pos) mixed pop(int|void count, int|void pos) mixed peek(int|void pos) void delete(mixed|int start, mixed|int|void end) mixed `[](mixed|int pos) mixed next(mixed pos) mixed prev(mixed pos)
There is no equivalent implemented already (in Pike), so I've no choice but to implement it myself (of course in C - Pike is soooo slow) :)
Regards, /Al
/ Brevbäraren
The new multiset implementation seems suitable for that when used without an ordering function. In that mode it takes O(log n) to insert and remove elements at each end but longer in the middle. With an ordering function it still takes O(log n) (assuming that function is O(1)) but with a higher constant. Stepping is O(1) on the average.
All the C level stuff already exists and appears to be stable, so what's needed is to make it available from Pike. If you aren't up to that, it could be an idea to use the C level functions in a specialized object; it'll still save you a lot of the gory details, such as handling resizing and gc interaction.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 03:25: Subject: Re: hybrid of array & list
On Thu, Jan 30, 2003 at 02:05:03AM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
You mean: "ADT.Queue is using arrays to implement queues anyway, so I'll fix a better implementation." ;)
Well, I am slowly thinking about some hybride between lists and arrays, so I can use list->next(item)/list->prev(item) and indexes as well.
Actually, I need data type which will allow push/pop/shift (both FIFO & LIFO), destructive modification (by object reference, by index or index range) and access to the data by index. This data type should be quick enough when pushing/popping/shifting or iterating, and it should be possible to get an item which is next/previous to the specified.
Methods that I want at least (I hope purpose is obvious):
mixed push(mixed item, mixed|int|void pos) mixed pop(int|void count, int|void pos) mixed peek(int|void pos) void delete(mixed|int start, mixed|int|void end) mixed `[](mixed|int pos) mixed next(mixed pos) mixed prev(mixed pos)
There is no equivalent implemented already (in Pike), so I've no choice but to implement it myself (of course in C - Pike is soooo slow) :)
Regards, /Al
/ Brevbäraren
Isn't inserting O(1) or at worst O(log n) if you have a pointer to the element you want to insert it after (or before)?
Finding element n is O(n), of course.
/ Mirar
Previous text:
2003-01-30 14:23: Subject: Re: hybrid of array & list
The new multiset implementation seems suitable for that when used without an ordering function. In that mode it takes O(log n) to insert and remove elements at each end but longer in the middle. With an ordering function it still takes O(log n) (assuming that function is O(1)) but with a higher constant. Stepping is O(1) on the average.
All the C level stuff already exists and appears to be stable, so what's needed is to make it available from Pike. If you aren't up to that, it could be an idea to use the C level functions in a specialized object; it'll still save you a lot of the gory details, such as handling resizing and gc interaction.
/ Martin Stjernholm, Roxen IS
No, to insert or remove an element you need to find its parents, all the way to the root node in the worst case. My multiset implementation doesn't store parent pointers, so this is instead done by recording the path on a stack when a node is found. It works fine as long as there's a good order function, which I consider to be the normal case.
If the order function doesn't order some values (or all in this case) then it can only be used to find the right set of unordered elements. Within that set the implementation has to do a linear search while the stack is kept updated appropriately.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 18:21: Subject: Re: hybrid of array & list
Isn't inserting O(1) or at worst O(log n) if you have a pointer to the element you want to insert it after (or before)?
Finding element n is O(n), of course.
/ Mirar
The path is necessary to rebalance the tree. In this particular case, a tree isn't really what one wants though; a doubly linked list would be better. Even if the tree is replaced with that, I don't think it would be too hard to make it possible to reuse the allocation and gc routines, but I suspect there currently are cases where they'll get upset if it isn't a sufficiently balanced tree (at least if you run with debug turned on).
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 18:41: Subject: Re: hybrid of array & list
No, to insert or remove an element you need to find its parents, all the way to the root node in the worst case. My multiset implementation doesn't store parent pointers, so this is instead done by recording the path on a stack when a node is found. It works fine as long as there's a good order function, which I consider to be the normal case.
If the order function doesn't order some values (or all in this case) then it can only be used to find the right set of unordered elements. Within that set the implementation has to do a linear search while the stack is kept updated appropriately.
/ Martin Stjernholm, Roxen IS
On Thu, Jan 30, 2003 at 06:55:01PM +0100, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
The path is necessary to rebalance the tree. In this particular case, a tree isn't really what one wants though; a doubly linked list would be better.
This is what I am thinking of: a DLL and a shadow array which will be used in indexing operations - it will be rebuilt if necessary.
Regards, /Al
Ah, I see. But if you have an "index" object, you could keep the stack in it as well. (I assume that is done in a multiset iterator?)
/ Mirar
Previous text:
2003-01-30 18:41: Subject: Re: hybrid of array & list
No, to insert or remove an element you need to find its parents, all the way to the root node in the worst case. My multiset implementation doesn't store parent pointers, so this is instead done by recording the path on a stack when a node is found. It works fine as long as there's a good order function, which I consider to be the normal case.
If the order function doesn't order some values (or all in this case) then it can only be used to find the right set of unordered elements. Within that set the implementation has to do a linear search while the stack is kept updated appropriately.
/ Martin Stjernholm, Roxen IS
Actually no, the multiset iterator currently doesn't keep a stack since it allows the multiset to be changed without losing track. It'd be possible to store the stack if it locks the index part of the multiset, like the mapping iterator. It wouldn't be difficult to extend it with an option to allow that mode of operation. Maybe it should even be the default mode?
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 18:58: Subject: Re: hybrid of array & list
Ah, I see. But if you have an "index" object, you could keep the stack in it as well. (I assume that is done in a multiset iterator?)
/ Mirar
Hm, *ponders* Doesn't it do rather well without it, though?
I think an iteration should iterate over the dataset as it was at the point of the iteration's initialization. If that isn't possible, then at least over the same index set.
/ Mirar
Previous text:
2003-01-30 19:08: Subject: Re: hybrid of array & list
Actually no, the multiset iterator currently doesn't keep a stack since it allows the multiset to be changed without loosing track. It'd be possible to store the stack if it locks the index part of the multiset, like the mapping iterator. It wouldn't be difficult to extend it with an option to allow that mode of operation. Maybe it should even be the default mode?
/ Martin Stjernholm, Roxen IS
I thought it to be more useful to not have any restriction. Then you can e.g. go through a set and add and remove items as you go, and there will be no copy-on-write overhead and you will reach the newly added items if they're inserted ahead of the iterator. To me that's more natural than a frozen index set with copy-on-write, i.e. if the case wasn't mentioned in a manual that's how I'd assume it to be.
Other more unusual operations are made possible too, e.g. to store two iterators (perhaps pointers would be a more appropriate name in this case) that are next to each other, and then at a later point do some operation on the elements that have been inserted in between. The difference from simply looking them up again is that it will be completely well defined even if the indices are identical.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 19:29: Subject: Re: hybrid of array & list
Hm, *ponders* Doesn't it do rather well without it, though?
I think an iteration should iterate over the dataset as it were at the point of the iteration initialization. If that isn't possible, at least the same index set.
/ Mirar
But then you get a different behaviour than for a mapping? (Array and string indexes can't change, so they don't matter here.)
Is that behaviour useful in a practical situation? To me, the risk of bugs is obvious, but then again most bugs I can think of would lead to infinite loops and thus be detected easily.
( Like, multiset m; ... foreach (m;;int x) m[x+10000]=1; )
/ Mirar
Previous text:
2003-01-30 20:27: Subject: Multiset iterator
I thought it to be more useful to not have any restriction. Then you can e.g. go through a set and add and remove items as you go, and there will be no copy-on-write overhead and you will reach the newly added items if they're inserted ahead of the iterator. To me that's more natural than a frozen index set with copy-on-write, i.e. if the case wasn't mentioned in a manual that's how I'd assume it to be.
Other more unusual operations are made possible too, e.g. to store two iterators (perhaps pointers would be a more appropriate name in this case) that are next to each other, and then at a later point do some operation on the elements that have been inserted in between. The difference from simply looking them up again is that it will be completely well defined even if the indices are identical.
/ Martin Stjernholm, Roxen IS
Being consistent with mappings is the only good reason to do otherwise, afaics. Otoh I don't think it's impossible to make mapping iterators work with changing indices too, but I haven't looked into that in detail.
Is that behaviour useful in a practical situation?
If nothing else, it avoids a copy-on-write if you go through a set to remove items. That's something of a bummer with the current mapping iterators, I think.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 20:33: Subject: Multiset iterator
But then you get a different behaviour then for a mapping? (Array and string indexes can't change, so they don't matter here.)
Is that behaviour useful in a practical situation? To me, the risk of bugs is obvious, but then again most bugs I can think of would lead to infinite loops and thus detected easily.
( Like, multiset m; ... foreach (m;;int x) m[x+10000]=1; )
/ Mirar
That's something of a bummer with the current mapping iterators, I think.
Why? It only happens at the first write, so it doesn't have that big an effect, does it?
Anyway, I'd like them to have the same behaviour. And since it's easier to make a controlled copy (isn't it?) of the set you want to iterate over than to continuously loop over inserted items, you're probably correct about that behaviour.
/ Mirar
Previous text:
2003-01-30 20:50: Subject: Multiset iterator
Being consistent with mappings is the only good reason to do otherwise, afaics. Otoh I don't think it's impossible to make mapping iterators work with changing indices too, but I haven't looked into that in detail.
Is that behaviour useful in a practical situation?
If nothing else, it avoids a copy-on-write if you go through a set to remove items. That's something of a bummer with the current mapping iterators, I think.
/ Martin Stjernholm, Roxen IS
It typically means a complete mapping copy every time the loop is run. Not something that can always be ignored, I think.
Yes, it'd be easy to do "foreach(m + (<>); int x;) ..." to get a copy-on-write when one wants that.
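I.e. roughly (untested sketch):
multiset(int) m = (< 1, 2, 3, 4 >);
foreach (m + (<>); int x;) {  // m + (<>) makes a copy, so the index set is frozen
  if (x & 1)
    m[x] = 0;                 // changes to m cannot disturb the ongoing iteration
}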
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 21:07: Subject: Multiset iterator
That's something of a bummer with the current mapping iterators, I think.
Why? It only happens at the first write, so it doesn't have that big effect, does it?
Anyway, I'd like them to have the same behaviour. And since it's easier to make a controlled copy (isn't it?) of the set you want to index over then continuously looping over inserted items, you're probably correct in that behaviour.
/ Mirar
No, but consider that both looping over the mapping and copying the mapping are O(n), but running through the loop has a much higher constant factor.
/ Mirar
Previous text:
2003-01-30 21:35: Subject: Multiset iterator
It typically means a complete mapping copy every time the loop is run. Not something that always can be ignored, I think.
Yes, it'd be easy to do "foreach(m + (<>); int x;) ..." to get a copy-on-write when one want that.
/ Martin Stjernholm, Roxen IS
Probably but not necessarily. It's possible that the loop only goes through part of it.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 21:47: Subject: Multiset iterator
No, but consider that both looping over the mapping and copying the mapping are O(n), but running through the loop has much higher constant.
/ Mirar
Yes.
What were the old problems with looping over mappings? I forgot. What were the reasons for the current design?
Is it at all possible to let a mapping iterator loop over new elements?
/ Mirar
Previous text:
2003-01-30 22:05: Subject: Multiset iterator
Probably but not necessarily. It's possible that the loop only goes through part of it.
/ Martin Stjernholm, Roxen IS
I think it's because the behavior would be almost completely undefined if the mapping changes; there might be a rehash and then the iterator is effectively randomized even if it held on to the same keypair.
A way to solve it would be to introduce a "verbatim" mode where the data block is never shrunk and only grown at the end (the multisets have such a mode), and then iterate through the keypair block directly instead of through the hash table. This way one could make new elements always be inserted either behind or ahead of the iterator. The data block would only stay in verbatim mode as long as there are iterators for it.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-30 22:16: Subject: Multiset iterator
Yes.
What was the old problems with looping over mappings? I forgot. What was the reasons for the current design?
Is it at all possible to let a mapping iterator loop over new elements?
/ Mirar
On Wed, Jan 29, 2003 at 02:10:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
a = a[1..]; a += ({ some_value }); This is quite slow, especially for big arrays :)
And mappings would help here, or what is your point? That inappropriate use of a datatype is slower than using the right one (ADT.Queue)?
well, but exactly this information, that one should use ADT.Queue for things like these instead of arrays, would be another optimization hint.
consider that while an array is an obvious and easy to understand datatype for new users, ADT.Queue is much more obscure and will only be found by people who read the docs front to back.
the discussion shows that there is lots of material for optimization hints in pike, with the difference that the optimizations are not about workarounds for obscure limitations (as in python)
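something like this is what i mean (untested sketch; i'm going from memory of the ADT.Queue interface, so the method names may be slightly off):
ADT.Queue q = ADT.Queue();
q->put("first");            // instead of a += ({ "first" })
q->put("second");
while (!q->is_empty())
  write("%O\n", q->get());  // instead of reading a[0] and then a = a[1..]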
more of that...
greetings, martin.
a = a[1..]; a += ({ some_value });
^^^^^^^^^^
*removing* elements isn't fast. Adding elements is.
But anyway, I wrote a benchmark for it. It uses v+=({17}) and v[x]=y respectively, up to 100000 elements (k times):
test                         total    user     mem    (runs)
Append array...............  0.808s   0.525s  3540kb   (19)  (952859/s)
Append mapping.............  0.703s   0.459s  3784kb   (22)  (1090188/s)
Append multiset............  1.132s   0.806s  3684kb   (14)  (620018/s)
I don't know if it's fast, around one microsecond per insertion... I suspect it might be the function call that is slow, not the insertion in itself - recursed loops give about the same figures:
Loops Recursed............. 1.337s 1.077s 3680kb (12) (973910 iters/s)
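For reference, the loops being timed look roughly like this (a simplified standalone sketch, not the actual test code; gethrtime() gives microseconds):
int main()
{
  int n = 100000;

  array(int) a = ({});
  int t0 = gethrtime();
  for (int i = 0; i < n; i++)
    a += ({ 17 });                    // "Append array"
  write("append array:       %d us\n", gethrtime() - t0);

  mapping(int:int) m = ([]);
  t0 = gethrtime();
  for (int i = 0; i < n; i++)
    m[i] = 17;                        // "Append mapping", i.e. v[x] = y
  write("insert in mapping:  %d us\n", gethrtime() - t0);

  multiset(int) s = (<>);
  t0 = gethrtime();
  for (int i = 0; i < n; i++)
    s[i] = 1;                         // "Append multiset", i.e. v[x] = 1
  write("insert in multiset: %d us\n", gethrtime() - t0);

  return 0;
}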
/ Mirar
Previous text:
2003-01-29 01:31: Subject: Re: reasons why pike is better than python?
On Wed, Jan 29, 2003 at 12:55:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
No, appending elements to arrays are fast in 7.4.
It depends. If it is combined with element removal, especially like:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
No, Pike regexps are faster than e.g. Python regexps according to the language shootout.
As Martin said already - then Python regexps are slow too. And everyone knows an alternative which is faster and more powerful, but it is still not in Pike :)
PS: 2 Martin: could you please change the subject to something like: "Reasons why apples are better than oranges?" :)) I think that every language has its advantages and disadvantages, which are dependent on the task, so... :))
Regards, /Al
/ Brevbäraren
In the last episode (Jan 29), Mirar @ Pike developers forum said:
a = a[1..]; a += ({ some_value });
^^^^^^^^^^
*removing* elements isn't fast. Adding elements are.
But anyway, I wrote a benchmark for it. It uses v+=({17}) and v[x]=y respectively, up to 100000 elements (k times):
test                         total    user     mem    (runs)
Append array...............  0.808s   0.525s  3540kb   (19)  (952859/s)
Append mapping.............  0.703s   0.459s  3784kb   (22)  (1090188/s)
Append multiset............  1.132s   0.806s  3684kb   (14)  (620018/s)
You're not really appending a mapping though; you're inserting a value. What's the benchmark time for doing v+=([ x:y ]) ? It's too bad there isn't an "append array element" syntax that doesn't require you to generate a 1-element array to append.
No, it's really "inserting" in mappings and multiset, I guess... *edits the tests*
Gosh, that was *slow*. It seem that even if I have the only other reference (except the expression), it must copy the mapping/multiset:
mapping v=([]); for (int j=0; j<1000; j++) v|=([j:42]);
takes 290ms (repeated 10 times in the test below):
test                         total    user     mem    (runs)
Append array...............  0.760s   0.535s  3524kb   (2)   (934579/s)
Append mapping.............  3.423s   2.910s  3632kb   (2)   (3436/s)
Append multiset............  0.712s   0.495s  3556kb   (2)   (20202/s)
Insert in mapping..........  0.701s   0.435s  3800kb   (2)   (1149425/s)
Insert in multiset.........  1.101s   0.820s  3696kb   (2)   (609756/s)
Note that the Append array and Insert tests enlarge the container until it has 100000 elements, while Append mapping and Append multiset only manage 1000 elements (already at 10000 it took way too long).
/ Mirar
Previous text:
2003-01-29 09:18: Subject: Re: reasons why pike is better than python?
In the last episode (Jan 29), Mirar @ Pike developers forum said:
a = a[1..]; a += ({ some_value });
^^^^^^^^^^
*removing* elements isn't fast. Adding elements are.
But anyway, I wrote a benchmark for it. It uses v+=({17}) and v[x]=y respectively, up to 100000 elements (k times):
test                         total    user     mem    (runs)
Append array...............  0.808s   0.525s  3540kb   (19)  (952859/s)
Append mapping.............  0.703s   0.459s  3784kb   (22)  (1090188/s)
Append multiset............  1.132s   0.806s  3684kb   (14)  (620018/s)
You're not really appending a mapping though; you're inserting a value. What's the benchmark time for doing v+=([ x:y ]) ? It's too bad there isn't an "append array element" syntax that doesn't require you to generate a 1-element array to append.
-- Dan Nelson dnelson@allantgroup.com
/ Brevbäraren
You're not really appending a mapping though; you're inserting a value. What's the benchmark time for doing v+=([ x:y ])?
Seems to be approximately three times slower than v[x] = y. Don't be surprised if that changes due to optimiser additions though - naïve code like this is a typical example of where the optimiser can help turn slow code into quicker code.
/ Johan Sundström (a hugging punishment!)
Previous text:
2003-01-29 09:18: Subject: Re: reasons why pike is better than python?
In the last episode (Jan 29), Mirar @ Pike developers forum said:
a = a[1..]; a += ({ some_value });
^^^^^^^^^^
*removing* elements isn't fast. Adding elements are.
But anyway, I wrote a benchmark for it. It uses v+=({17}) and v[x]=y respectively, up to 100000 elements (k times):
test                         total    user     mem    (runs)
Append array...............  0.808s   0.525s  3540kb   (19)  (952859/s)
Append mapping.............  0.703s   0.459s  3784kb   (22)  (1090188/s)
Append multiset............  1.132s   0.806s  3684kb   (14)  (620018/s)
You're not really appending a mapping though; you're inserting a value. What's the benchmark time for doing v+=([ x:y ]) ? It's too bad there isn't an "append array element" syntax that doesn't require you to generate a 1-element array to append.
-- Dan Nelson dnelson@allantgroup.com
/ Brevbäraren
v+=([x:y]) seems to be n² to me, while v[x]=y is n. That is a huge, huge, difference, in my opinion. :)
/ Mirar
Previous text:
2003-01-29 09:52: Subject: Re: reasons why pike is better than python?
You're not really appending a mapping though; you're inserting a value. What's the benchmark time for doing v+=([ x:y ])?
Seems to be approximately three times slower than v[x] = y. Don't be surprised if that changes due to optimiser additions though - naïve code like this is a typical example of where the optimiser can help turn slow code into quicker code.
/ Johan Sundström (a hugging punishment!)
Yes; somehow I did not consider testing different numbers of loop iterations for order-of-growth behaviour, when the point I was planning on making was that the entire construct was inappropriate. :-) And not too infrequent in code written by inexperienced Pike programmers.
/ Johan Sundström (a hugging punishment!)
Previous text:
2003-01-29 09:59: Subject: Re: reasons why pike is better than python?
v+=([x:y]) seems to be n² to me, while v[x]=y is n. That is a huge, huge, difference, in my opinion. :)
/ Mirar
I wonder why a+=({x}) is n while m+=(<x>) is n²?
It seems that the same kind of copying that would cause n² would also affect arrays with "too many" references. Is the few-reference-no-copying optimization in multiset/mapping operations broken?
/ Mirar
Previous text:
2003-01-29 10:18: Subject: Re: reasons why pike is better than python?
Yes; somehow I did not consider testing different number of loop iterations for order of-behaviour, when the point I was planning on making was that the entire construct was inappropriate. :-) And not too unfrequent in code written by inexperienced pike programmers.
/ Johan Sundström (a hugging punishment!)
It is almost always possible to convert m += ([ "foo":bar ]); to m["foo"] = bar;
The exception is when the mapping in m is also stored in another variable, and you really want to have two different mappings after the +=.
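I.e. something like this (untested sketch, names made up):
mapping(string:int) m = ([ "a": 1 ]);
// The common case: insert in place, no new mapping is built.
m["foo"] = 17;
// The exception: m is shared and the two should diverge after the update.
mapping(string:int) other = m;
m += ([ "bar": 42 ]);  // builds a new mapping for m; 'other' keeps the old contents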
/ Per Hedbor ()
Previous text:
2003-01-29 11:32: Subject: Re: reasons why pike is better than python?
what would an experienced programmer use in such a case then?
/ Brevbäraren
For n insertions, you mean? Sounds about right.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2003-01-29 09:59: Subject: Re: reasons why pike is better than python?
v+=([x:y]) seems to be n² to me, while v[x]=y is n. That is a huge, huge, difference, in my opinion. :)
/ Mirar
Is that in 7.5? It'd be interesting to see comparisons between the old and new multiset implementations (i.e. between 7.4 and 7.5, assuming the default configure settings). I haven't got around to making a good performance comparison between them.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-29 09:00: Subject: Re: reasons why pike is better than python?
a = a[1..]; a += ({ some_value });
^^^^^^^^^^
*removing* elements isn't fast. Adding elements are.
But anyway, I wrote a benchmark for it. It uses v+=({17}) and v[x]=y respectively, up to 100000 elements (k times):
test                         total    user     mem    (runs)
Append array...............  0.808s   0.525s  3540kb   (19)  (952859/s)
Append mapping.............  0.703s   0.459s  3784kb   (22)  (1090188/s)
Append multiset............  1.132s   0.806s  3684kb   (14)  (620018/s)
I don't know if it's fast, around one microsecond per insertion... I suspect it might be the function call that is slow, not the insertion in itself - recursed loops give about the same figures:
Loops Recursed............. 1.337s 1.077s 3680kb (12) (973910 iters/s)
/ Mirar
It shouldn't be a problem porting the benchmarks to 7.4, except that if they run too slow you might have to wait<tm> for the results. :)
/ Mirar
Previous text:
2003-01-30 02:01: Subject: Re: reasons why pike is better than python?
Is that in a 7.5? It'd be interesting to see comparisons between the old and new multiset implementations (i.e. between 7.4 and 7.5, assuming the default configure settings). I haven't got around to make a good performance comparison between them.
/ Martin Stjernholm, Roxen IS
PCRE, which is in PEXts (and which I wish really were in Pike :-), is faster than the builtin regexps.
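And for simple patterns sscanf() often avoids the regexp machinery altogether; a rough, untested sketch (string and pattern made up):
int main()
{
  string line = "request took 42 ms";

  int ms;
  if (sscanf(line, "request took %d ms", ms))
    write("sscanf: %d\n", ms);

  mixed hit = Regexp("took ([0-9]+) ms")->split(line);
  if (arrayp(hit))              // split() returns the submatches, or 0 on no match
    write("Regexp: %d\n", (int)hit[0]);

  return 0;
}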
/ David Hedbor
Previous text:
2003-01-29 01:31: Subject: Re: reasons why pike is better than python?
On Wed, Jan 29, 2003 at 12:55:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
No, appending elements to arrays are fast in 7.4.
It depends. If it is combined with element removal, especially like:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
No, Pike regexps are faster than e.g. Python regexps according to the language shootout.
As Martin said already - then Python regexps are slow too. And everyone knows an alternative which is faster and more powerful, but it is still not in Pike :)
PS: 2 Martin: could you please change the subject to something like: "Reasons why apples are better than oranges?" :)) I think that every language has its advantages and disadvantages, which are dependent on the task, so... :))
Regards, /Al
/ Brevbäraren
It depends. If it is combined with element removal, especially like:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
No it's not, proof below: (I optimized the range operator in Pike 7.3.11 or so)
---------------------arraybench.pike------------------------
#!/usr/bin/env pike

int main()
{
  for (array(int) a = ({ 19 }); sizeof(a) < 10000000; a += a) {
    int iter;
    int t = time();
    for (float x = time(t) + 1.0; time(t) < x; iter++) {
      a += ({ 17 });
      a = a[1..];
    }
    write("%8d: %12d (%12d)\n", sizeof(a), iter, iter/sizeof(a));
  }
}
---------------------------------------------------------------
Results:
       1:   409885 (409885)
       2:   400459 (200229)
       4:   380187 (95046)
       8:   377355 (47169)
      16:   335558 (20972)
      32:   707770 (22117)
      64:   707568 (11055)
     128:   724203 (5657)
     256:   724851 (2831)
     512:   722408 (1410)
    1024:   731361 (714)
    2048:   722853 (352)
    4096:   724260 (176)
    8192:   718247 (87)
   16384:   703705 (42)
   32768:   685720 (20)
   65536:   685558 (10)
  131072:   681559 (5)
  262144:   684715 (2)
  524288:   677111 (1)
 1048576:   663094 (0)
 2097152:   624890 (0)
 4194304:   600015 (0)
 8388608:   461745 (0)
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-01-29 01:31: Subject: Re: reasons why pike is better than python?
On Wed, Jan 29, 2003 at 12:55:03AM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
No, appending elements to arrays are fast in 7.4.
It depends. If it is combined with element removal, especially like:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
No, Pike regexps are faster than e.g. Python regexps according to the language shootout.
As Martin said already - then Python regexps are slow too. And everyone knows an alternative which is faster and more powerful, but it is still not in Pike :)
PS: 2 Martin: could you please change the subject to something like: "Reasons why apples are better than oranges?" :)) I think that every language has its advantages and disadvantages, which are dependent on the task, so... :))
Regards, /Al
/ Brevbäraren
And for those interested, here's a comparison which clearly shows the difference.
7.4.13:
       1:   253364 (253364)
       2:   248881 (124440)
       4:   248432 (62108)
       8:   222656 (27832)
      16:   219336 (13708)
      32:   429621 (13425)
      64:   432151 (6752)
     128:   433364 (3385)
     256:   434760 (1698)
     512:   435864 (851)
    1024:   434589 (424)
    2048:   433014 (211)
    4096:   425080 (103)
    8192:   414067 (50)
   16384:   373576 (22)
   32768:   390452 (11)
   65536:   382807 (5)
  131072:   386023 (2)
  262144:   391858 (1)
  524288:   380863 (0)
 1048576:   381443 (0)
 2097152:   336393 (0)
 4194304:   245102 (0)
 8388608:    56765 (0)

7.2.364:
       1:   244592 (244592)
       2:   244008 (122004)
       4:   244752 (61188)
       8:   222359 (27794)
      16:   217999 (13624)
      32:   207494 (6484)
      64:   176696 (2760)
     128:   139072 (1086)
     256:   108573 (424)
     512:    71375 (139)
    1024:    43590 (42)
    2048:    23732 (11)
    4096:    12354 (3)
    8192:     4375 (0)
   16384:      430 (0)
   32768:      140 (0)
   65536:       66 (0)
  131072:       32 (0)
  262144:       16 (0)
  524288:        8 (0)
 1048576:        4 (0)
 2097152:        2 (0)
 4194304:        1 (0)
 8388608:        1 (0)
/ David Hedbor
Previous text:
2003-03-21 08:26: Subject: Re: reasons why pike is better than python?
It depends. If it is combined with element removal, especially like:
a = a[1..]; a += ({ some_value });
This is quite slow, especially for big arrays :)
No it's not, proof below: (I optimized the range operator in Pike 7.3.11 or so)
---------------------arraybench.pike------------------------
#!/usr/bin/env pike

int main()
{
  for (array(int) a = ({ 19 }); sizeof(a) < 10000000; a += a) {
    int iter;
    int t = time();
    for (float x = time(t) + 1.0; time(t) < x; iter++) {
      a += ({ 17 });
      a = a[1..];
    }
    write("%8d: %12d (%12d)\n", sizeof(a), iter, iter/sizeof(a));
  }
}
Results:
       1:   409885 (409885)
       2:   400459 (200229)
       4:   380187 (95046)
       8:   377355 (47169)
      16:   335558 (20972)
      32:   707770 (22117)
      64:   707568 (11055)
     128:   724203 (5657)
     256:   724851 (2831)
     512:   722408 (1410)
    1024:   731361 (714)
    2048:   722853 (352)
    4096:   724260 (176)
    8192:   718247 (87)
   16384:   703705 (42)
   32768:   685720 (20)
   65536:   685558 (10)
  131072:   681559 (5)
  262144:   684715 (2)
  524288:   677111 (1)
 1048576:   663094 (0)
 2097152:   624890 (0)
 4194304:   600015 (0)
 8388608:   461745 (0)
/ Fredrik (Naranek) Hubinette (Real Build Master)
In the last episode (Jan 29), Martin Baehr said:
On Tue, Jan 28, 2003 at 08:25:03PM +0100, Martin Nilsson (Åskblod) @ Pike (-) developers forum wrote:
There are of course dos and donts in Pike as well, but we try to adress them in the optimizer when we find them.
details? what about appending elements to an array vs a mapping? the latter is supposed to be faster (some people claim anyways)
When array_append has to resize an array, it grows it by 50%, so apart from a bit of wasted memory on large arrays I think it's very efficient. It also depends on what your data is and what you're going to do with it. If the index of your mapping is not an integer, you really can't put it in an array anyhow.
I'd say that resize_array grows the array by 100%, i.e. it doubles the size. Also note that mappings (and the new multiset implementation) use the same strategy, i.e. they double the size when they grow, and they shrink to 25% when less than that is used.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-01-29 01:07: Subject: Re: reasons why pike is better than python?
In the last episode (Jan 29), Martin Baehr said:
On Tue, Jan 28, 2003 at 08:25:03PM +0100, Martin Nilsson (Ã skblod) @ Pike (-) developers forum wrote:
There are of course dos and donts in Pike as well, but we try to adress them in the optimizer when we find them.
details? what about appending elements to an array vs a mapping? the latter is supposed to be faster (some people claim anyways)
When array_append has to resize an array, it grows it by 50%, so apart from a bit of wasted memory on large arrays I think it's very efficient. It also depends on what your data is and what you're going to do with it. If the index of your mapping is not an integer, you really can't put it in an array anyhow.
-- Dan Nelson dnelson@allantgroup.com
/ Brevbäraren
to me this page much more reads like a list of reasons not to use pike.
I assume you really meant python, given the scope of the page? If not, I'd be interested to hear why; it doesn't make sense to me.
/ Johan Sundström (a hugging punishment!)
Previous text:
2003-01-28 20:09: Subject: reasons why pike is better than python?
hi,
someone posted this url: http://trific.ath.cx/resources/python/optimization/ in roxenchat with the comment that pike should have similar comments to help people optimize their code.
to me this page much more reads like a list of reasons not to use pike.
could someone please assert that pike doesn't have issues like these?
of course some optimization suggestions pointing out what things are in fact slow would be nice too.
greetings, martin.
/ Brevbäraren
On Tue, Jan 28, 2003 at 09:45:05PM +0100, Johan Sundström (a hugging punishment!) @ Pike (-) developers forum wrote:
to me this page much more reads like a list of reasons not to use pike.
^^^^^ ouch!!! one of those freudian mistakes, where one says the opposite of what one wants to say.
I assume you really meant python, given the scope of the page?
absolutely correct.
greetings, martin.
pike-devel@lists.lysator.liu.se