URL:http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/sort.html
Note
The sorting algorithm used is not stable, ie elements that are equal may get reordered.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2003-04-23 16:09: Subject: sort
Pike's sort is not stable.
/ Johan Sundström (folkskådare)
yes, but how do you know they are being reordered, if they are equal? and what problems could that create?
greetings, martin.
The shutdown procedure of your nuclear powerplant might be executed in an out-of-orderly fashion. All sorts of interesting things might end up happening or not happening. :-)
/ Johan Sundström (folkskådare)
Previous text:
2003-04-23 16:37: Subject: Re: sort
yes, but how do you know they are being reordered, if they are equal? and what problems could that create?
greetings, martin.
/ Brevbäraren
If the procedure steps must be executed in any particular order, they shouldn't be equal. The whole point in sorting a procedure is of course to get a total ordering from a partial ordering describing the interdependenices between the steps, and if the partial order is incorrect then of course the total ordering will be incorrect as well.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2003-04-23 16:39: Subject: Re: sort
The shutdown procedure of your nuclear powerplant might be executed in an out-of-orderly fashion. All sorts of interesting things might end up happening or not happening. :-)
/ Johan Sundström (folkskådare)
(It was an intentionally horrible example, for lack of a good one.) Applying some odd additional sorting criterion on a list of filenames might be a more likely activity where you would prefer sort stability.
/ Johan Sundström (folkskådare)
Previous text:
2003-04-23 16:41: Subject: Re: sort
If the procedure steps must be executed in any particular order, they shouldn't be equal. The whole point in sorting a procedure is of course to get a total ordering from a partial ordering describing the interdependenices between the steps, and if the partial order is incorrect then of course the total ordering will be incorrect as well.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Think for instance of a user interface with a table, with a few columns of which the user could sort on. A traditional user interface uses stable sorts, so if the the user first sorts on column 1 and then on column 2, the result will be sorted on column 2, but if two elements are equal in that column the previous sort will make those element sort internally on column 1:
1 2 abc hej ghi hej def fii
1< 2
abc hej def fii {sort(column(table,1),table)} ghi hej
1 >2< abc hej ghi hej {sort(column(table,2),table)} def fii
and not
1 >2< ghi hej abc hej def fii
This can of course be passed by by remembering the previous sorts, and make a comparison function that uses those. But a stable sort would sometimes be much nicer.
/ Mirar
Previous text:
2003-04-23 16:37: Subject: Re: sort
yes, but how do you know they are being reordered, if they are equal? and what problems could that create?
greetings, martin.
/ Brevbäraren
Equality is of course quite enough for basic types such as integers; the equality property being as strong as `==. When you are shuffling about an array of objects, reordering equal objects might render odd problems if you need stable sorting.
/ Johan Sundström (folkskådare)
Previous text:
2003-04-23 16:22: Subject: Re: sort
what effect could the reordering of equal elements have?
greetings, martin.
/ Brevbäraren
how can two objects be different when they are equal? isn't == then comparing object references? and doesn't the equality of references mean that they are the same object?
greetings, martin.
We can have several different statements that describes the same information: Statement("100:-") Statement("One hundred SEK") Statement("hundra spänn")
It would make sense to set these statements to be equal to each other. However, when we serialize these we don't know in which order they'll appear (if they are stored in an array that is sorted). While it doesn't matter in the application, we would get bigger diffs in the CVS where we store the serialization.
/ Martin Nilsson (har bott i google)
Previous text:
2003-04-23 16:48: Subject: Re: sort
how can two objects be different when they are equal? isn't == then comparing object references? and doesn't the equality of references mean that they are the same object?
greetings, martin.
/ Brevbäraren
Have you never ever written a program that sorts a list of names? Let's say you have a set of records like this:
({ ({ "Simpson", "Homer" }), ({ "Simpson", "Bart" }), ({ "Flanders", "Ned" }) })
If you want to sort on last name as the primary key and the first name as the secondary you may run two sorting passes where the first one only inspects the first name:
({ ({ "Simpson", "Bart" }), ({ "Simpson", "Homer" }), ({ "Flanders", "Ned" }) })
(since "B" <= "H" <= "N").
Finally, run a second sort pass, this time using the last name as the key. This will only return correct results if the sort algorithm is stable.
/ Jonas Walldén
Previous text:
2003-04-23 16:48: Subject: Re: sort
how can two objects be different when they are equal? isn't == then comparing object references? and doesn't the equality of references mean that they are the same object?
greetings, martin.
/ Brevbäraren
duh, now i get it. it is not the order of the elements themselves, but the secondary order of dependent elements. thanks.
greetings, martin.
Well, ({ "Simpson", "Homer" }) _is_ one element so it is in fact the order of the elements since you don't want to mix first and last names for different people. On the other hand the sort comparison callback may (as in this case) look at sub-parts of the elements when deciding how to sort the elements.
/ Jonas Walldén
Previous text:
2003-04-23 17:05: Subject: Re: sort
duh, now i get it. it is not the order of the elements themselves, but the secondary order of dependent elements. thanks.
greetings, martin.
/ Brevbäraren
I noticed too (post facto). Array.sort_array wasn't as well documented though, but shares the property.
/ Johan Sundström (folkskådare)
Previous text:
2003-04-23 16:12: Subject: sort
URL:http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/sort.html
Note
The sorting algorithm used is not stable, ie elements that are equal may get reordered.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Martin Stjernholm had somewhat interesting ideas regarding making any unstable sort algorithm stable with a more or less cheap wrapper. I'm hoping he might comment on the subject. :)
/ Johan Sundström (folkskådare)
Previous text:
2003-04-23 16:15: Subject: sort
I noticed too (post facto). Array.sort_array wasn't as well documented though, but shares the property.
/ Johan Sundström (folkskådare)
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-23 16:18: Subject: sort
Martin Stjernholm had somewhat interesting ideas regarding making any unstable sort algorithm stable with a more or less cheap wrapper. I'm hoping he might comment on the subject. :)
/ Johan Sundström (folkskådare)
Is there any situation where you want an optimized case of sorting long lists of many similar elements and doesn't care about being stable?
/ Mirar
Previous text:
2003-04-23 20:32: Subject: sort
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
/ Martin Stjernholm, Roxen IS
On Wed, 23 Apr 2003, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
which algorithm are you actually using?
--- Ludger
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-23 16:18: Subject: sort
Martin Stjernholm had somewhat interesting ideas regarding making any unstable sort algorithm stable with a more or less cheap wrapper. I'm hoping he might comment on the subject. :)
/ Johan Sundström (folkskådare)
I think it's quicksort, but knowing Hubbe there are probably a couple of extra quirks thrown into it. ;)
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-24 09:30: Subject: Re: sort
On Wed, 23 Apr 2003, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
which algorithm are you actually using?
--- Ludger
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-23 16:18: Subject: sort
Martin Stjernholm had somewhat interesting ideas regarding making any unstable sort algorithm stable with a more or less cheap wrapper. I'm hoping he might comment on the subject. :)
/ Johan Sundström (folkskådare)
/ Brevbäraren
It was quicksort until it was realized some years back that it had AWFUL performance if many entries were the same. :)
/ David Hedbor
Previous text:
2003-04-24 19:06: Subject: Re: sort
I think it's quicksort, but knowing Hubbe there are probably a couple of extra quirks thrown into it. ;)
/ Martin Stjernholm, Roxen IS
If I remember correctly, sort does not allocate any memory if you sort an array with only one reference.
Either way, I would be more conserned with how much extra time it takes than how much extra memory it takes.
/Hubbe
On Wed, 23 Apr 2003, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
which algorithm are you actually using?
--- Ludger
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-23 16:18: Subject: sort
Martin Stjernholm had somewhat interesting ideas regarding making any unstable sort algorithm stable with a more or less cheap wrapper. I'm hoping he might comment on the subject. :)
/ Johan Sundström (folkskådare)
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-04-24 09:30: Subject: Re: sort
On Wed, 23 Apr 2003, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
which algorithm are you actually using?
--- Ludger
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-23 16:18: Subject: sort
Martin Stjernholm had somewhat interesting ideas regarding making any unstable sort algorithm stable with a more or less cheap wrapper. I'm hoping he might comment on the subject. :)
/ Johan Sundström (folkskådare)
/ Brevbäraren
On Fri, 25 Apr 2003, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
If I remember correctly, sort does not allocate any memory if you sort an array with only one reference.
That was the reason for my question. But quicksort doesn't take extra memory (at least in theory). On the other hand it is not stable, that votes pro quicksort beeing the current algorithm.
Either way, I would be more conserned with how much extra time it takes than how much extra memory it takes.
agreed. Yet there should exist a fast, stable, in situ algorithm suited as a general purpose sorting algorithm for pike. If that is not the case, it could be interesting to provide a second algorithm e.g. _sort that would be optimized for almost sorted arrays and stable.
--- Ludger
/Hubbe
On Wed, 23 Apr 2003, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
which algorithm are you actually using?
--- Ludger
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-23 16:18: Subject: sort
Martin Stjernholm had somewhat interesting ideas regarding making any unstable sort algorithm stable with a more or less cheap wrapper. I'm hoping he might comment on the subject. :)
/ Johan Sundström (folkskådare)
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-04-24 09:30: Subject: Re: sort
On Wed, 23 Apr 2003, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
which algorithm are you actually using?
--- Ludger
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-23 16:18: Subject: sort
Martin Stjernholm had somewhat interesting ideas regarding making any unstable sort algorithm stable with a more or less cheap wrapper. I'm hoping he might comment on the subject. :)
/ Johan Sundström (folkskådare)
/ Brevbäraren
Yes, you're almost right and I was mistaken. It only allocates extra memory when sort() is used with more than one argument. Number of references doesn't matter.
I believe the extra time would be small. An initialization pass through the position vector would be required, and swaps would have to swap in that one too. Comparisons in the position vector would only occur for elements that are equal.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-25 03:07: Subject: Re: sort
If I remember correctly, sort does not allocate any memory if you sort an array with only one reference.
Either way, I would be more conserned with how much extra time it takes than how much extra memory it takes.
/Hubbe
On Wed, 23 Apr 2003, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
which algorithm are you actually using?
--- Ludger
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-23 16:18: Subject: sort
Martin Stjernholm had somewhat interesting ideas regarding making any unstable sort algorithm stable with a more or less cheap wrapper. I'm hoping he might comment on the subject. :)
/ Johan Sundström (folkskådare)
/ Fredrik (Naranek) Hubinette (Real Build Master)
Actually, I don't think you'll need an extra vector, since sort already uses a [0..n[ vector to keep track of the sort order. (When not sorting destructively.)
So basically the overhead would only be: o extra compares when equal o possibly more swaps because of more defined order o destructive sorting when array has one ref is no longer possible (fairly rare anyways)
All that would be required is a minor fix to: array.c:internal_cmpfun and anoter minor one to builtin.c:f_sort
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-04-25 16:06: Subject: Re: sort
Yes, you're almost right and I was mistaken. It only allocates extra memory when sort() is used with more than one argument. Number of references doesn't matter.
I believe the extra time would be small. An initialization pass through the position vector would be required, and swaps would have to swap in that one too. Comparisons in the position vector would only occur for elements that are equal.
/ Martin Stjernholm, Roxen IS
Hmm, I dug into the sorting stuff a bit and found that the sort rule for arrays is somewhat odd: It sorts recursively on the first element (only). That's a bit peculiar since it doesn't go on to compare the second if the first is equal etc, so it's still not a full recursive sort. Another thing is that it doesn't have any cyclic check, so the following causes sort() to hang:
array a = ({({0}), 0}); a[0][0] = a; a[1] = a; sort (a);
Questions: Why is sort() semi-recursive in this way? Should the hang above be fixed? (I think so.) If so, how?
Mappings and multisets are sorted on the memory address. If sort() is made stable I think it's better to change it so that they aren't reordered at all. Well, multisets could perhaps be sorted recursively on the first index in the same way as arrays.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-25 17:40: Subject: Re: sort
Actually, I don't think you'll need an extra vector, since sort already uses a [0..n[ vector to keep track of the sort order. (When not sorting destructively.)
So basically the overhead would only be: o extra compares when equal o possibly more swaps because of more defined order o destructive sorting when array has one ref is no longer possible (fairly rare anyways)
All that would be required is a minor fix to: array.c:internal_cmpfun and anoter minor one to builtin.c:f_sort
/ Fredrik (Naranek) Hubinette (Real Build Master)
Sorting on *all* elements of the arrays would probably be a lot more useful, and fixing the hang would obviously be a good thing.
Why it works that way? Because I thought it was a good idea at the time.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-04-26 02:12: Subject: Re: sort
Hmm, I dug into the sorting stuff a bit and found that the sort rule for arrays is somewhat odd: It sorts recursively on the first element (only). That's a bit peculiar since it doesn't go on to compare the second if the first is equal etc, so it's still not a full recursive sort. Another thing is that it doesn't have any cyclic check, so the following causes sort() to hang:
array a = ({({0}), 0}); a[0][0] = a; a[1] = a; sort (a);
Questions: Why is sort() semi-recursive in this way? Should the hang above be fixed? (I think so.) If so, how?
Mappings and multisets are sorted on the memory address. If sort() is made stable I think it's better to change it so that they aren't reordered at all. Well, multisets could perhaps be sorted recursively on the first index in the same way as arrays.
/ Martin Stjernholm, Roxen IS
Cyclic arrays cause Array.flatten() to hang as well. It's an intricate problem, if addressed.
/ Johan Sundström (folkskådare)
Previous text:
2003-04-26 02:12: Subject: Re: sort
Hmm, I dug into the sorting stuff a bit and found that the sort rule for arrays is somewhat odd: It sorts recursively on the first element (only). That's a bit peculiar since it doesn't go on to compare the second if the first is equal etc, so it's still not a full recursive sort. Another thing is that it doesn't have any cyclic check, so the following causes sort() to hang:
array a = ({({0}), 0}); a[0][0] = a; a[1] = a; sort (a);
Questions: Why is sort() semi-recursive in this way? Should the hang above be fixed? (I think so.) If so, how?
Mappings and multisets are sorted on the memory address. If sort() is made stable I think it's better to change it so that they aren't reordered at all. Well, multisets could perhaps be sorted recursively on the first index in the same way as arrays.
/ Martin Stjernholm, Roxen IS
Yes, stable sorting comes practically for free when get_order is used.
There's more overhead in the most common case when it uses sort_array_destructively. If get_alpha_order/order_array is used instead it takes about twice the time. If a position vector is added to sort_array_destructively it gets about 15-20% slower. When objects are sorted this overhead is lost completely compared to calling lfuns.
So what do you all think? Is it worth it to make sort() stable? I think it is since not very much of the time in an average Pike program is spent sorting, so the overall impact would be very low. A third option is to make sort() stable and introduce an unstable_sort() for those occasions where it really matters.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-25 17:40: Subject: Re: sort
Actually, I don't think you'll need an extra vector, since sort already uses a [0..n[ vector to keep track of the sort order. (When not sorting destructively.)
So basically the overhead would only be: o extra compares when equal o possibly more swaps because of more defined order o destructive sorting when array has one ref is no longer possible (fairly rare anyways)
All that would be required is a minor fix to: array.c:internal_cmpfun and anoter minor one to builtin.c:f_sort
/ Fredrik (Naranek) Hubinette (Real Build Master)
And how about making it locale-independent at the same time?
/ Jonas Walldén
Previous text:
2003-04-26 15:48: Subject: Re: sort
Yes, stable sorting comes practically for free when get_order is used.
There's more overhead in the most common case when it uses sort_array_destructively. If get_alpha_order/order_array is used instead it takes about twice the time. If a position vector is added to sort_array_destructively it gets about 15-20% slower. When objects are sorted this overhead is lost completely compared to calling lfuns.
So what do you all think? Is it worth it to make sort() stable? I think it is since not very much of the time in an average Pike program is spent sorting, so the overall impact would be very low. A third option is to make sort() stable and introduce an unstable_sort() for those occasions where it really matters.
/ Martin Stjernholm, Roxen IS
Starting in which version?
megalon:jonasw $ LC_COLLATE=sv_SE ./pike-sås/7.2/bin/pike Pike v7.2 release 364 running Hilfe v2.0 (Incremental Pike Frontend)
sort( ({ "å", "ä", "ö" }) );
Result: ({ /* 3 elements */ "\345", "\344", "\366" }) Terminal closed. megalon:jonasw $ LC_COLLATE= ./pike-sås/7.2/bin/pike Pike v7.2 release 364 running Hilfe v2.0 (Incremental Pike Frontend)
sort( ({ "å", "ä", "ö" }) );
Result: ({ /* 3 elements */ "\344", "\345", "\366" })
/ Jonas Walldén
Previous text:
2003-04-26 16:02: Subject: Re: sort
It is. It is not platformindependent though.
/ Martin Nilsson (har bott i google)
Pike 7.5
/ Martin Nilsson (har bott i google)
Previous text:
2003-04-26 16:06: Subject: Re: sort
Starting in which version?
megalon:jonasw $ LC_COLLATE=sv_SE ./pike-sås/7.2/bin/pike Pike v7.2 release 364 running Hilfe v2.0 (Incremental Pike Frontend)
sort( ({ "å", "ä", "ö" }) );
Result: ({ /* 3 elements */ "\345", "\344", "\366" }) Terminal closed. megalon:jonasw $ LC_COLLATE= ./pike-sås/7.2/bin/pike Pike v7.2 release 364 running Hilfe v2.0 (Incremental Pike Frontend)
sort( ({ "å", "ä", "ö" }) );
Result: ({ /* 3 elements */ "\344", "\345", "\366" })
/ Jonas Walldén
No. Unless we decide that it was a bug that sort was environment dependent, there needs to be some compatibility fixed. Locale.SWE.sort or something.
Besides, it's not really a great improvement to go from locale dependent to platform dependent (although a bit faster I could guess).
/ Martin Nilsson (har bott i google)
Previous text:
2003-04-26 16:09: Subject: Re: sort
Ok, any plans for back-porting this to stable releases?
/ Jonas Walldén
But the latter is just a bug that's fairly easy to fix.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-26 16:15: Subject: Re: sort
No. Unless we decide that it was a bug that sort was environment dependent, there needs to be some compatibility fixed. Locale.SWE.sort or something.
Besides, it's not really a great improvement to go from locale dependent to platform dependent (although a bit faster I could guess).
/ Martin Nilsson (har bott i google)
I took a look in the configure script and port.c and there seems to be adequate measures to ensure that MEMCMP compares unsigned. On which platforms doesn't it work?
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-26 16:02: Subject: Re: sort
It is. It is not platformindependent though.
/ Martin Nilsson (har bott i google)
Add a sort test to the testsuite and see...
/ Martin Nilsson (har bott i google)
Previous text:
2003-04-28 20:40: Subject: Re: sort
I took a look in the configure script and port.c and there seems to be adequate measures to ensure that MEMCMP compares unsigned. On which platforms doesn't it work?
/ Martin Stjernholm, Roxen IS
I find both suggestions good; either would be an improvement.
/ Johan Sundström (folkskådare)
Previous text:
2003-04-26 15:48: Subject: Re: sort
Yes, stable sorting comes practically for free when get_order is used.
There's more overhead in the most common case when it uses sort_array_destructively. If get_alpha_order/order_array is used instead it takes about twice the time. If a position vector is added to sort_array_destructively it gets about 15-20% slower. When objects are sorted this overhead is lost completely compared to calling lfuns.
So what do you all think? Is it worth it to make sort() stable? I think it is since not very much of the time in an average Pike program is spent sorting, so the overall impact would be very low. A third option is to make sort() stable and introduce an unstable_sort() for those occasions where it really matters.
/ Martin Stjernholm, Roxen IS
Have you tried this with various "types" of arrays too, say arrays that have very many elements of the same type etc? Personally I don't see why we wouldn't want sort and stable_sort (or some other naming). I have _never_ missed a stable sort and thus don't think 15-20% performance hit is worth it. It gives me a hit for something _I_ don't want.
/ David Hedbor
Previous text:
2003-04-26 15:48: Subject: Re: sort
Yes, stable sorting comes practically for free when get_order is used.
There's more overhead in the most common case when it uses sort_array_destructively. If get_alpha_order/order_array is used instead it takes about twice the time. If a position vector is added to sort_array_destructively it gets about 15-20% slower. When objects are sorted this overhead is lost completely compared to calling lfuns.
So what do you all think? Is it worth it to make sort() stable? I think it is since not very much of the time in an average Pike program is spent sorting, so the overall impact would be very low. A third option is to make sort() stable and introduce an unstable_sort() for those occasions where it really matters.
/ Martin Stjernholm, Roxen IS
I did try it with arrays with many elements of the same type. You can see the tests for yourself; they're in the benchmark module. Considering how the compare function treats different types I don't think there's a measurable difference for arrays containing different types. I used integers since they are among the fastest types to compare, which means the 15-20% figure is as bad as possible.
A reason to make the normal sort function stable is that it's less surprising if you aren't aware of the issue, so it can avoid bugs. There have been several occasions when unexpected or annoying behavior has surfaced later on because sort() is unstable.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-26 21:58: Subject: Re: sort
Have you tried this with various "types" of arrays too, say arrays that have very many elements of the same type etc? Personally I don't see why we wouldn't want sort and stable_sort (or some other naming). I have _never_ missed a stable sort and thus don't think 15-20% performance hit is worth it. It gives me a hit for something _I_ don't want.
/ David Hedbor
Possible optimization: Check the type field of the array and sort destructively if it doesn't contain any container types.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-04-26 15:48: Subject: Re: sort
Yes, stable sorting comes practically for free when get_order is used.
There's more overhead in the most common case when it uses sort_array_destructively. If get_alpha_order/order_array is used instead it takes about twice the time. If a position vector is added to sort_array_destructively it gets about 15-20% slower. When objects are sorted this overhead is lost completely compared to calling lfuns.
So what do you all think? Is it worth it to make sort() stable? I think it is since not very much of the time in an average Pike program is spent sorting, so the overall impact would be very low. A third option is to make sort() stable and introduce an unstable_sort() for those occasions where it really matters.
/ Martin Stjernholm, Roxen IS
Good suggestion. Then it won't be slower in most common cases, providing the type field is accurate. I wonder how often it is, though. I found that allocate(X, Y) didn't set it, for instance.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-27 08:14: Subject: Re: sort
Possible optimization: Check the type field of the array and sort destructively if it doesn't contain any container types.
/ Fredrik (Naranek) Hubinette (Real Build Master)
It's fairly random weather the type field is accurate or not. Improving the type field accuracy is always a good thing as it gives many minor speed improvements, so please go ahead and fix allocate()
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-04-27 13:56: Subject: Re: sort
Good suggestion. Then it won't be slower in most common cases, providing the type field is accurate. I wonder how often it is, though. I found that allocate(X, Y) didn't set it, for instance.
/ Martin Stjernholm, Roxen IS
And that he did, with added bonus. Great work!
/ Martin Nilsson (har bott i google)
Previous text:
2003-04-28 06:17: Subject: Re: sort
It's fairly random weather the type field is accurate or not. Improving the type field accuracy is always a good thing as it gives many minor speed improvements, so please go ahead and fix allocate()
/ Fredrik (Naranek) Hubinette (Real Build Master)
Yes, I went through all "allocate_array" in the core and in most modules so now it should be accurate most of the time. There are still places in src/modules/Perl/permod.c and src/modules/system/nt.c that I haven't bothered with.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-28 12:17: Subject: Re: sort
And that he did, with added bonus. Great work!
/ Martin Nilsson (har bott i google)
I assume you take all the places that modify ITEM(array) directly into account? If the type field is missing types, it leads to some very strange errors...
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-04-28 19:43: Subject: Re: sort
Yes, I went through all "allocate_array" in the core and in most modules so now it should be accurate most of the time. There are still places in src/modules/Perl/permod.c and src/modules/system/nt.c that I haven't bothered with.
/ Martin Stjernholm, Roxen IS
I'm aware of that. If I've missed such places they do it in "publicly stored" arrays which wouldn't safe anyway (or at least very precarious).
/ Martin Stjernholm, Roxen IS
Previous text:
2003-04-28 23:22: Subject: Re: sort
I assume you take all the places that modify ITEM(array) directly into account? If the type field is missing types, it leads to some very strange errors...
/ Fredrik (Naranek) Hubinette (Real Build Master)
pike-devel@lists.lysator.liu.se