URL:http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/sort.html
Note
The sorting algorithm used is not stable, ie elements that are equal may get reordered.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
yes, but how do you know they are being reordered, if they are equal? and what problems could that create?
greetings, martin.
If the procedure steps must be executed in any particular order, they shouldn't be equal. The whole point in sorting a procedure is of course to get a total ordering from a partial ordering describing the interdependenices between the steps, and if the partial order is incorrect then of course the total ordering will be incorrect as well.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
Think for instance of a user interface with a table, with a few columns of which the user could sort on. A traditional user interface uses stable sorts, so if the the user first sorts on column 1 and then on column 2, the result will be sorted on column 2, but if two elements are equal in that column the previous sort will make those element sort internally on column 1:
1 2 abc hej ghi hej def fii
1< 2
abc hej def fii {sort(column(table,1),table)} ghi hej
1 >2< abc hej ghi hej {sort(column(table,2),table)} def fii
and not
1 >2< ghi hej abc hej def fii
This can of course be passed by by remembering the previous sorts, and make a comparison function that uses those. But a stable sort would sometimes be much nicer.
/ Mirar
Previous text:
Equality is of course quite enough for basic types such as integers; the equality property being as strong as `==. When you are shuffling about an array of objects, reordering equal objects might render odd problems if you need stable sorting.
/ Johan Sundström (folkskådare)
Previous text:
how can two objects be different when they are equal? isn't == then comparing object references? and doesn't the equality of references mean that they are the same object?
greetings, martin.
We can have several different statements that describes the same information: Statement("100:-") Statement("One hundred SEK") Statement("hundra spänn")
It would make sense to set these statements to be equal to each other. However, when we serialize these we don't know in which order they'll appear (if they are stored in an array that is sorted). While it doesn't matter in the application, we would get bigger diffs in the CVS where we store the serialization.
/ Martin Nilsson (har bott i google)
Previous text:
Have you never ever written a program that sorts a list of names? Let's say you have a set of records like this:
({ ({ "Simpson", "Homer" }), ({ "Simpson", "Bart" }), ({ "Flanders", "Ned" }) })
If you want to sort on last name as the primary key and the first name as the secondary you may run two sorting passes where the first one only inspects the first name:
({ ({ "Simpson", "Bart" }), ({ "Simpson", "Homer" }), ({ "Flanders", "Ned" }) })
(since "B" <= "H" <= "N").
Finally, run a second sort pass, this time using the last name as the key. This will only return correct results if the sort algorithm is stable.
/ Jonas Walldén
Previous text:
duh, now i get it. it is not the order of the elements themselves, but the secondary order of dependent elements. thanks.
greetings, martin.
Well, ({ "Simpson", "Homer" }) _is_ one element so it is in fact the order of the elements since you don't want to mix first and last names for different people. On the other hand the sort comparison callback may (as in this case) look at sub-parts of the elements when deciding how to sort the elements.
/ Jonas Walldén
Previous text:
It's very simple: Just remember the original position and sort on that when items are equal otherwise. That requires a little bit more memory so a separate stable version could be considered. But I don't think it's worth the trouble since sort() already temporarily allocates memory linear to the size of the sorted array.
/ Martin Stjernholm, Roxen IS
Previous text:
If I remember correctly, sort does not allocate any memory if you sort an array with only one reference.
Either way, I would be more conserned with how much extra time it takes than how much extra memory it takes.
/Hubbe
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
On Fri, 25 Apr 2003, Fredrik (Naranek) Hubinette (Real Build Master) @ Pike (-) developers forum wrote:
If I remember correctly, sort does not allocate any memory if you sort an array with only one reference.
That was the reason for my question. But quicksort doesn't take extra memory (at least in theory). On the other hand it is not stable, that votes pro quicksort beeing the current algorithm.
Either way, I would be more conserned with how much extra time it takes than how much extra memory it takes.
agreed. Yet there should exist a fast, stable, in situ algorithm suited as a general purpose sorting algorithm for pike. If that is not the case, it could be interesting to provide a second algorithm e.g. _sort that would be optimized for almost sorted arrays and stable.
--- Ludger
Yes, you're almost right and I was mistaken. It only allocates extra memory when sort() is used with more than one argument. Number of references doesn't matter.
I believe the extra time would be small. An initialization pass through the position vector would be required, and swaps would have to swap in that one too. Comparisons in the position vector would only occur for elements that are equal.
/ Martin Stjernholm, Roxen IS
Previous text:
Actually, I don't think you'll need an extra vector, since sort already uses a [0..n[ vector to keep track of the sort order. (When not sorting destructively.)
So basically the overhead would only be: o extra compares when equal o possibly more swaps because of more defined order o destructive sorting when array has one ref is no longer possible (fairly rare anyways)
All that would be required is a minor fix to: array.c:internal_cmpfun and anoter minor one to builtin.c:f_sort
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
Hmm, I dug into the sorting stuff a bit and found that the sort rule for arrays is somewhat odd: It sorts recursively on the first element (only). That's a bit peculiar since it doesn't go on to compare the second if the first is equal etc, so it's still not a full recursive sort. Another thing is that it doesn't have any cyclic check, so the following causes sort() to hang:
array a = ({({0}), 0}); a[0][0] = a; a[1] = a; sort (a);
Questions: Why is sort() semi-recursive in this way? Should the hang above be fixed? (I think so.) If so, how?
Mappings and multisets are sorted on the memory address. If sort() is made stable I think it's better to change it so that they aren't reordered at all. Well, multisets could perhaps be sorted recursively on the first index in the same way as arrays.
/ Martin Stjernholm, Roxen IS
Previous text:
Sorting on *all* elements of the arrays would probably be a lot more useful, and fixing the hang would obviously be a good thing.
Why it works that way? Because I thought it was a good idea at the time.
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
Yes, stable sorting comes practically for free when get_order is used.
There's more overhead in the most common case when it uses sort_array_destructively. If get_alpha_order/order_array is used instead it takes about twice the time. If a position vector is added to sort_array_destructively it gets about 15-20% slower. When objects are sorted this overhead is lost completely compared to calling lfuns.
So what do you all think? Is it worth it to make sort() stable? I think it is since not very much of the time in an average Pike program is spent sorting, so the overall impact would be very low. A third option is to make sort() stable and introduce an unstable_sort() for those occasions where it really matters.
/ Martin Stjernholm, Roxen IS
Previous text:
No. Unless we decide that it was a bug that sort was environment dependent, there needs to be some compatibility fixed. Locale.SWE.sort or something.
Besides, it's not really a great improvement to go from locale dependent to platform dependent (although a bit faster I could guess).
/ Martin Nilsson (har bott i google)
Previous text:
Have you tried this with various "types" of arrays too, say arrays that have very many elements of the same type etc? Personally I don't see why we wouldn't want sort and stable_sort (or some other naming). I have _never_ missed a stable sort and thus don't think 15-20% performance hit is worth it. It gives me a hit for something _I_ don't want.
/ David Hedbor
Previous text:
I did try it with arrays with many elements of the same type. You can see the tests for yourself; they're in the benchmark module. Considering how the compare function treats different types I don't think there's a measurable difference for arrays containing different types. I used integers since they are among the fastest types to compare, which means the 15-20% figure is as bad as possible.
A reason to make the normal sort function stable is that it's less surprising if you aren't aware of the issue, so it can avoid bugs. There have been several occasions when unexpected or annoying behavior has surfaced later on because sort() is unstable.
/ Martin Stjernholm, Roxen IS
Previous text:
pike-devel@lists.lysator.liu.se