There is nothing that prevents that _indices() should be added to all types that need it. But I did not made it a demand for being a collection since there is after all types that does not have indices that could be a collection.
You're right, e.g. a queue would probably have no indexing operation and hence no indexes. It should be optional, like _values().
About sets, I regard the content as both indices and values.
I think that implies that the [] operation would return the index as the value if it exists. I.e. foo[0] would return 0, making it necessary to use zero_type. No, please regard the elements as indices and all values as 1. That makes [] and []= work with a minimal amount of special cases (only []= needs one), and besides it's compatible with the multiset data type.
Are elements compared with equal() or `==? To be consistent it should be the first, but there clearly can be situations where one would rather do the latter.
/.../
Shouldn't it be `==, wouldn't it be hard to handle basic types as elements otherwise?
That's not a problem; equal() works like == on integers, floats and strings (which I presume you mean with "basic types"). Or did I misunderstand you?
Isn't it more useful if it replaces the value in the second case? Then a Map that doesn't allow multiple indices can simply let `[]= and insert be the same.
Don't think that an insert should do a replace.
Good point, but then it should perhaps throw an error instead? The insert isn't successful, afterall. I don't think it's much use being error tolerant in the insert operation since it's typically only used in ADTs that actually allows duplicate indices, otherwise []= works just as well. Perhaps the insert function shouldn't be defined at all in ADTs that don't allow duplicate indices.
How is the range defined? Is it possible to index from the end with negative integers? If so, how will that work together with unlimited Sequences?
If the end could not be found it should probably also throw an error. Another solution is not to allow negative integers.
My vote is then to throw an error; negative indices are useful as a convenience, although they do add slightly more work for the ADT implementation.
I had a get_comparator method but realised that it will just give another interface and not be necessary in a minimal base.
Is it really a good goal to be minimal? Then you should remove a lot of other functions too, even _get_iterator since there might be collections that have no use for iterators (e.g. queues are perfectly useful without such a thing).
I think it's much better to define these things centrally and optionally, so that if they exist they have a known name and behavior. Otherwise there's the risk that the function to set the comparison function in one ADT is named e.g. set_less and takes a `<, while it in another is called set_comparator and takes a three-way comparison function returning -1, 0 or 1.
(Besides, global functions for setting and getting the `< function is necessary in the new multisets, so these will probably be candidates for lfuns too.)
That won't work well with how the == operator is used. /.../
Yes it should be _equal()
Excellent solution, actually. Didn't think of that.
If there is a well defined order, I don't think the exception "if the there hasn't been any changes in the collection" should be necessary; it's possible to do better than that.
There could be collections where you can change the order of the indices.
Then the iterator can/should "follow" its index to its new position and then continue stepping from there in a well defined way. The only case where the iterator could reasonably loose track is if the element it's "standing on" is removed. (An ADT could solve this either by refusing the remove, by doing a copy-on-write, or by documenting that the iterator more or less loose track in that case. The new multisets take the last route.)
Question do you think I should keep the '[] at all since a for example a linked list will be O(n) and there will probably be others. After all the only reason there is '[] is to resemble the existing types delete, insert and replace will work just as well.
Any other name for it would be confusing in the cases the operation does exist, since the idiomatic name is `[] and nothing else. With this reasoning, a linked list would simply not have any `[].
But drawing lines to Lisp, that would perhaps be unfortunate if one really wants to use such things in vector-like ways. Maybe it's better to soften that efficiency argument to: "If an `[] exists, it should be the fastest available lookup method." That'd still be enough to have some sort of definition to distinguish between the index and value sets, which is the key point here.
Finally a few words about unlimited/undefined sizes. They are undoubtedly cause a lot of problems and since one of the goals with the structure is to make it possible to use the collections and its sub-concepts in algorithms and most algorithms are quite useless on unlimited and undefined sizes. Therefore I think it's best to not let collections be of a unlimited/undefined size.
Then a superclass to Collection would be necessary, and even then it probably would not work out well if Collection isn't designed with a few thoughts in that direction. Most of these issues are solved simply by making functions optional. (That's not really a cost, since a runtime check that a function exists before calling it is unavoidable anyway in Pike.)
Also, at least collections of unknown size aren't that esoteric, just think of a queue that works between processes as well as threads. Such a thing might use a pipe with some system dependent buffer size. It can also reasonably have other properties like not being able to be iterated over, there's no `[]=, etc.
/ Martin Stjernholm, Roxen IS
Previous text:
2002-11-11 12:39: Subject: More about ADTs
I take the presence of this in Collection but not _indices() as a sign that e.g. the Set class would return its contents with this function. That doesn't work very well with the current multiset implementation. If it's possible to do lookup on a value with `[], then it should (by definition) be part of the index set and not the value set. (The same applies to _has_value.)
There is nothing that prevents that _indices() should be added to all types that need it. But I did not made it a demand for being a collection since there is after all types that does not have indices that could be a collection. About sets, I regard the content as both indices and values.
Are elements compared with equal() or `==? To be consistent it should be the first, but there clearly can be situations where one would rather do the latter.
This function should also be optional. It might be possible to define for some unlimited collections, but far from all of them.
Shouldn't it be `==, wouldn't it be hard to handle basic types as elements otherwise?
I assume you mean _get_iterator. The argument must be optional to make the interface compatible with Collection.
No I did mean get_iterator since the _get_iterator() didn't take a position, but making it optional argument instead could be a good solution.
Isn't it more useful if it replaces the value in the second case? Then a Map that doesn't allow multiple indices can simply let `[]= and insert be the same.
Don't think that an insert should do a replace.
How is the range defined? Is it possible to index from the end with negative integers? If so, how will that work together with unlimited Sequences?
If the end could not be found it should probably also throw an error. Another solution is not to allow negative integers.
What about being able to set the order in these?
I had a get_comparator method but realised that it will just give another interface and not be necessary in a minimal base.
That won't work well with how the == operator is used. It can be called between arbitrary things e.g. in the master, inside mappings etc. It must be able to cope with any argument; it's not even limited to objects.
This is actually a generic problem with the current use of `==, `< and `>. One would like to have some sort of domain restriction of them, but there's currently no such thing.
Yes it should be _equal()
With this definition a mapping would be an ordered collection, and also almost every collection implementation I can think of that can be iterated over at all.
Sorry been reading in the definition of 'strictly ordered' from OrderedCollection when I've been reading this.
If there is a well defined order, I don't think the exception "if the there hasn't been any changes in the collection" should be necessary; it's possible to do better than that.
There could be collections where you can change the order of the indices.
o If an `[] operator exists, it's able to check if the given key exists in the collection and, in that case, find the corresponding element. That operation must have a time complexity strictly less than O(the size of the collection).
Question do you think I should keep the '[] at all since a for example a linked list will be O(n) and there will probably be others. After all the only reason there is '[] is to resemble the existing types delete, insert and replace will work just as well.
Finally a few words about unlimited/undefined sizes. They are undoubtedly cause a lot of problems and since one of the goals with the structure is to make it possible to use the collections and its sub-concepts in algorithms and most algorithms are quite useless on unlimited and undefined sizes. Therefore I think it's best to not let collections be of a unlimited/undefined size.
/ Peta, jo det är jag