I take the presence of this in Collection but not _indices() as a sign that e.g. the Set class would return its contents with this function. That doesn't work very well with the current multiset implementation. If it's possible to do lookup on a value with `[], then it should (by definition) be part of the index set and not the value set. (The same applies to _has_value.)
There is nothing that prevents that _indices() should be added to all types that need it. But I did not made it a demand for being a collection since there is after all types that does not have indices that could be a collection. About sets, I regard the content as both indices and values.
Are elements compared with equal() or `==? To be consistent it should be the first, but there clearly can be situations where one would rather do the latter.
This function should also be optional. It might be possible to define for some unlimited collections, but far from all of them.
Shouldn't it be `==, wouldn't it be hard to handle basic types as elements otherwise?
I assume you mean _get_iterator. The argument must be optional to make the interface compatible with Collection.
No I did mean get_iterator since the _get_iterator() didn't take a position, but making it optional argument instead could be a good solution.
Isn't it more useful if it replaces the value in the second case? Then a Map that doesn't allow multiple indices can simply let `[]= and insert be the same.
Don't think that an insert should do a replace.
How is the range defined? Is it possible to index from the end with negative integers? If so, how will that work together with unlimited Sequences?
If the end could not be found it should probably also throw an error. Another solution is not to allow negative integers.
What about being able to set the order in these?
I had a get_comparator method but realised that it will just give another interface and not be necessary in a minimal base.
That won't work well with how the == operator is used. It can be called between arbitrary things e.g. in the master, inside mappings etc. It must be able to cope with any argument; it's not even limited to objects.
This is actually a generic problem with the current use of `==, `< and `>. One would like to have some sort of domain restriction of them, but there's currently no such thing.
Yes it should be _equal()
With this definition a mapping would be an ordered collection, and also almost every collection implementation I can think of that can be iterated over at all.
Sorry been reading in the definition of 'strictly ordered' from OrderedCollection when I've been reading this.
If there is a well defined order, I don't think the exception "if the there hasn't been any changes in the collection" should be necessary; it's possible to do better than that.
There could be collections where you can change the order of the indices.
o If an `[] operator exists, it's able to check if the given key exists in the collection and, in that case, find the corresponding element. That operation must have a time complexity strictly less than O(the size of the collection).
Question do you think I should keep the '[] at all since a for example a linked list will be O(n) and there will probably be others. After all the only reason there is '[] is to resemble the existing types delete, insert and replace will work just as well.
Finally a few words about unlimited/undefined sizes. They are undoubtedly cause a lot of problems and since one of the goals with the structure is to make it possible to use the collections and its sub-concepts in algorithms and most algorithms are quite useless on unlimited and undefined sizes. Therefore I think it's best to not let collections be of a unlimited/undefined size.
/ Peta, jo det är jag
Previous text:
2002-11-09 02:09: Subject: More about ADTs
int _sizeof() Return the current size of the ADT. Return -1 if the size could not be decided or is truly unlimited.
Is there any way to tell the last two cases apart? If not, do you consider it useless to be able to do that?
array _values() Return an array with the values in the ADT.
I take the presence of this in Collection but not _indices() as a sign that e.g. the Set class would return its contents with this function. That doesn't work very well with the current multiset implementation. If it's possible to do lookup on a value with `[], then it should (by definition) be part of the index set and not the value set. (The same applies to _has_value.)
Also, this function has to be optional, or else it's not possible to have unlimited collections.
int _equal(mixed coll) Decide if the collection coll and this collection contains the same elements, an if it is important, in the same order.
Are elements compared with equal() or `==? To be consistent it should be the first, but there clearly can be situations where one would rather do the latter.
This function should also be optional. It might be possible to define for some unlimited collections, but far from all of them.
iterator last() Return an iterator for this ADT positioned at the last item in the ADT.
Note that this function can't be implemented for unlimited collections, so it must be optional.
2.2. Set
/.../
iterator get_iterator(mixed item) Get an iterator for this ADT positioned at the item or null if the item could not be found
I assume you mean _get_iterator. The argument must be optional to make the interface compatible with Collection.
int insert(mixed item)
/.../
mixed delete(mixed item)
/.../
These should probably be lfuns, since they need to be implemented for the new multisets too. Therefore it would probably be better if they followed the convention with an initial "_" as the other lfuns.
Actually most, if not all, functions in Collection and inheriting classes will likely become lfuns sooner or later so this remark applies in general.
Perhaps some explanation of the lfun design might be in order: They are used by various global functions and builtin language constructs to let objects behave like builtin data types. Wherever they are used, it's only tested whether an identifier with that name exists in the object, and if it does then it's used under the assumption that it actually carries out the intended operation; no check is done that the object inherits some interface that declares the function.
In other words, the existence of the identifier itself is interpreted as if the object adheres to an interface. This use of identifiers is something that makes object orientation in Pike in general radically different from e.g. C++ and Java. The advantage is that it's very convenient to implement only those parts of an interface that are necessary to get something to work; to accomplish the same design in e.g. C++ you'd have to have an abstract class for almost every function and then inherit a bunch of classes for the functions you want to implement. The disadvantage is that the identifier namespace in classes is essentially global in this regard.
I think that the advantage outweights the disadvantage when it comes to core interfaces like collections. The namespace issue is then solved either by the use of the backquote operator syntax, or by the "_" prefix. (The notable exception is "cast", which I consider an oversight. "create" and "destroy" might also be considered exceptions, but they are otoh part of the interface to objects themselves and not how they are used to emulate other data types.)
Note that this argumentation does not apply to iterators since they are explicitly requested and therefore can't be confused with anything else, as opposed to e.g. equal() which can be fed (and must work on) any object.
As for your reasons for not having "_" in front of insert and delete:
- There is currently no lfun insert and delete.
That's only an argument insofar that there's not any existing interface to adhere to (the same applies to _search and _has_value, for that matter). I.e. it's not an argument against using "_", only one that it doesn't have to be used.
- I belive that it should be possible to remove and insert without using and lfun
Of course. All lfuns can be called as ordinary functions too.
- If there will be a insert and delete lfun then it should not be difficult to use the current insert and delete methods.
Well, the inconsistent behavior of delete() in Sequence (see below) currently makes that a bit hard, but I hope you'll change that. I don't see how this is an argument either for or against the underline convention.
My argument for using "_" is that if these functions are made into lfuns (which very likely will happen) then they would collide with existing functions that doesn't adhere to this interface (just do a grep through lib/modules to see a couple of examples of that).
multiset cast("multiset") The set should at least be able to cast to a multiset, /.../
Is there a particular reason for that? That means that an unlimited collection can't be a Set. Besides, I can't see how this operation is a necessary part of something being a set, abstractly speaking. I suggest you make it an optional but recommended operation instead.
2.3. Map A map is a data structure that allows you to associate a key with one or more values /.../
Afaics this means that the property unique vs duplicate indices is connected with the property of whether values exist or not. Why?
mixed '[](mixed key) Return the item associated with the key if it could be found, 0 if it couldn't
0 ought to be UNDEFINED instead (i.e. a 0 for which zero_type returns 1).
mixed '[]=(mixed key, mixed item) If there is an item with the key key then the associated item will be overwritten with the supplied item. If there is no key in the map a new key-item pair will be added.
The return value is item?
iterator get_iterator(mixed key) Get an iterator for this ADT positioned at the key or null if the key could not be found.
This doesn't mention the case when a Map contains duplicate indices. (Also applies to the other functions that uses a key here, for that matter.)
mixed insert(mixed key, mixed item) If the key doesn't exist or duplicates are allowed, associate the value with the key and return 0, otherwise don't modify the map and return the current value associated with the key.
Isn't it more useful if it replaces the value in the second case? Then a Map that doesn't allow multiple indices can simply let `[]= and insert be the same.
mixed delete(mixed key) Remove the pair with the key key. Return the keys value if it was removed, if the key could not be removed return zero_type.
You probably mean UNDEFINED lastly. (Yes, I'm being picky but in my experience interface docs better be correct down to the letter.)
2.4. OrderedCollection An OrderedCollection is a collection that has indices that could be used to find the values and the indices are strictly ordered is some way. /.../ int _has_index(int index)
/.../
array _indices()
/.../
This does not make sense to me. For me, and for the primitive data types in Pike, an index is something that can be looked up (in an operation that is at least faster than O(n), otherwise I'd call it "searched for"). That has nothing to do with order between indices.
2.5. Sequence A sequence is a sub-concept of an Ordered Collection. It is a linear collection that allows index-based access and automatically expands to accommodate new elements. A vector is a good example.
Odd definition. Isn't the key concept that it has a well defined order and that the order is _arbitrary_, i.e. that each element has an order wrt to every other element and that order is specified only by how the sequence is created and not on any sort of comparison function?
mixed '[](int ind) Return the item at index ind in the sequence. Throw an error if ind is out of range.
How is the range defined? Is it possible to index from the end with negative integers? If so, how will that work together with unlimited Sequences?
int _search(mixed item, void|int start) Return the index of the first item in the sequence with the index start or higher that match the item item.
Why is this not already in Collection? Note that a (well defined) order isn't necessary to be able to do this; see how search() works on mappings, for example. (Ideally one would always use an iterator to do a search through the values, but then it could not be used with the current search() function, so I agree that an lfun like this one is necessary.)
[Return] -1 if no object was found.
This doesn't work if this function is extended to Maps (or OrderedMaps). UNDEFINED is a better return value to get a consistent interface with those.
int(0..1) delete(mixed item) Remove the first value that are equal to item from the sequence. Return 1 if a item was removed, otherwise 0;
This is inconsistent with delete() in Set and Map, which takes an index. Now it takes a value all of a sudden. I don't think this operation is necessary at all; then there could just as well be a variant of insert() that searches for a value and inserts another one before/after it.
int delete_at(int ind) Remove the item at the index ind. /.../
This is the one that ought to be named delete(), I think.
Return 0 if the removal failed.
Why would it fail, given that the index is in range?
2.6. OrderedMap
/.../
2.7. OrderedSet
/.../
What about being able to set the order in these?
I miss functions to get an iterator based on inequalities, i.e. functions that returns an iterator pointing to the closest element less/greater (or equal) to a given key.
I also miss _search, but otoh I think that one should be present already in Collection.
3.1. Iterator
/.../
int '==(iterator iter) /.../ Throw an error if the two iterators belong to different collections.
That won't work well with how the == operator is used. It can be called between arbitrary things e.g. in the master, inside mappings etc. It must be able to cope with any argument; it's not even limited to objects.
This is actually a generic problem with the current use of `==, `< and `>. One would like to have some sort of domain restriction of them, but there's currently no such thing.
I miss operations to be able to search in either direction, e.g. a search_eq_forward which steps the iterator forward until a value that is equal with the argument according to `==. A more general variant might be a function that takes a predicate function to do the matching. It could also be useful with iterator variants of map() and filter(), i.e. map_forward, filter_backward etc.
3.2. OrderedIterator
An OrderedIterator is an iterator working on an OrderedCollection. It has two characteristics, first it traverse the collection in the same order each time it is traversed, if the there hasn't been any changes in the collection, beginning with the item with the lowest index and ending with the item with the highest index. Secondly it is possibly to say that an iterator is less then or greater then another iterator.
With this definition a mapping would be an ordered collection, and also almost every collection implementation I can think of that can be iterated over at all.
It's more interesting to differentiate between collections with and without a well defined order. (Actually, most of the time when I've said "ordered" in this conference, I really ment "with well defined order".) I can only see two ways to create a well defined order: Either through keeping the arbitrary order created by the user operations (i.e. like in Sequence), or by having a comparison function.
If there is a well defined order, I don't think the exception "if the there hasn't been any changes in the collection" should be necessary; it's possible to do better than that.
Also note that there are several occasions where an iterator over a collection without a well defined order can have meaningful `< and `> operations, but the exception above then becomes essential. See the discussion starting in 9265430.
int(0..1) '<(OrderedIterator iter) Return 1 if iter is positioned after this iterator, otherwise 0. Throw an error if the two iterators belong to different collections.
This exception handling might be a problem in the same way as in `==, but it's not equally severe here. The best way to solve it would probably be to make a typed exception that the caller can intercept and discard when appropriate.
- Index
/.../
a < b implies !(b < a)
But !(b < a) does not imply a < b, I hope, since that'd rule out partial orders. Another relation I hope doesn't hold is:
!(a < b) && !(b < a) implies a == b
Or alternatively:
a != b implies a < b || b < a
Well, that was quite a lot of comments. Generally speaking, I think you should especially reconsider your designation of the index and value sets; see for instance how has_index() and has_value() operates on arrays and multisets. I think a good definition for them is built around the indexing operation:
o If an `[] operator exists, it's able to check if the given key exists in the collection and, in that case, find the corresponding element. That operation must have a time complexity strictly less than O(the size of the collection).
o The key given to `[] is matched within the set of _indices_ of the elements in the collection.
o The result returned by `[] comes from the set of _values_ of the elements in the collection.
/ Martin Stjernholm, Roxen IS