int _sizeof() Return the current size of the ADT. Return -1 if the size could not be decided or is truly unlimited.
Is there any way to tell the last two cases apart? If not, do you consider it useless to be able to do that?
array _values() Return an array with the values in the ADT.
I take the presence of this in Collection but not _indices() as a sign that e.g. the Set class would return its contents with this function. That doesn't work very well with the current multiset implementation. If it's possible to do lookup on a value with `[], then it should (by definition) be part of the index set and not the value set. (The same applies to _has_value.)
Also, this function has to be optional, or else it's not possible to have unlimited collections.
int _equal(mixed coll) Decide if the collection coll and this collection contains the same elements, an if it is important, in the same order.
Are elements compared with equal() or `==? To be consistent it should be the first, but there clearly can be situations where one would rather do the latter.
This function should also be optional. It might be possible to define for some unlimited collections, but far from all of them.
iterator last() Return an iterator for this ADT positioned at the last item in the ADT.
Note that this function can't be implemented for unlimited collections, so it must be optional.
2.2. Set
/.../
iterator get_iterator(mixed item) Get an iterator for this ADT positioned at the item or null if the item could not be found
I assume you mean _get_iterator. The argument must be optional to make the interface compatible with Collection.
int insert(mixed item)
/.../
mixed delete(mixed item)
/.../
These should probably be lfuns, since they need to be implemented for the new multisets too. Therefore it would probably be better if they followed the convention with an initial "_" as the other lfuns.
Actually most, if not all, functions in Collection and inheriting classes will likely become lfuns sooner or later so this remark applies in general.
Perhaps some explanation of the lfun design might be in order: They are used by various global functions and builtin language constructs to let objects behave like builtin data types. Wherever they are used, it's only tested whether an identifier with that name exists in the object, and if it does then it's used under the assumption that it actually carries out the intended operation; no check is done that the object inherits some interface that declares the function.
In other words, the existence of the identifier itself is interpreted as if the object adheres to an interface. This use of identifiers is something that makes object orientation in Pike in general radically different from e.g. C++ and Java. The advantage is that it's very convenient to implement only those parts of an interface that are necessary to get something to work; to accomplish the same design in e.g. C++ you'd have to have an abstract class for almost every function and then inherit a bunch of classes for the functions you want to implement. The disadvantage is that the identifier namespace in classes is essentially global in this regard.
I think that the advantage outweights the disadvantage when it comes to core interfaces like collections. The namespace issue is then solved either by the use of the backquote operator syntax, or by the "_" prefix. (The notable exception is "cast", which I consider an oversight. "create" and "destroy" might also be considered exceptions, but they are otoh part of the interface to objects themselves and not how they are used to emulate other data types.)
Note that this argumentation does not apply to iterators since they are explicitly requested and therefore can't be confused with anything else, as opposed to e.g. equal() which can be fed (and must work on) any object.
As for your reasons for not having "_" in front of insert and delete:
- There is currently no lfun insert and delete.
That's only an argument insofar that there's not any existing interface to adhere to (the same applies to _search and _has_value, for that matter). I.e. it's not an argument against using "_", only one that it doesn't have to be used.
- I belive that it should be possible to remove and insert without using and lfun
Of course. All lfuns can be called as ordinary functions too.
- If there will be a insert and delete lfun then it should not be difficult to use the current insert and delete methods.
Well, the inconsistent behavior of delete() in Sequence (see below) currently makes that a bit hard, but I hope you'll change that. I don't see how this is an argument either for or against the underline convention.
My argument for using "_" is that if these functions are made into lfuns (which very likely will happen) then they would collide with existing functions that doesn't adhere to this interface (just do a grep through lib/modules to see a couple of examples of that).
multiset cast("multiset") The set should at least be able to cast to a multiset, /.../
Is there a particular reason for that? That means that an unlimited collection can't be a Set. Besides, I can't see how this operation is a necessary part of something being a set, abstractly speaking. I suggest you make it an optional but recommended operation instead.
2.3. Map A map is a data structure that allows you to associate a key with one or more values /.../
Afaics this means that the property unique vs duplicate indices is connected with the property of whether values exist or not. Why?
mixed '[](mixed key) Return the item associated with the key if it could be found, 0 if it couldn't
0 ought to be UNDEFINED instead (i.e. a 0 for which zero_type returns 1).
mixed '[]=(mixed key, mixed item) If there is an item with the key key then the associated item will be overwritten with the supplied item. If there is no key in the map a new key-item pair will be added.
The return value is item?
iterator get_iterator(mixed key) Get an iterator for this ADT positioned at the key or null if the key could not be found.
This doesn't mention the case when a Map contains duplicate indices. (Also applies to the other functions that uses a key here, for that matter.)
mixed insert(mixed key, mixed item) If the key doesn't exist or duplicates are allowed, associate the value with the key and return 0, otherwise don't modify the map and return the current value associated with the key.
Isn't it more useful if it replaces the value in the second case? Then a Map that doesn't allow multiple indices can simply let `[]= and insert be the same.
mixed delete(mixed key) Remove the pair with the key key. Return the keys value if it was removed, if the key could not be removed return zero_type.
You probably mean UNDEFINED lastly. (Yes, I'm being picky but in my experience interface docs better be correct down to the letter.)
2.4. OrderedCollection An OrderedCollection is a collection that has indices that could be used to find the values and the indices are strictly ordered is some way. /.../ int _has_index(int index)
/.../
array _indices()
/.../
This does not make sense to me. For me, and for the primitive data types in Pike, an index is something that can be looked up (in an operation that is at least faster than O(n), otherwise I'd call it "searched for"). That has nothing to do with order between indices.
2.5. Sequence A sequence is a sub-concept of an Ordered Collection. It is a linear collection that allows index-based access and automatically expands to accommodate new elements. A vector is a good example.
Odd definition. Isn't the key concept that it has a well defined order and that the order is _arbitrary_, i.e. that each element has an order wrt to every other element and that order is specified only by how the sequence is created and not on any sort of comparison function?
mixed '[](int ind) Return the item at index ind in the sequence. Throw an error if ind is out of range.
How is the range defined? Is it possible to index from the end with negative integers? If so, how will that work together with unlimited Sequences?
int _search(mixed item, void|int start) Return the index of the first item in the sequence with the index start or higher that match the item item.
Why is this not already in Collection? Note that a (well defined) order isn't necessary to be able to do this; see how search() works on mappings, for example. (Ideally one would always use an iterator to do a search through the values, but then it could not be used with the current search() function, so I agree that an lfun like this one is necessary.)
[Return] -1 if no object was found.
This doesn't work if this function is extended to Maps (or OrderedMaps). UNDEFINED is a better return value to get a consistent interface with those.
int(0..1) delete(mixed item) Remove the first value that are equal to item from the sequence. Return 1 if a item was removed, otherwise 0;
This is inconsistent with delete() in Set and Map, which takes an index. Now it takes a value all of a sudden. I don't think this operation is necessary at all; then there could just as well be a variant of insert() that searches for a value and inserts another one before/after it.
int delete_at(int ind) Remove the item at the index ind. /.../
This is the one that ought to be named delete(), I think.
Return 0 if the removal failed.
Why would it fail, given that the index is in range?
2.6. OrderedMap
/.../
2.7. OrderedSet
/.../
What about being able to set the order in these?
I miss functions to get an iterator based on inequalities, i.e. functions that returns an iterator pointing to the closest element less/greater (or equal) to a given key.
I also miss _search, but otoh I think that one should be present already in Collection.
3.1. Iterator
/.../
int '==(iterator iter) /.../ Throw an error if the two iterators belong to different collections.
That won't work well with how the == operator is used. It can be called between arbitrary things e.g. in the master, inside mappings etc. It must be able to cope with any argument; it's not even limited to objects.
This is actually a generic problem with the current use of `==, `< and `>. One would like to have some sort of domain restriction of them, but there's currently no such thing.
I miss operations to be able to search in either direction, e.g. a search_eq_forward which steps the iterator forward until a value that is equal with the argument according to `==. A more general variant might be a function that takes a predicate function to do the matching. It could also be useful with iterator variants of map() and filter(), i.e. map_forward, filter_backward etc.
3.2. OrderedIterator
An OrderedIterator is an iterator working on an OrderedCollection. It has two characteristics, first it traverse the collection in the same order each time it is traversed, if the there hasn't been any changes in the collection, beginning with the item with the lowest index and ending with the item with the highest index. Secondly it is possibly to say that an iterator is less then or greater then another iterator.
With this definition a mapping would be an ordered collection, and also almost every collection implementation I can think of that can be iterated over at all.
It's more interesting to differentiate between collections with and without a well defined order. (Actually, most of the time when I've said "ordered" in this conference, I really ment "with well defined order".) I can only see two ways to create a well defined order: Either through keeping the arbitrary order created by the user operations (i.e. like in Sequence), or by having a comparison function.
If there is a well defined order, I don't think the exception "if the there hasn't been any changes in the collection" should be necessary; it's possible to do better than that.
Also note that there are several occasions where an iterator over a collection without a well defined order can have meaningful `< and `> operations, but the exception above then becomes essential. See the discussion starting in 9265430.
int(0..1) '<(OrderedIterator iter) Return 1 if iter is positioned after this iterator, otherwise 0. Throw an error if the two iterators belong to different collections.
This exception handling might be a problem in the same way as in `==, but it's not equally severe here. The best way to solve it would probably be to make a typed exception that the caller can intercept and discard when appropriate.
- Index
/.../
a < b implies !(b < a)
But !(b < a) does not imply a < b, I hope, since that'd rule out partial orders. Another relation I hope doesn't hold is:
!(a < b) && !(b < a) implies a == b
Or alternatively:
a != b implies a < b || b < a
Well, that was quite a lot of comments. Generally speaking, I think you should especially reconsider your designation of the index and value sets; see for instance how has_index() and has_value() operates on arrays and multisets. I think a good definition for them is built around the indexing operation:
o If an `[] operator exists, it's able to check if the given key exists in the collection and, in that case, find the corresponding element. That operation must have a time complexity strictly less than O(the size of the collection).
o The key given to `[] is matched within the set of _indices_ of the elements in the collection.
o The result returned by `[] comes from the set of _values_ of the elements in the collection.
/ Martin Stjernholm, Roxen IS
Previous text:
2002-11-07 13:08: Subject: More about ADTs
I've made som changes and here is a new suggestion for a common interface of the new ADTs.
Common API for new ADTs in Pike
- Introduction
This document describes a common platform for new ADTs in pike to stand on. The purpose of the platform is to make the new ADTs as flexible as possible and possible to use together with generic algorithms. The base of the platform is the Collection concept, being the base concepts of all the ADTs, and the Iterator concept making it possible to iterate over collections without knowing the underlying structure. Besides the two base concepts this document also describes a few sub-concepts that are related to the two base concepts according to the two figures below.
Figur 1 Collection concepts
Collection |-----------> Ordered Collection | | | |--------> Sequence | |--------> Ordered Set <------| | |--------> Ordered Map <--| | |----> Map --------------------------------------| | |----> Set ------------------------------------------|
Figur 2 Interface concepts
Iterator -------> Ordered Iterator
- Collection concepts
2.1. Collection Collection is the base concept of all the ADTs. The reason it is called collection is that all the ADTs in the library is in some way a collection of items. To implement a collection the following methods must be implemented
iterator _get_iterator() Return an iterator for this ADT positioned at the first item in the ADT.
int _sizeof() Return the current size of the ADT. Return -1 if the size could not be decided or is truly unlimited.
array _values() Return an array with the values in the ADT.
int _has_value(mixed value) Returns 1 if value is in the value domain of collection, or 0 (zero) if not found
int _equal(mixed coll) Decide if the collection coll and this collection contains the same elements, an if it is important, in the same order.
string _sprintf(int conversion_type, mapping(string:int)|void params) Present the collection in a human readable way. For more information see _sprintf in the pike manual.
int is_empty() Return 0 if the ADT do has elements.
int max_size() Return the maximal size of the ADT if there is any, otherwise -1.
void clear() Clear the ADT.
iterator first() Return an iterator for this ADT positioned at the first item in the ADT, same as _get_iterator().
iterator last() Return an iterator for this ADT positioned at the last item in the ADT.
2.2. Set Set is a collection containing any number of items but only one of each item. It is a sub-concept of Collection and must thereby besides implementing the methods below also implements the methods in Collections.
int '[](mixed item) Return 1 if the item could be found in the set, 0 if it couldn't
int '[]=(mixed item, int exist) Add the item to the set if exist is true remove the item if exist is false.
iterator get_iterator(mixed item) Get an iterator for this ADT positioned at the item or null if the item could not be found
int insert(mixed item) Add the item to the set if it isn't a part of the set. Return true if the item was added, else false.
mixed delete(mixed item) Remove the item from the set. If the item was removed it will be returned otherwise a 0 zero_type is returned.
multiset cast("multiset") The set should at least be able to cast to a multiset, additional castings are optional.
2.3. Map A map is a data structure that allows you to associate a key with one or more values and then quickly retrieve those values using the key. To implement a Map the ADT must have all the methods in this concept and in the Collection concept.
mixed '[](mixed key) Return the item associated with the key if it could be found, 0 if it couldn't
mixed '[]=(mixed key, mixed item) If there is an item with the key key then the associated item will be overwritten with the supplied item. If there is no key in the map a new key-item pair will be added.
iterator get_iterator(mixed key) Get an iterator for this ADT positioned at the key or null if the key could not be found.
mixed insert(mixed key, mixed item) If the key doesn't exist or duplicates are allowed, associate the value with the key and return 0, otherwise don't modify the map and return the current value associated with the key.
mixed delete(mixed key) Remove the pair with the key key. Return the keys value if it was removed, if the key could not be removed return zero_type.
mapping cast("mapping") The map should at leas be able to cast to a mapping, additional castings are optional.
2.4. OrderedCollection An OrderedCollection is a collection that has indices that could be used to find the values and the indices are strictly ordered is some way. Any type could be used as an index if it satisfies a few rules. See Index for more information. There is also nothing that prevents the use of the values as indices as long as the value satisfies the rules for an Index.
int _has_index(int index) Returns 1 if index is in the index domain of collection, or 0 (zero) if not found
array _indices() Return an array with the indices of the collection.
2.5. Sequence A sequence is a sub-concept of an Ordered Collection. It is a linear collection that allows index-based access and automatically expands to accommodate new elements. A vector is a good example.
mixed '[](int ind) Return the item at index ind in the sequence. Throw an error if ind is out of range.
mixed '[]=(int ind, mixed item) Set the index ind to be item. Throw an error if ind is out of range
int _search(mixed item, void|int start) Return the index of the first item in the sequence with the index start or higher that match the item item. -1 if no object was found. Throw an error if start is out of range.
void insert(index ind, mixed item) Insert an item in the sequence at the index ind. The size of the sequence is increased by one and the items at index and above have their indices increases by one. Throw an error if ind is out of range.
int(0..1) delete(mixed item) Remove the first value that are equal to item from the sequence. Return 1 if a item was removed, otherwise 0;
void add(mixed item) Add the object item as fast as possible to the sequence.
int delete_at(int ind) Remove the item at the index ind. The size of the sequence is decreased by one and the items above ind have their indices decreases by one. Return 0 if the removal failed. Throw an error if ind is out of range.
iterator get_iterator(int ind) Get an iterator for this ADT positioned at the index ind. Throw an error if ind is out of range.
array cast("array") The sequence should at least be able to cast to an array, additional castings are optional.
2.6. OrderedMap An OrderedMap is a map where the keys are orderable and the iterator belonging to an OrderedMap will traverse the values in that order.
2.7. OrderedSet An OrderedSet is a set were the values are orderable and the iterator belonging to an OrderedSet will traverse the values in that order.
- Iterator concepts
3.1. Iterator The iterator concept does not differ much from the present day iterator. Only a few new methods are added to make it possible to step backwards and compare iterators.
int(0 1) '!() Should return 0 (zero) when not at end of collection, and 1 at end of collection
void '+=(int n) Should advance the specified number of steps
iterator '-=(int n) Move the iterator n steps backwards
int '==(iterator iter) Return true if the two iterators is pointing to the same object. Throw an error if the two iterators belong to different collections.
mixed index() Should return the current index.
mixed value() Should return the current value
int has_previous() Return true if the iterator can move one step backwards.
int has_next() Return true if the iterator can move one step forwards. It have the same use as '!().
collection get_collection() Return the collection this iterator iterates over.
Optional void set_value(mixed value) Set the value of this iterator. This method should only be available if it is possible to set the value via an iterator.
Optional void add(mixed value) Add a new item to the collection after the item the iterator currently is pointing at. This method should only be available if it is possible to add entries via the iterator
Optional mixed remove() Remove the item the iterator currently is pointing at and move to the next item. This method should only be available if it is possible to remove entries via the iterator
3.2. OrderedIterator An OrderedIterator is an iterator working on an OrderedCollection. It has two characteristics, first it traverse the collection in the same order each time it is traversed, if the there hasn't been any changes in the collection, beginning with the item with the lowest index and ending with the item with the highest index. Secondly it is possibly to say that an iterator is less then or greater then another iterator.
int(0..1) '<(OrderedIterator iter) Return 1 if iter is positioned after this iterator, otherwise 0. Throw an error if the two iterators belong to different collections.
Int(0..1) '>(OrderedIterator iter) Return 1 if iter is positioned before this iterator otherwise 0. Throw an error if the two iterators belong to different collections.
- Index
If a type could be used as an index in an OrderedCollection then it must be possible to use the following operations on it.
'== If the two indexes a and b is equal it should return 1 otherwise 0. The following rules must also be satisfied. a == a must be true a == b implies b == a !(a == b) implies a != b a == b and b == c implies a == c
'< If the index a is ordered lower than b it should return 1 otherwise 0. The following rules must also be satisfied. a < a must be false a < b implies !(b < a) a < b implies (b > a) a < b and b < c implies a < c
'> If the index a is ordered higher than b it should return 1 otherwise 0. The following rules must also be satisfied. a > a must be false a > b implies !(b > a) a > b implies b < a a > b and b > c implies a > c
/ Peta, jo det är jag