More about ADTs

9 Nov 2002

      ...
int _sizeof()
Return the current size of the ADT. Return -1 if the size could not
be decided or is truly unlimited.
Is there any way to tell the last two cases apart? If not, do you
consider it useless to be able to do that?
...
array _values()
Return an array with the values in the ADT.
I take the presence of this in Collection but not _indices() as a sign
that e.g. the Set class would return its contents with this function.
That doesn't work very well with the current multiset implementation.
If it's possible to do lookup on a value with `[], then it should (by
definition) be part of the index set and not the value set. (The same
applies to _has_value.)
Also, this function has to be optional, or else it's not possible to
have unlimited collections.
...
int _equal(mixed coll)
Decide if the collection coll and this collection contains the same
elements, an if it is important, in the same order.
Are elements compared with equal() or `==? To be consistent it should
be the first, but there clearly can be situations where one would
rather do the latter.
This function should also be optional. It might be possible to define
for some unlimited collections, but far from all of them.
...
iterator last()
Return an iterator for this ADT positioned at the last item in the ADT.
Note that this function can't be implemented for unlimited
collections, so it must be optional.
...
2.2.	Set
/.../
...
iterator get_iterator(mixed item)
Get an iterator for this ADT positioned at the item or null if the
item could not be found
I assume you mean _get_iterator. The argument must be optional to make
the interface compatible with Collection.
...
int insert(mixed item)
/.../
...
mixed delete(mixed item)
/.../
These should probably be lfuns, since they need to be implemented for
the new multisets too. Therefore it would probably be better if they
followed the convention with an initial "_" as the other lfuns.
Actually most, if not all, functions in Collection and inheriting
classes will likely become lfuns sooner or later so this remark
applies in general.
Perhaps some explanation of the lfun design might be in order: They
are used by various global functions and builtin language constructs
to let objects behave like builtin data types. Wherever they are used,
it's only tested whether an identifier with that name exists in the
object, and if it does then it's used under the assumption that it
actually carries out the intended operation; no check is done that the
object inherits some interface that declares the function.
In other words, the existence of the identifier itself is interpreted
as if the object adheres to an interface. This use of identifiers is
something that makes object orientation in Pike in general radically
different from e.g. C++ and Java. The advantage is that it's very
convenient to implement only those parts of an interface that are
necessary to get something to work; to accomplish the same design in
e.g. C++ you'd have to have an abstract class for almost every
function and then inherit a bunch of classes for the functions you
want to implement. The disadvantage is that the identifier namespace
in classes is essentially global in this regard.
I think that the advantage outweights the disadvantage when it comes
to core interfaces like collections. The namespace issue is then
solved either by the use of the backquote operator syntax, or by the
"_" prefix. (The notable exception is "cast", which I consider an
oversight. "create" and "destroy" might also be considered exceptions,
but they are otoh part of the interface to objects themselves and not
how they are used to emulate other data types.)
Note that this argumentation does not apply to iterators since they
are explicitly requested and therefore can't be confused with anything
else, as opposed to e.g. equal() which can be fed (and must work on)
any object.
As for your reasons for not having "_" in front of insert and delete:
...

There is currently no lfun insert and delete.

That's only an argument insofar that there's not any existing
interface to adhere to (the same applies to _search and _has_value,
for that matter). I.e. it's not an argument against using "_", only
one that it doesn't have to be used.
...

I belive that it should be possible to remove and insert
without using and lfun

Of course. All lfuns can be called as ordinary functions too.
...

If there will be a insert and delete lfun then it should not
be difficult to use the current insert and delete methods.

Well, the inconsistent behavior of delete() in Sequence (see below)
currently makes that a bit hard, but I hope you'll change that. I
don't see how this is an argument either for or against the underline
convention.
My argument for using "_" is that if these functions are made into
lfuns (which very likely will happen) then they would collide with
existing functions that doesn't adhere to this interface (just do a
grep through lib/modules to see a couple of examples of that).
...
multiset cast("multiset")
The set should at least be able to cast to a multiset, /.../
Is there a particular reason for that? That means that an unlimited
collection can't be a Set. Besides, I can't see how this operation is
a necessary part of something being a set, abstractly speaking. I
suggest you make it an optional but recommended operation instead.
...
2.3.	Map
A map is a data structure that allows you to associate a key with
one or more values /.../
Afaics this means that the property unique vs duplicate indices is
connected with the property of whether values exist or not. Why?
...
mixed '[](mixed key)
Return the item associated with the key if it could be found, 0 if
it couldn't
0 ought to be UNDEFINED instead (i.e. a 0 for which zero_type returns
1).
...
mixed '[]=(mixed key, mixed item)
If there is an item with the key key then the associated item will
be overwritten with the supplied item. If there is no key in the map
a new key-item pair will be added.
The return value is item?
...
iterator get_iterator(mixed key)
Get an iterator for this ADT positioned at the key or null if the
key could not be found.
This doesn't mention the case when a Map contains duplicate indices.
(Also applies to the other functions that uses a key here, for that
matter.)
...
mixed insert(mixed key, mixed item)
If the key doesn't exist or duplicates are allowed, associate the
value with the key and return 0, otherwise don't modify the map and
return the current value associated with the key.
Isn't it more useful if it replaces the value in the second case? Then
a Map that doesn't allow multiple indices can simply let `[]= and
insert be the same.
...
mixed delete(mixed key)
Remove the pair with the key key. Return the keys value if it was
removed, if the key could not be removed return zero_type.
You probably mean UNDEFINED lastly. (Yes, I'm being picky but in my
experience interface docs better be correct down to the letter.)
...
2.4.	OrderedCollection
An OrderedCollection is a collection that has indices that could be
used to find the values and the indices are strictly ordered is some
way. /.../
int _has_index(int index)
/.../
...
array _indices()
/.../
This does not make sense to me. For me, and for the primitive data
types in Pike, an index is something that can be looked up (in an
operation that is at least faster than O(n), otherwise I'd call it
"searched for"). That has nothing to do with order between indices.
...
2.5.	Sequence
A sequence is a sub-concept of an Ordered Collection. It is a linear
collection that allows index-based access and automatically expands
to accommodate new elements. A vector is a good example.
Odd definition. Isn't the key concept that it has a well defined order
and that the order is _arbitrary_, i.e. that each element has an order
wrt to every other element and that order is specified only by how the
sequence is created and not on any sort of comparison function?
...
mixed '[](int ind)
Return the item at index ind in the sequence. Throw an error if ind
is out of range.
How is the range defined? Is it possible to index from the end with
negative integers? If so, how will that work together with unlimited
Sequences?
...
int _search(mixed item, void|int start)
Return the index of the first item in the sequence with the index
start or higher that match the item item.
Why is this not already in Collection? Note that a (well defined)
order isn't necessary to be able to do this; see how search() works on
mappings, for example. (Ideally one would always use an iterator to do
a search through the values, but then it could not be used with the
current search() function, so I agree that an lfun like this one is
necessary.)
...
[Return] -1 if no object was found.
This doesn't work if this function is extended to Maps (or
OrderedMaps). UNDEFINED is a better return value to get a consistent
interface with those.
...
int(0..1) delete(mixed item)
Remove the first value that are equal to item from the sequence.
Return 1 if a item was removed, otherwise 0;
This is inconsistent with delete() in Set and Map, which takes an
index. Now it takes a value all of a sudden. I don't think this
operation is necessary at all; then there could just as well be a
variant of insert() that searches for a value and inserts another one
before/after it.
...
int delete_at(int ind)
Remove the item at the index ind. /.../
This is the one that ought to be named delete(), I think.
...
Return 0 if the removal failed.
Why would it fail, given that the index is in range?
...
2.6.	OrderedMap
/.../
...
2.7.	OrderedSet
/.../
What about being able to set the order in these?
I miss functions to get an iterator based on inequalities, i.e.
functions that returns an iterator pointing to the closest element
less/greater (or equal) to a given key.
I also miss _search, but otoh I think that one should be present
already in Collection.
...
3.1.	Iterator
/.../
...
int '==(iterator iter)
/.../ Throw an error if the two iterators belong to different collections.
That won't work well with how the == operator is used. It can be
called between arbitrary things e.g. in the master, inside mappings
etc. It must be able to cope with any argument; it's not even limited
to objects.
This is actually a generic problem with the current use of `==, `< and
`>. One would like to have some sort of domain restriction of them,
but there's currently no such thing.
I miss operations to be able to search in either direction, e.g. a
search_eq_forward which steps the iterator forward until a value that
is equal with the argument according to `==. A more general variant
might be a function that takes a predicate function to do the
matching. It could also be useful with iterator variants of map() and
filter(), i.e. map_forward, filter_backward etc.
...
3.2.	OrderedIterator
...
An OrderedIterator is an iterator working on an OrderedCollection.
It has two characteristics, first it traverse the collection in the
same order each time it is traversed, if the there hasn't been any
changes in the collection, beginning with the item with the lowest
index and ending with the item with the highest index. Secondly it
is possibly to say that an iterator is less then or greater then
another iterator.
With this definition a mapping would be an ordered collection, and
also almost every collection implementation I can think of that can be
iterated over at all.
It's more interesting to differentiate between collections with and
without a well defined order. (Actually, most of the time when I've
said "ordered" in this conference, I really ment "with well defined
order".) I can only see two ways to create a well defined order:
Either through keeping the arbitrary order created by the user
operations (i.e. like in Sequence), or by having a comparison
function.
If there is a well defined order, I don't think the exception "if the
there hasn't been any changes in the collection" should be necessary;
it's possible to do better than that.
Also note that there are several occasions where an iterator over a
collection without a well defined order can have meaningful `< and `>
operations, but the exception above then becomes essential. See the
discussion starting in 9265430.
...
int(0..1) '<(OrderedIterator iter)
Return 1 if iter is positioned after this iterator, otherwise 0.
Throw an error if the two iterators belong to different collections.
This exception handling might be a problem in the same way as in `==,
but it's not equally severe here. The best way to solve it would
probably be to make a typed exception that the caller can intercept
and discard when appropriate.
...

Index

/.../
...
a < b implies !(b < a)
But !(b < a) does not imply a < b, I hope, since that'd rule out
partial orders. Another relation I hope doesn't hold is:
!(a < b) && !(b < a) implies a == b
Or alternatively:
a != b implies a < b || b < a
Well, that was quite a lot of comments. Generally speaking, I think
you should especially reconsider your designation of the index and
value sets; see for instance how has_index() and has_value() operates
on arrays and multisets. I think a good definition for them is built
around the indexing operation:
o  If an `[] operator exists, it's able to check if the given key
   exists in the collection and, in that case, find the corresponding
   element. That operation must have a time complexity strictly less
   than O(the size of the collection).
o  The key given to `[] is matched within the set of _indices_ of the
   elements in the collection.
o  The result returned by `[] comes from the set of _values_ of the
   elements in the collection.
/ Martin Stjernholm, Roxen IS
Previous text:
...
2002-11-07 13:08:
Subject: More about ADTs

I've made som changes and here is a new suggestion for a common
interface of the new ADTs.
Common API for new ADTs in Pike

Introduction

This document describes a common platform for new ADTs in pike to stand on. The purpose of the platform is to make the new ADTs as flexible as possible and possible to use together with generic algorithms. The base of the platform is the Collection concept, being the base concepts of all the ADTs, and the Iterator concept making it possible to iterate over collections without knowing the underlying structure.
Besides the two base concepts this document also describes a few sub-concepts that are related to the two base concepts according to the two figures below.
Figur 1	Collection concepts
Collection
 |-----------> Ordered Collection
 |                      |
 |                      |--------> Sequence
 |                      |--------> Ordered Set <------|
 |                      |--------> Ordered Map <--|   |
 |----> Map --------------------------------------|   |
 |----> Set ------------------------------------------|
Figur 2	Interface concepts
Iterator -------> Ordered Iterator

Collection concepts

2.1.	Collection
Collection is the base concept of all the ADTs. The reason it is called collection is that all the ADTs in the library is in some way a collection of items. To implement a collection the following methods must be implemented
iterator _get_iterator()
Return an iterator for this ADT positioned at the first item in the ADT.
int _sizeof()
Return the current size of the ADT. Return -1 if the size could not be decided or is truly unlimited.
array _values()
Return an array with the values in the ADT.
int _has_value(mixed value)
Returns 1 if value is in the value domain of collection, or 0 (zero) if not found
int _equal(mixed coll)
Decide if the collection coll and this collection contains the same elements, an if it is important, in the same order.
string _sprintf(int conversion_type, mapping(string:int)|void params)
Present the collection in a human readable way. For more information see _sprintf in the pike manual.
int is_empty()
Return 0 if the ADT do has elements.
int max_size()
Return the maximal size of the ADT if there is any, otherwise -1.
void clear()
Clear the ADT.
iterator first()
Return an iterator for this ADT positioned at the first item in the ADT, same as _get_iterator().
iterator last()
Return an iterator for this ADT positioned at the last item in the ADT.
2.2.	Set
Set is a collection containing any number of items but only one of each item. It is a sub-concept of Collection and must thereby besides implementing the methods below also implements the methods in Collections.
int '[](mixed item)
Return 1 if the item could be found in the set, 0 if it couldn't
int '[]=(mixed item, int exist)
Add the item to the set if exist is true remove the item if exist is false.
iterator get_iterator(mixed item)
Get an iterator for this ADT positioned at the item or null if the item could not be found
int insert(mixed item)
Add the item to the set if it isn't a part of the set. Return true if the item was added, else false.
mixed delete(mixed item)
Remove the item from the set. If the item was removed it will be returned otherwise a 0 zero_type is returned.
multiset cast("multiset")
The set should at least be able to cast to a multiset, additional castings are optional.
2.3.	Map
A map is a data structure that allows you to associate a key with one or more values and then quickly retrieve those values using the key. To implement a Map the ADT must have all the methods in this concept and in the Collection concept.
mixed '[](mixed key)
Return the item associated with the key if it could be found, 0 if it couldn't
mixed '[]=(mixed key, mixed item)
If there is an item with the key key then the associated item will be overwritten with the supplied item. If there is no key in the map a new key-item pair will be added.
iterator get_iterator(mixed key)
Get an iterator for this ADT positioned at the key or null if the key could not be found.
mixed insert(mixed key, mixed item)
If the key doesn't exist or duplicates are allowed, associate the value with the key and return 0, otherwise don't modify the map and return the current value associated with the key.
mixed delete(mixed key)
Remove the pair with the key key. Return the keys value if it was removed, if the key could not be removed return zero_type.
mapping cast("mapping")
The map should at leas be able to cast to a mapping, additional castings are optional.
2.4.	OrderedCollection
An OrderedCollection is a collection that has indices that could be used to find the values and the indices are strictly ordered is some way. Any type could be used as an index if it satisfies a few rules. See Index for more information. There is also nothing that prevents the use of the values as indices as long as the value satisfies the rules for an Index.
int _has_index(int index)
Returns 1 if index is in the index domain of collection, or 0 (zero) if not found
array _indices()
Return an array with the indices of the collection.
2.5.	Sequence
A sequence is a sub-concept of an Ordered Collection. It is a linear collection that allows index-based access and automatically expands to accommodate new elements. A vector is a good example.
mixed '[](int ind)
Return the item at index ind in the sequence. Throw an error if ind is out of range.
mixed '[]=(int ind, mixed item)
Set the index ind to be item. Throw an error if ind is out of range
int _search(mixed item, void|int start)
Return the index of the first item in the sequence with the index start or higher that match the item item. -1 if no object was found. Throw an error if start is out of range.
void insert(index ind, mixed item)
Insert an item in the sequence at the index ind. The size of the sequence is increased by one and the items at index and above have their indices increases by one. Throw an error if ind is out of range.
int(0..1) delete(mixed item)
Remove the first value that are equal to item from the sequence. Return 1 if a item was removed, otherwise 0;
void add(mixed item)
Add the object item as fast as possible to the sequence.
int delete_at(int ind)
Remove the item at the index ind. The size of the sequence is decreased by one and the items above ind have their indices decreases by one. Return 0 if the removal failed. Throw an error if ind is out of range.
iterator get_iterator(int ind)
Get an iterator for this ADT positioned at the index ind. Throw an error if ind is out of range.
array cast("array")
The sequence should at least be able to cast to an array, additional castings are optional.
2.6.	OrderedMap
An OrderedMap is a map where the keys are orderable and the iterator belonging to an OrderedMap will traverse the values in that order.
2.7.	OrderedSet
An OrderedSet is a set were the values are orderable and the iterator belonging to an OrderedSet will traverse the values in that order.

Iterator concepts

3.1.	Iterator
The iterator concept does not differ much from the present day iterator. Only a few new methods are added to make it possible to step backwards and compare iterators.
int(01) '!()
Should return 0 (zero) when not at end of collection, and 1 at end of collection
void '+=(int n)
Should advance the specified number of steps
iterator '-=(int n)
Move the iterator n steps backwards
int '==(iterator iter)
Return true if the two iterators is pointing to the same object. Throw an error if the two iterators belong to different collections.
mixed index()
Should return the current index.
mixed value()
Should return the current value
int has_previous()
Return true if the iterator can move one step backwards.
int has_next()
Return true if the iterator can move one step forwards. It have the same use as '!().
collection get_collection()
Return the collection this iterator iterates over.
Optional void set_value(mixed value)
Set the value of this iterator. This method should only be available if it is possible to set the value via an iterator.
Optional void add(mixed value)
Add a new item to the collection after the item the iterator currently is pointing at. This method should only be available if it is possible to add entries via the iterator
Optional mixed remove()
Remove the item the iterator currently is pointing at and move to the next item. This method should only be available if it is possible to remove entries via the iterator
3.2.	OrderedIterator
An OrderedIterator is an iterator working on an OrderedCollection. It has two characteristics, first it traverse the collection in the same order each time it is traversed, if the there hasn't been any changes in the collection, beginning with the item with the lowest index and ending with the item with the highest index. Secondly it is possibly to say that an iterator is less then or greater then another iterator.
int(0..1) '<(OrderedIterator iter)
Return 1 if iter is positioned after this iterator, otherwise 0. Throw an error if the two iterators belong to different collections.
Int(0..1) '>(OrderedIterator iter)
Return 1 if iter is positioned before this iterator otherwise 0. Throw an error if the two iterators belong to different collections.

Index

If a type could be used as an index in an OrderedCollection then it must be possible to use the following operations on it.
'==
If the two indexes a and b is equal it should return 1 otherwise 0. The following rules must also be satisfied.
   a == a must be true
   a == b implies b == a
   !(a == b) implies a != b
   a == b and b == c implies a == c
'<
If the index a is ordered lower than b it should return 1 otherwise 0. The following rules must also be satisfied.
   a < a must be false
   a < b implies !(b < a)
   a < b implies (b > a)
   a < b and b < c implies a < c
'>
If the index a is ordered higher than b it should return 1 otherwise 0. The following rules must also be satisfied.
   a > a must be false
   a > b implies !(b > a)
   a > b implies b < a
   a > b and b > c implies a > c
/ Peta, jo det är jag

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

More about ADTs