More about ADTs

11 Nov 2002

      ...
I take the presence of this in Collection but not _indices() as a sign
that e.g. the Set class would return its contents with this function.
That doesn't work very well with the current multiset implementation.
If it's possible to do lookup on a value with `[], then it should (by
definition) be part of the index set and not the value set. (The same
applies to _has_value.)
There is nothing that prevents that _indices() should be added to all
types that need it. But I did not made it a demand for being a
collection since there is after all types that does not have indices
that could be a collection. About sets, I regard the content as both
indices and values.
...
Are elements compared with equal() or `==? To be consistent it should
be the first, but there clearly can be situations where one would
rather do the latter.
...
This function should also be optional. It might be possible to define
for some unlimited collections, but far from all of them.
Shouldn't it be `==, wouldn't it be hard to handle basic types as
elements otherwise?
...
I assume you mean _get_iterator. The argument must be optional to make
the interface compatible with Collection.
No I did mean get_iterator since the _get_iterator() didn't take a
position, but making it optional argument instead could be a good solution.
...
Isn't it more useful if it replaces the value in the second case? Then
a Map that doesn't allow multiple indices can simply let `[]= and
insert be the same.
Don't think that an insert should do a replace.
...
How is the range defined? Is it possible to index from the end with
negative integers? If so, how will that work together with unlimited
Sequences?
If the end could not be found it should probably also throw an
error. Another solution is not to allow negative integers.
...
What about being able to set the order in these?
I had a get_comparator method but realised that it will just give
another interface and not be necessary in a minimal base.
...
That won't work well with how the == operator is used. It can be
called between arbitrary things e.g. in the master, inside mappings
etc. It must be able to cope with any argument; it's not even limited
to objects.
...
This is actually a generic problem with the current use of `==, `< and
`>. One would like to have some sort of domain restriction of them,
but there's currently no such thing.
Yes it should be _equal()
...
With this definition a mapping would be an ordered collection, and
also almost every collection implementation I can think of that can be
iterated over at all.
Sorry been reading in the definition of 'strictly ordered' from OrderedCollection when I've been reading this.
...
If there is a well defined order, I don't think the exception "if the
there hasn't been any changes in the collection" should be necessary;
it's possible to do better than that.
There could be collections where you can change the order of the indices.
...
o  If an `[] operator exists, it's able to check if the given key
  exists in the collection and, in that case, find the corresponding
  element. That operation must have a time complexity strictly less
  than O(the size of the collection).
Question do you think I should keep the '[] at all since a for example
a linked list will be O(n) and there will probably be others. After
all the only reason there is '[] is to resemble the existing types
delete, insert and replace will work just as well.
Finally a few words about unlimited/undefined sizes. They are
undoubtedly cause a lot of problems and since one of the goals with
the structure is to make it possible to use the collections and its 
sub-concepts in algorithms and most algorithms are quite useless on 
unlimited and undefined sizes. Therefore I think it's best to not let 
collections be of a unlimited/undefined size.
/ Peta, jo det är jag
Previous text:
...
2002-11-09 02:09:
Subject: More about ADTs

...
int _sizeof()
Return the current size of the ADT. Return -1 if the size could not
be decided or is truly unlimited.
Is there any way to tell the last two cases apart? If not, do you
consider it useless to be able to do that?
...
array _values()
Return an array with the values in the ADT.
I take the presence of this in Collection but not _indices() as a sign
that e.g. the Set class would return its contents with this function.
That doesn't work very well with the current multiset implementation.
If it's possible to do lookup on a value with `[], then it should (by
definition) be part of the index set and not the value set. (The same
applies to _has_value.)
Also, this function has to be optional, or else it's not possible to
have unlimited collections.
...
int _equal(mixed coll)
Decide if the collection coll and this collection contains the same
elements, an if it is important, in the same order.
Are elements compared with equal() or `==? To be consistent it should
be the first, but there clearly can be situations where one would
rather do the latter.
This function should also be optional. It might be possible to define
for some unlimited collections, but far from all of them.
...
iterator last()
Return an iterator for this ADT positioned at the last item in the ADT.
Note that this function can't be implemented for unlimited
collections, so it must be optional.
...
2.2.	Set
/.../
...
iterator get_iterator(mixed item)
Get an iterator for this ADT positioned at the item or null if the
item could not be found
I assume you mean _get_iterator. The argument must be optional to make
the interface compatible with Collection.
...
int insert(mixed item)
/.../
...
mixed delete(mixed item)
/.../
These should probably be lfuns, since they need to be implemented for
the new multisets too. Therefore it would probably be better if they
followed the convention with an initial "_" as the other lfuns.
Actually most, if not all, functions in Collection and inheriting
classes will likely become lfuns sooner or later so this remark
applies in general.
Perhaps some explanation of the lfun design might be in order: They
are used by various global functions and builtin language constructs
to let objects behave like builtin data types. Wherever they are used,
it's only tested whether an identifier with that name exists in the
object, and if it does then it's used under the assumption that it
actually carries out the intended operation; no check is done that the
object inherits some interface that declares the function.
In other words, the existence of the identifier itself is interpreted
as if the object adheres to an interface. This use of identifiers is
something that makes object orientation in Pike in general radically
different from e.g. C++ and Java. The advantage is that it's very
convenient to implement only those parts of an interface that are
necessary to get something to work; to accomplish the same design in
e.g. C++ you'd have to have an abstract class for almost every
function and then inherit a bunch of classes for the functions you
want to implement. The disadvantage is that the identifier namespace
in classes is essentially global in this regard.
I think that the advantage outweights the disadvantage when it comes
to core interfaces like collections. The namespace issue is then
solved either by the use of the backquote operator syntax, or by the
"_" prefix. (The notable exception is "cast", which I consider an
oversight. "create" and "destroy" might also be considered exceptions,
but they are otoh part of the interface to objects themselves and not
how they are used to emulate other data types.)
Note that this argumentation does not apply to iterators since they
are explicitly requested and therefore can't be confused with anything
else, as opposed to e.g. equal() which can be fed (and must work on)
any object.
As for your reasons for not having "_" in front of insert and delete:
...

There is currently no lfun insert and delete.

That's only an argument insofar that there's not any existing
interface to adhere to (the same applies to _search and _has_value,
for that matter). I.e. it's not an argument against using "_", only
one that it doesn't have to be used.
...

I belive that it should be possible to remove and insert
without using and lfun

Of course. All lfuns can be called as ordinary functions too.
...

If there will be a insert and delete lfun then it should not
be difficult to use the current insert and delete methods.

Well, the inconsistent behavior of delete() in Sequence (see below)
currently makes that a bit hard, but I hope you'll change that. I
don't see how this is an argument either for or against the underline
convention.
My argument for using "_" is that if these functions are made into
lfuns (which very likely will happen) then they would collide with
existing functions that doesn't adhere to this interface (just do a
grep through lib/modules to see a couple of examples of that).
...
multiset cast("multiset")
The set should at least be able to cast to a multiset, /.../
Is there a particular reason for that? That means that an unlimited
collection can't be a Set. Besides, I can't see how this operation is
a necessary part of something being a set, abstractly speaking. I
suggest you make it an optional but recommended operation instead.
...
2.3.	Map
A map is a data structure that allows you to associate a key with
one or more values /.../
Afaics this means that the property unique vs duplicate indices is
connected with the property of whether values exist or not. Why?
...
mixed '[](mixed key)
Return the item associated with the key if it could be found, 0 if
it couldn't
0 ought to be UNDEFINED instead (i.e. a 0 for which zero_type returns
1).
...
mixed '[]=(mixed key, mixed item)
If there is an item with the key key then the associated item will
be overwritten with the supplied item. If there is no key in the map
a new key-item pair will be added.
The return value is item?
...
iterator get_iterator(mixed key)
Get an iterator for this ADT positioned at the key or null if the
key could not be found.
This doesn't mention the case when a Map contains duplicate indices.
(Also applies to the other functions that uses a key here, for that
matter.)
...
mixed insert(mixed key, mixed item)
If the key doesn't exist or duplicates are allowed, associate the
value with the key and return 0, otherwise don't modify the map and
return the current value associated with the key.
Isn't it more useful if it replaces the value in the second case? Then
a Map that doesn't allow multiple indices can simply let `[]= and
insert be the same.
...
mixed delete(mixed key)
Remove the pair with the key key. Return the keys value if it was
removed, if the key could not be removed return zero_type.
You probably mean UNDEFINED lastly. (Yes, I'm being picky but in my
experience interface docs better be correct down to the letter.)
...
2.4.	OrderedCollection
An OrderedCollection is a collection that has indices that could be
used to find the values and the indices are strictly ordered is some
way. /.../
int _has_index(int index)
/.../
...
array _indices()
/.../
This does not make sense to me. For me, and for the primitive data
types in Pike, an index is something that can be looked up (in an
operation that is at least faster than O(n), otherwise I'd call it
"searched for"). That has nothing to do with order between indices.
...
2.5.	Sequence
A sequence is a sub-concept of an Ordered Collection. It is a linear
collection that allows index-based access and automatically expands
to accommodate new elements. A vector is a good example.
Odd definition. Isn't the key concept that it has a well defined order
and that the order is _arbitrary_, i.e. that each element has an order
wrt to every other element and that order is specified only by how the
sequence is created and not on any sort of comparison function?
...
mixed '[](int ind)
Return the item at index ind in the sequence. Throw an error if ind
is out of range.
How is the range defined? Is it possible to index from the end with
negative integers? If so, how will that work together with unlimited
Sequences?
...
int _search(mixed item, void|int start)
Return the index of the first item in the sequence with the index
start or higher that match the item item.
Why is this not already in Collection? Note that a (well defined)
order isn't necessary to be able to do this; see how search() works on
mappings, for example. (Ideally one would always use an iterator to do
a search through the values, but then it could not be used with the
current search() function, so I agree that an lfun like this one is
necessary.)
...
[Return] -1 if no object was found.
This doesn't work if this function is extended to Maps (or
OrderedMaps). UNDEFINED is a better return value to get a consistent
interface with those.
...
int(0..1) delete(mixed item)
Remove the first value that are equal to item from the sequence.
Return 1 if a item was removed, otherwise 0;
This is inconsistent with delete() in Set and Map, which takes an
index. Now it takes a value all of a sudden. I don't think this
operation is necessary at all; then there could just as well be a
variant of insert() that searches for a value and inserts another one
before/after it.
...
int delete_at(int ind)
Remove the item at the index ind. /.../
This is the one that ought to be named delete(), I think.
...
Return 0 if the removal failed.
Why would it fail, given that the index is in range?
...
2.6.	OrderedMap
/.../
...
2.7.	OrderedSet
/.../
What about being able to set the order in these?
I miss functions to get an iterator based on inequalities, i.e.
functions that returns an iterator pointing to the closest element
less/greater (or equal) to a given key.
I also miss _search, but otoh I think that one should be present
already in Collection.
...
3.1.	Iterator
/.../
...
int '==(iterator iter)
/.../ Throw an error if the two iterators belong to different collections.
That won't work well with how the == operator is used. It can be
called between arbitrary things e.g. in the master, inside mappings
etc. It must be able to cope with any argument; it's not even limited
to objects.
This is actually a generic problem with the current use of `==, `< and
`>. One would like to have some sort of domain restriction of them,
but there's currently no such thing.
I miss operations to be able to search in either direction, e.g. a
search_eq_forward which steps the iterator forward until a value that
is equal with the argument according to `==. A more general variant
might be a function that takes a predicate function to do the
matching. It could also be useful with iterator variants of map() and
filter(), i.e. map_forward, filter_backward etc.
...
3.2.	OrderedIterator
...
An OrderedIterator is an iterator working on an OrderedCollection.
It has two characteristics, first it traverse the collection in the
same order each time it is traversed, if the there hasn't been any
changes in the collection, beginning with the item with the lowest
index and ending with the item with the highest index. Secondly it
is possibly to say that an iterator is less then or greater then
another iterator.
With this definition a mapping would be an ordered collection, and
also almost every collection implementation I can think of that can be
iterated over at all.
It's more interesting to differentiate between collections with and
without a well defined order. (Actually, most of the time when I've
said "ordered" in this conference, I really ment "with well defined
order".) I can only see two ways to create a well defined order:
Either through keeping the arbitrary order created by the user
operations (i.e. like in Sequence), or by having a comparison
function.
If there is a well defined order, I don't think the exception "if the
there hasn't been any changes in the collection" should be necessary;
it's possible to do better than that.
Also note that there are several occasions where an iterator over a
collection without a well defined order can have meaningful `< and `>
operations, but the exception above then becomes essential. See the
discussion starting in 9265430.
...
int(0..1) '<(OrderedIterator iter)
Return 1 if iter is positioned after this iterator, otherwise 0.
Throw an error if the two iterators belong to different collections.
This exception handling might be a problem in the same way as in `==,
but it's not equally severe here. The best way to solve it would
probably be to make a typed exception that the caller can intercept
and discard when appropriate.
...

Index

/.../
...
a < b implies !(b < a)
But !(b < a) does not imply a < b, I hope, since that'd rule out
partial orders. Another relation I hope doesn't hold is:
!(a < b) && !(b < a) implies a == b
Or alternatively:
a != b implies a < b || b < a
Well, that was quite a lot of comments. Generally speaking, I think
you should especially reconsider your designation of the index and
value sets; see for instance how has_index() and has_value() operates
on arrays and multisets. I think a good definition for them is built
around the indexing operation:
o  If an `[] operator exists, it's able to check if the given key
  exists in the collection and, in that case, find the corresponding
  element. That operation must have a time complexity strictly less
  than O(the size of the collection).
o  The key given to `[] is matched within the set of _indices_ of the
  elements in the collection.
o  The result returned by `[] comes from the set of _values_ of the
  elements in the collection.
/ Martin Stjernholm, Roxen IS

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

More about ADTs