I'm currently working on a thesis that should bring Pike some new ADTs. I'm currently working on a common interface for the new ADTs and this is a draft of how it might be. The goal of the interface is resemble as much as possible of the current types array, multiset and mapping. Another goal with the interface is to keep it as minimal as possible. No unnecessary methods. The reason is simple, lack of time. The time I've got on hand is far to little to do a trough out analysis of possible fields of application and methods in the ADTs. The solution is to keep the interface incomplete by sufficient to work with a few algorithms and let other add new methods when they develop new algorithms. The base of the interface is four different concepts, one base concept and three sub-concepts that inherit all the methods from the base concept.
Collection |---- Sequence |---- Set |---- Map
Additional sub-concepts to the sub-concepts could be added later, for example sorted set. If an ADT implements all the methods in a concept and its super concepts it is an implementation of the concept.
Collection Collection is the base concept of all the ADTs. The reason it is called collection is that all the ADTs in the library is in some way a collection of items. iterator _get_iterator() Return an iterator for this ADT positioned at the first item in the ADT.
int _sizeof() Return the current size of the ADT.
array _values() Return an array with the values in the ADT.
int is_empty() Return 0 if the ADT do has elements.
int max_size() Return the maximal size of the ADT if there is any, otherwise -1.
void clear() Clear the ADT.
void add_all(collection col) Add all the items in the collection col to this collection.
Sequence A sequence is a linear container that allows index-based access and automatically expands to accommodate new elements. A vector is a good example.
void insert(int ind, mixed item) Insert an item in the sequence at the index ind. The size of the sequence is increased by one and the items at index and above have their indices increases by one. Throw an error if ind is out of range.
int remove(mixed item) Remove the item item from the sequence. Return the number of removed items.
int index_of(int ind, mixed item) Return the index of the first item in the sequence with the index ind or higher that mach the item item. -1 if no object was found. Throw an error if ind is out of range.
void add(mixed item) Add the object item as fast as possible to the sequence.
Int remove_at(int ind) Remove the item at the index ind. The size of the sequence is decreased by one and the items above ind have their indices decreases by one. Return 0 if the removal failed. Throw an error if ind is out of range.
iterator get_iterator(int ind) Get an iterator for this ADT positioned at the index ind. Throw an error if ind is out of range.
mixed '[](int ind) Return the item at index ind in the sequence. Throw an error if ind is out of range.
Mixed '[]=(int ind, mixed item) Set the index ind to be item. Throw an error if ind is out of range
Set Set is a collection containing any number of items but only one of each item.
iterator get_iterator(mixed item) Get an iterator for this ADT positioned at the item or null if the item could not be found.
int '[](set set, mixed item) Return 1 if the item could be found in the set, 0 if it couldn't
int '[]=(mixed item, int exist) Add the item to the set if exist is true remove the item if exist is false.
Map A map is a data structure that allows you to associate a key with one or more values and then quickly retrieve those values using the key.
array _indices() Return an array with the indices in the map.
iterator get_iterator(mixed key) Get an iterator for this ADT positioned at the key or null if the key could not be found.
int remove(mixed key) Remove the pair with the key key. Return 0 if the key could not be removed.
mixed '[](mixed key) Return the item associated with the key if it could be found, 0 if it couldn't
mixed '[]=(mixed key, mixed item) If there is an item with the key key then the associated item will be overwritten with the supplied item. If there is no key in the map a new key-item pair will be added.
Iterator changes I've also analysed the current iterators and how well they would work with generic algorithms and I found that it is only one thing that really lacks, the possibility to walk backwards. Actually the possibility to walk backwards is already there, it just doesn't have it's own method.
iterator '-=(int n) Move the iterator n steps backwards
int has_previous() Return true if the iterator can move one step backwards.
int has_next() Return true if the iterator can move one step forwards. It have the same use as '! but since there is a has_previous I believe it should be a has_next to.
int '==(iterator iter) Return true if the two iterators is pointing to the same object.
Well this is briefly what I have in mind. If you have any questions or comments you're more then welcome.
Um, that seems a lot like plain wrappers around existing types, so I'm curious: what's the intended benefit of using this wrapper interface instead of the existing types?
/ Leif Stensson, Lysator
Previous text:
2002-10-28 15:45: Subject: Proposal for new ADT interface
I'm currently working on a thesis that should bring Pike some new ADTs. I'm currently working on a common interface for the new ADTs and this is a draft of how it might be. The goal of the interface is resemble as much as possible of the current types array, multiset and mapping. Another goal with the interface is to keep it as minimal as possible. No unnecessary methods. The reason is simple, lack of time. The time I've got on hand is far to little to do a trough out analysis of possible fields of application and methods in the ADTs. The solution is to keep the interface incomplete by sufficient to work with a few algorithms and let other add new methods when they develop new algorithms. The base of the interface is four different concepts, one base concept and three sub-concepts that inherit all the methods from the base concept.
Collection |---- Sequence |---- Set |---- Map
Additional sub-concepts to the sub-concepts could be added later, for example sorted set. If an ADT implements all the methods in a concept and its super concepts it is an implementation of the concept.
Collection Collection is the base concept of all the ADTs. The reason it is called collection is that all the ADTs in the library is in some way a collection of items. iterator _get_iterator() Return an iterator for this ADT positioned at the first item in the ADT.
int _sizeof() Return the current size of the ADT.
array _values() Return an array with the values in the ADT.
int is_empty() Return 0 if the ADT do has elements.
int max_size() Return the maximal size of the ADT if there is any, otherwise -1.
void clear() Clear the ADT.
void add_all(collection col) Add all the items in the collection col to this collection.
Sequence A sequence is a linear container that allows index-based access and automatically expands to accommodate new elements. A vector is a good example.
void insert(int ind, mixed item) Insert an item in the sequence at the index ind. The size of the sequence is increased by one and the items at index and above have their indices increases by one. Throw an error if ind is out of range.
int remove(mixed item) Remove the item item from the sequence. Return the number of removed items.
int index_of(int ind, mixed item) Return the index of the first item in the sequence with the index ind or higher that mach the item item. -1 if no object was found. Throw an error if ind is out of range.
void add(mixed item) Add the object item as fast as possible to the sequence.
Int remove_at(int ind) Remove the item at the index ind. The size of the sequence is decreased by one and the items above ind have their indices decreases by one. Return 0 if the removal failed. Throw an error if ind is out of range.
iterator get_iterator(int ind) Get an iterator for this ADT positioned at the index ind. Throw an error if ind is out of range.
mixed '[](int ind) Return the item at index ind in the sequence. Throw an error if ind is out of range.
Mixed '[]=(int ind, mixed item) Set the index ind to be item. Throw an error if ind is out of range
Set Set is a collection containing any number of items but only one of each item.
iterator get_iterator(mixed item) Get an iterator for this ADT positioned at the item or null if the item could not be found.
int '[](set set, mixed item) Return 1 if the item could be found in the set, 0 if it couldn't
int '[]=(mixed item, int exist) Add the item to the set if exist is true remove the item if exist is false.
Map A map is a data structure that allows you to associate a key with one or more values and then quickly retrieve those values using the key.
array _indices() Return an array with the indices in the map.
iterator get_iterator(mixed key) Get an iterator for this ADT positioned at the key or null if the key could not be found.
int remove(mixed key) Remove the pair with the key key. Return 0 if the key could not be removed.
mixed '[](mixed key) Return the item associated with the key if it could be found, 0 if it couldn't
mixed '[]=(mixed key, mixed item) If there is an item with the key key then the associated item will be overwritten with the supplied item. If there is no key in the map a new key-item pair will be added.
Iterator changes I've also analysed the current iterators and how well they would work with generic algorithms and I found that it is only one thing that really lacks, the possibility to walk backwards. Actually the possibility to walk backwards is already there, it just doesn't have it's own method.
iterator '-=(int n) Move the iterator n steps backwards
int has_previous() Return true if the iterator can move one step backwards.
int has_next() Return true if the iterator can move one step forwards. It have the same use as '! but since there is a has_previous I believe it should be a has_next to.
int '==(iterator iter) Return true if the two iterators is pointing to the same object.
Well this is briefly what I have in mind. If you have any questions or comments you're more then welcome.
/ Peta, jo det är jag
I think the addition of a has_value and a has_index would be a good addition to the proposal. (Or possibly _has_value/_has_index, if those are the real names of the LFUNs.)
/ Johan Sundström (ärkehertig av Lysators rootgrupp)
Previous text:
2002-10-28 15:45: Subject: Proposal for new ADT interface
I'm currently working on a thesis that should bring Pike some new ADTs. I'm currently working on a common interface for the new ADTs and this is a draft of how it might be. The goal of the interface is resemble as much as possible of the current types array, multiset and mapping. Another goal with the interface is to keep it as minimal as possible. No unnecessary methods. The reason is simple, lack of time. The time I've got on hand is far to little to do a trough out analysis of possible fields of application and methods in the ADTs. The solution is to keep the interface incomplete by sufficient to work with a few algorithms and let other add new methods when they develop new algorithms. The base of the interface is four different concepts, one base concept and three sub-concepts that inherit all the methods from the base concept.
Collection |---- Sequence |---- Set |---- Map
Additional sub-concepts to the sub-concepts could be added later, for example sorted set. If an ADT implements all the methods in a concept and its super concepts it is an implementation of the concept.
Collection Collection is the base concept of all the ADTs. The reason it is called collection is that all the ADTs in the library is in some way a collection of items. iterator _get_iterator() Return an iterator for this ADT positioned at the first item in the ADT.
int _sizeof() Return the current size of the ADT.
array _values() Return an array with the values in the ADT.
int is_empty() Return 0 if the ADT do has elements.
int max_size() Return the maximal size of the ADT if there is any, otherwise -1.
void clear() Clear the ADT.
void add_all(collection col) Add all the items in the collection col to this collection.
Sequence A sequence is a linear container that allows index-based access and automatically expands to accommodate new elements. A vector is a good example.
void insert(int ind, mixed item) Insert an item in the sequence at the index ind. The size of the sequence is increased by one and the items at index and above have their indices increases by one. Throw an error if ind is out of range.
int remove(mixed item) Remove the item item from the sequence. Return the number of removed items.
int index_of(int ind, mixed item) Return the index of the first item in the sequence with the index ind or higher that mach the item item. -1 if no object was found. Throw an error if ind is out of range.
void add(mixed item) Add the object item as fast as possible to the sequence.
Int remove_at(int ind) Remove the item at the index ind. The size of the sequence is decreased by one and the items above ind have their indices decreases by one. Return 0 if the removal failed. Throw an error if ind is out of range.
iterator get_iterator(int ind) Get an iterator for this ADT positioned at the index ind. Throw an error if ind is out of range.
mixed '[](int ind) Return the item at index ind in the sequence. Throw an error if ind is out of range.
Mixed '[]=(int ind, mixed item) Set the index ind to be item. Throw an error if ind is out of range
Set Set is a collection containing any number of items but only one of each item.
iterator get_iterator(mixed item) Get an iterator for this ADT positioned at the item or null if the item could not be found.
int '[](set set, mixed item) Return 1 if the item could be found in the set, 0 if it couldn't
int '[]=(mixed item, int exist) Add the item to the set if exist is true remove the item if exist is false.
Map A map is a data structure that allows you to associate a key with one or more values and then quickly retrieve those values using the key.
array _indices() Return an array with the indices in the map.
iterator get_iterator(mixed key) Get an iterator for this ADT positioned at the key or null if the key could not be found.
int remove(mixed key) Remove the pair with the key key. Return 0 if the key could not be removed.
mixed '[](mixed key) Return the item associated with the key if it could be found, 0 if it couldn't
mixed '[]=(mixed key, mixed item) If there is an item with the key key then the associated item will be overwritten with the supplied item. If there is no key in the map a new key-item pair will be added.
Iterator changes I've also analysed the current iterators and how well they would work with generic algorithms and I found that it is only one thing that really lacks, the possibility to walk backwards. Actually the possibility to walk backwards is already there, it just doesn't have it's own method.
iterator '-=(int n) Move the iterator n steps backwards
int has_previous() Return true if the iterator can move one step backwards.
int has_next() Return true if the iterator can move one step forwards. It have the same use as '! but since there is a has_previous I believe it should be a has_next to.
int '==(iterator iter) Return true if the two iterators is pointing to the same object.
Well this is briefly what I have in mind. If you have any questions or comments you're more then welcome.
/ Peta, jo det är jag
It's a good goal to resemble the builtin types. I've therefore pointed out several cases where you don't use the standard callbacks (usually called "lfuns") that exists already.
Collection:
o You might want to consider handling collections that aren't bounded, e.g. they have an iterator that enumerates elements from some outside source whose size isn't known in advance, or that is unlimited. I suggest allowing the return value -1 from _sizeof() to flag this, and perhaps even have another return value to flag whether the collection is truly unlimited or only of (currently) unknown size.(*)
Handling collections with unknown size probably has consequences elsewhere too, but it is a very useful application, so it would be nice to include it in the design from the start.
o add_all() - A cast, `+ or `+= are more natural ways to perform this in Pike.
o An _equal() lfun would be useful to be able to do recursive comparison with equal().
o In the same vein, perhaps _encode() and _decode() should be mentioned too? (They're used for serialization with encode_value and decode_value). Maybe also _sprintf()?
Sequence:
o insert() - For the new multisets I'd like to add a global m_insert(), corresponding to the m_delete() that already exists. An lfun like this one would then be used, so this should be that lfun.
If the correspondence to m_delete should be kept, that lfun should then be called _m_insert. The problem with that is mainly that the "m" part is ugly and out of place in this ADT generalization.
o index_of() - This functionality should be accessed through the global search() function for consistence. It currently have no lfun to handle objects, but a _search should clearly be added for that, imho.
o remove_at() - This is the _m_delete lfun.
o add() - Why here and not in Collection? Collection got add_all, so why not this one too?
Set:
o `[]= - Clearly inspired by the multisets. I think a prettier API is to use remove_at/remove/_m_delete to remove elements for symmetry with the other variants, and that the assign-zero-to-remove method should be delegated to a compatibility measure.
Map:
o remove() - This is also the _m_delete lfun. Inconsistent name wrt remove_at() in Sequence.
Iterators:
o Good improvements with has_next and has_previous.
o `== - The criteria for equality strikes me as very arbitrary. What is the reason? A function to be able to get the collection the iterator uses seems more useful and makes the code easier to understand: iter1->get_collection() == iter2->get_collection().
o An API to be able to insert and delete wrt to the iterator position.
o An API to be able to do various kinds of searches wrt to the iterator position, somewhat like index_of/_search.
Also, I consider the following properties fairly important, and it's not obvious how they fit into this hierarchy:
o Unique vs multiple indices. o Ordering: unordered/using ordering function (or builtin order)/ stable order without ordering function. Perhaps also partial orders.
*) An integer correspondence to Math.inf would be nice. I'll see if I can get around adding that when the cvs fork is done.
/ Martin Stjernholm, Roxen IS
Previous text:
2002-10-28 15:45: Subject: Proposal for new ADT interface
I'm currently working on a thesis that should bring Pike some new ADTs. I'm currently working on a common interface for the new ADTs and this is a draft of how it might be. The goal of the interface is resemble as much as possible of the current types array, multiset and mapping. Another goal with the interface is to keep it as minimal as possible. No unnecessary methods. The reason is simple, lack of time. The time I've got on hand is far to little to do a trough out analysis of possible fields of application and methods in the ADTs. The solution is to keep the interface incomplete by sufficient to work with a few algorithms and let other add new methods when they develop new algorithms. The base of the interface is four different concepts, one base concept and three sub-concepts that inherit all the methods from the base concept.
Collection |---- Sequence |---- Set |---- Map
Additional sub-concepts to the sub-concepts could be added later, for example sorted set. If an ADT implements all the methods in a concept and its super concepts it is an implementation of the concept.
Collection Collection is the base concept of all the ADTs. The reason it is called collection is that all the ADTs in the library is in some way a collection of items. iterator _get_iterator() Return an iterator for this ADT positioned at the first item in the ADT.
int _sizeof() Return the current size of the ADT.
array _values() Return an array with the values in the ADT.
int is_empty() Return 0 if the ADT do has elements.
int max_size() Return the maximal size of the ADT if there is any, otherwise -1.
void clear() Clear the ADT.
void add_all(collection col) Add all the items in the collection col to this collection.
Sequence A sequence is a linear container that allows index-based access and automatically expands to accommodate new elements. A vector is a good example.
void insert(int ind, mixed item) Insert an item in the sequence at the index ind. The size of the sequence is increased by one and the items at index and above have their indices increases by one. Throw an error if ind is out of range.
int remove(mixed item) Remove the item item from the sequence. Return the number of removed items.
int index_of(int ind, mixed item) Return the index of the first item in the sequence with the index ind or higher that mach the item item. -1 if no object was found. Throw an error if ind is out of range.
void add(mixed item) Add the object item as fast as possible to the sequence.
Int remove_at(int ind) Remove the item at the index ind. The size of the sequence is decreased by one and the items above ind have their indices decreases by one. Return 0 if the removal failed. Throw an error if ind is out of range.
iterator get_iterator(int ind) Get an iterator for this ADT positioned at the index ind. Throw an error if ind is out of range.
mixed '[](int ind) Return the item at index ind in the sequence. Throw an error if ind is out of range.
Mixed '[]=(int ind, mixed item) Set the index ind to be item. Throw an error if ind is out of range
Set Set is a collection containing any number of items but only one of each item.
iterator get_iterator(mixed item) Get an iterator for this ADT positioned at the item or null if the item could not be found.
int '[](set set, mixed item) Return 1 if the item could be found in the set, 0 if it couldn't
int '[]=(mixed item, int exist) Add the item to the set if exist is true remove the item if exist is false.
Map A map is a data structure that allows you to associate a key with one or more values and then quickly retrieve those values using the key.
array _indices() Return an array with the indices in the map.
iterator get_iterator(mixed key) Get an iterator for this ADT positioned at the key or null if the key could not be found.
int remove(mixed key) Remove the pair with the key key. Return 0 if the key could not be removed.
mixed '[](mixed key) Return the item associated with the key if it could be found, 0 if it couldn't
mixed '[]=(mixed key, mixed item) If there is an item with the key key then the associated item will be overwritten with the supplied item. If there is no key in the map a new key-item pair will be added.
Iterator changes I've also analysed the current iterators and how well they would work with generic algorithms and I found that it is only one thing that really lacks, the possibility to walk backwards. Actually the possibility to walk backwards is already there, it just doesn't have it's own method.
iterator '-=(int n) Move the iterator n steps backwards
int has_previous() Return true if the iterator can move one step backwards.
int has_next() Return true if the iterator can move one step forwards. It have the same use as '! but since there is a has_previous I believe it should be a has_next to.
int '==(iterator iter) Return true if the two iterators is pointing to the same object.
Well this is briefly what I have in mind. If you have any questions or comments you're more then welcome.
/ Peta, jo det är jag
Great comments, agree on most, I've just have a few comments and questions.
o insert() - For the new multisets I'd like to add a global m_insert(), corresponding to the m_delete() that already exists. An lfun like this one would then be used, so this should be that lfun.
If the correspondence to m_delete should be kept, that lfun should then be called _m_insert. The problem with that is mainly that the "m" part is ugly and out of place in this ADT generalization.
As you say, it feels a bit weird to use m_delete to remove from a set, have to think about that.
o add() - Why here and not in Collection? Collection got add_all, so why not this one too?
The reason why I did not place add in the collection concept is that all the three sub concepts has a different syntax for add. They where later removed from Map and Set since they where redundant. The reason I kept them in Sequence was that often you just want to add a item to a sequence, no matter where and then add do it as fast as possible, with insert you have to know the under laying structure.
o `== - The criteria for equality strikes me as very arbitrary. What is the reason? A function to be able to get the collection the iterator uses seems more useful and makes the code easier to understand: iter1->get_collection() == iter2->get_collection().
Sorry me not being clear enough on that. The text should rather be. 'Return true if the two iterators is pointing to the same item in the same collection.' Since you can't rust that a collection has a valid indexing and you can't always compare the values of the iterators since a collection can contain the same item twice.
o An API to be able to insert and delete wrt to the iterator position.
Yes, at least a set_value() is needed, insert and remove is useful to, have to think about this one.
· Unique vs multiple indices.
I guess you mean set/multiset. The multiset concept is a possible sub-concept to the set concept
Ordering: unordered/using ordering function (or builtin order)/ stable order without ordering function. Perhaps also partial orders.
I believe that sorted set is a sub-concept of the set concept, same with sorted map, I add these to the API. Sequences on the other hand does not need a sorted concept but is best kept sorted with the help of a sort algorithm and a binary search algorithm, as I see it.
/ Peta, jo det är jag
Previous text:
2002-10-28 20:17: Subject: Proposal for new ADT interface
It's a good goal to resemble the builtin types. I've therefore pointed out several cases where you don't use the standard callbacks (usually called "lfuns") that exists already.
Collection:
o You might want to consider handling collections that aren't bounded, e.g. they have an iterator that enumerates elements from some outside source whose size isn't known in advance, or that is unlimited. I suggest allowing the return value -1 from _sizeof() to flag this, and perhaps even have another return value to flag whether the collection is truly unlimited or only of (currently) unknown size.(*)
Handling collections with unknown size probably has consequences elsewhere too, but it is a very useful application, so it would be nice to include it in the design from the start.
o add_all() - A cast, `+ or `+= are more natural ways to perform this in Pike.
o An _equal() lfun would be useful to be able to do recursive comparison with equal().
o In the same vein, perhaps _encode() and _decode() should be mentioned too? (They're used for serialization with encode_value and decode_value). Maybe also _sprintf()?
Sequence:
o insert() - For the new multisets I'd like to add a global m_insert(), corresponding to the m_delete() that already exists. An lfun like this one would then be used, so this should be that lfun.
If the correspondence to m_delete should be kept, that lfun should then be called _m_insert. The problem with that is mainly that the "m" part is ugly and out of place in this ADT generalization.
o index_of() - This functionality should be accessed through the global search() function for consistence. It currently have no lfun to handle objects, but a _search should clearly be added for that, imho.
o remove_at() - This is the _m_delete lfun.
o add() - Why here and not in Collection? Collection got add_all, so why not this one too?
Set:
o `[]= - Clearly inspired by the multisets. I think a prettier API is to use remove_at/remove/_m_delete to remove elements for symmetry with the other variants, and that the assign-zero-to-remove method should be delegated to a compatibility measure.
Map:
o remove() - This is also the _m_delete lfun. Inconsistent name wrt remove_at() in Sequence.
Iterators:
o Good improvements with has_next and has_previous.
o `== - The criteria for equality strikes me as very arbitrary. What is the reason? A function to be able to get the collection the iterator uses seems more useful and makes the code easier to understand: iter1->get_collection() == iter2->get_collection().
o An API to be able to insert and delete wrt to the iterator position.
o An API to be able to do various kinds of searches wrt to the iterator position, somewhat like index_of/_search.
Also, I consider the following properties fairly important, and it's not obvious how they fit into this hierarchy:
o Unique vs multiple indices. o Ordering: unordered/using ordering function (or builtin order)/ stable order without ordering function. Perhaps also partial orders.
*) An integer correspondence to Math.inf would be nice. I'll see if I can get around adding that when the cvs fork is done.
/ Martin Stjernholm, Roxen IS
As you say, it feels a bit weird to use m_delete to remove from a set, have to think about that.
We could perhaps rename the lfun to _delete and add an _insert. Compatibility measures can always be devised.
The reason why I did not place add in the collection concept is that all the three sub concepts has a different syntax for add.
Ok, but note that you will have essentially the same problems in add_all wrt differences in index/value sets. For instance, what do you plan to do if a Set is added to a Sequence? Will the indices then be converted to values? I'm not at all certain that such dwim would be good.
The add_all() method also brings in a number of other issues. How are duplicates handled? How is the order defined, if there is any? Why handle that specific merge method and not the others (intersection, set difference etc)? If you should include this in your design (which I'd appreciate) then I think you should handle the generic merge case. There is of course a lot of already implemented behavior to take into account regarding the different merge operations (`+, `-, `|, `& and `^).
'Return true if the two iterators is pointing to the same item in the same collection.' Since you can't rust that a collection has a valid indexing and you can't always compare the values of the iterators since a collection can contain the same item twice.
Ok. Good point. Come to think about it, iterators in ordered collections ought to have `< and `> too.
· Unique vs multiple indices.
I guess you mean set/multiset. The multiset concept is a possible sub-concept to the set concept
No, that's just a special case. The same applies to Map too. It's actually not very clear what principal difference you use to tell a Map from a Set. It's probably that Sets lacks values, but other reasonable divisions are whether there can/can't be multiple indices, or whether there's a well defined order.
I've come to the conclusion that among those differences the order is the most fundamental, and that is also what I plan to make evident wrt mappings when I add the high level interface to the new multiset implementation.
Whether there can be values attached or not is just a minor difference that only affects what you can do with a specific element once it's allocated. It's easy to e.g. define a behavior for _values() when there are no values in the collection.
The difference between unique and duplicate indices is somewhat more profound, but I still think it's secondary. Handling of duplicate indices only implies that API must be extended a bit: The operations that locates a single element using a key must be extended to use some method of choice if several elements matches(*), the iterators must be able to keep their position among identical indices, and there should preferably be some operations to handle (the values of) all the matching elements as a group.
An alternative more concrete view of this issue is to consider the current mapping implementation, which is a hash table: It would be comparatively easy to extend it both to make the values optional and to allow duplicate indices, but imposing a well defined order would practically imply a rewrite from scratch, and it's not clear whether a hash table would still be useful then.
*) The new low level multiset implementation defines this to access the last element according to the order. If the mapping implementation would be extended to handle duplicate indices, it would reasonably define it to be random (i.e. take whatever element happens to be closest to the top in the hash bucket).
/ Martin Stjernholm, Roxen IS
Previous text:
2002-10-29 11:35: Subject: Proposal for new ADT interface
Great comments, agree on most, I've just have a few comments and questions.
o insert() - For the new multisets I'd like to add a global m_insert(), corresponding to the m_delete() that already exists. An lfun like this one would then be used, so this should be that lfun.
If the correspondence to m_delete should be kept, that lfun should then be called _m_insert. The problem with that is mainly that the "m" part is ugly and out of place in this ADT generalization.
As you say, it feels a bit weird to use m_delete to remove from a set, have to think about that.
o add() - Why here and not in Collection? Collection got add_all, so why not this one too?
The reason why I did not place add in the collection concept is that all the three sub concepts has a different syntax for add. They where later removed from Map and Set since they where redundant. The reason I kept them in Sequence was that often you just want to add a item to a sequence, no matter where and then add do it as fast as possible, with insert you have to know the under laying structure.
o `== - The criteria for equality strikes me as very arbitrary. What is the reason? A function to be able to get the collection the iterator uses seems more useful and makes the code easier to understand: iter1->get_collection() == iter2->get_collection().
Sorry me not being clear enough on that. The text should rather be. 'Return true if the two iterators is pointing to the same item in the same collection.' Since you can't rust that a collection has a valid indexing and you can't always compare the values of the iterators since a collection can contain the same item twice.
o An API to be able to insert and delete wrt to the iterator position.
Yes, at least a set_value() is needed, insert and remove is useful to, have to think about this one.
· Unique vs multiple indices.
I guess you mean set/multiset. The multiset concept is a possible sub-concept to the set concept
Ordering: unordered/using ordering function (or builtin order)/ stable order without ordering function. Perhaps also partial orders.
I believe that sorted set is a sub-concept of the set concept, same with sorted map, I add these to the API. Sequences on the other hand does not need a sorted concept but is best kept sorted with the help of a sort algorithm and a binary search algorithm, as I see it.
/ Peta, jo det är jag
Realised something similar when I was testing different algorithms yesterday. A lot of them were only useful on ordered collections. I shall try out a structure like this.
Collection |-----------> Ordered Collection | | | |--------> Sequence | |--------> Ordered Set <------| | |--------> Ordered Map <--| | |----> Map --------------------------------------| | |----> Set ------------------------------------------|
And two different iterators, the basic iterator and the ordered iterator where you are guaranteed to get the same order each time and index(), '<' and '>' make sense.
/ Peta, jo det är jag
Previous text:
2002-10-29 20:46: Subject: Proposal for new ADT interface
As you say, it feels a bit weird to use m_delete to remove from a set, have to think about that.
We could perhaps rename the lfun to _delete and add an _insert. Compatibility measures can always be devised.
The reason why I did not place add in the collection concept is that all the three sub concepts has a different syntax for add.
Ok, but note that you will have essentially the same problems in add_all wrt differences in index/value sets. For instance, what do you plan to do if a Set is added to a Sequence? Will the indices then be converted to values? I'm not at all certain that such dwim would be good.
The add_all() method also brings in a number of other issues. How are duplicates handled? How is the order defined, if there is any? Why handle that specific merge method and not the others (intersection, set difference etc)? If you should include this in your design (which I'd appreciate) then I think you should handle the generic merge case. There is of course a lot of already implemented behavior to take into account regarding the different merge operations (`+, `-, `|, `& and `^).
'Return true if the two iterators is pointing to the same item in the same collection.' Since you can't rust that a collection has a valid indexing and you can't always compare the values of the iterators since a collection can contain the same item twice.
Ok. Good point. Come to think about it, iterators in ordered collections ought to have `< and `> too.
· Unique vs multiple indices.
I guess you mean set/multiset. The multiset concept is a possible sub-concept to the set concept
No, that's just a special case. The same applies to Map too. It's actually not very clear what principal difference you use to tell a Map from a Set. It's probably that Sets lacks values, but other reasonable divisions are whether there can/can't be multiple indices, or whether there's a well defined order.
I've come to the conclusion that among those differences the order is the most fundamental, and that is also what I plan to make evident wrt mappings when I add the high level interface to the new multiset implementation.
Whether there can be values attached or not is just a minor difference that only affects what you can do with a specific element once it's allocated. It's easy to e.g. define a behavior for _values() when there are no values in the collection.
The difference between unique and duplicate indices is somewhat more profound, but I still think it's secondary. Handling of duplicate indices only implies that API must be extended a bit: The operations that locates a single element using a key must be extended to use some method of choice if several elements matches(*), the iterators must be able to keep their position among identical indices, and there should preferably be some operations to handle (the values of) all the matching elements as a group.
An alternative more concrete view of this issue is to consider the current mapping implementation, which is a hash table: It would be comparatively easy to extend it both to make the values optional and to allow duplicate indices, but imposing a well defined order would practically imply a rewrite from scratch, and it's not clear whether a hash table would still be useful then.
*) The new low level multiset implementation defines this to access the last element according to the order. If the mapping implementation would be extended to handle duplicate indices, it would reasonably define it to be random (i.e. take whatever element happens to be closest to the top in the hash bucket).
/ Martin Stjernholm, Roxen IS
Looks good, but I'd still like to see unique/multiple indices in that inherit structure, at least in favor of the Map/Set division (which I assume means with/without values).
Why must there be an order for index() to make sense?
/ Martin Stjernholm, Roxen IS
Previous text:
2002-10-31 11:46: Subject: Proposal for new ADT interface
Realised something similar when I was testing different algorithms yesterday. A lot of them were only useful on ordered collections. I shall try out a structure like this.
Collection |-----------> Ordered Collection | | | |--------> Sequence | |--------> Ordered Set <------| | |--------> Ordered Map <--| | |----> Map --------------------------------------| | |----> Set ------------------------------------------|
And two different iterators, the basic iterator and the ordered iterator where you are guaranteed to get the same order each time and index(), '<' and '>' make sense.
/ Peta, jo det är jag
Looks good, but I'd still like to see unique/multiple indices in that inherit structure, at least in favor of the Map/Set division (which I assume means with/without values).
Why must there be an order for index() to make sense?
/ Martin Stjernholm, Roxen IS
Previous text:
2002-10-31 11:46: Subject: Proposal for new ADT interface
Realised something similar when I was testing different algorithms yesterday. A lot of them were only useful on ordered collections. I shall try out a structure like this.
Collection |-----------> Ordered Collection | | | |--------> Sequence | |--------> Ordered Set <------| | |--------> Ordered Map <--| | |----> Map --------------------------------------| | |----> Set ------------------------------------------|
And two different iterators, the basic iterator and the ordered iterator where you are guaranteed to get the same order each time and index(), '<' and '>' make sense.
/ Peta, jo det är jag
Looks good, but I'd still like to see unique/multiple indices in that inherit structure, at least in favor of the Map/Set division (which I assume means with/without values).
Why must there be an order for index() to make sense?
/ Martin Stjernholm, Roxen IS
Previous text:
2002-10-31 11:46: Subject: Proposal for new ADT interface
Realised something similar when I was testing different algorithms yesterday. A lot of them were only useful on ordered collections. I shall try out a structure like this.
Collection |-----------> Ordered Collection | | | |--------> Sequence | |--------> Ordered Set <------| | |--------> Ordered Map <--| | |----> Map --------------------------------------| | |----> Set ------------------------------------------|
And two different iterators, the basic iterator and the ordered iterator where you are guaranteed to get the same order each time and index(), '<' and '>' make sense.
/ Peta, jo det är jag
Realised something similar when I was testing different algorithms yesterday. A lot of them were only useful on ordered collections. I shall try out a structure like this.
Collection |-----------> Ordered Collection | | | |--------> Sequence | |--------> Ordered Set <------| | |--------> Ordered Map <--| | |----> Map --------------------------------------| | |----> Set ------------------------------------------|
And two different iterators, the basic iterator and the ordered iterator where you are guaranteed to get the same order each time and index(), '<' and '>' make sense.
/ Peta, jo det är jag
Previous text:
2002-10-29 20:46: Subject: Proposal for new ADT interface
As you say, it feels a bit weird to use m_delete to remove from a set, have to think about that.
We could perhaps rename the lfun to _delete and add an _insert. Compatibility measures can always be devised.
The reason why I did not place add in the collection concept is that all the three sub concepts has a different syntax for add.
Ok, but note that you will have essentially the same problems in add_all wrt differences in index/value sets. For instance, what do you plan to do if a Set is added to a Sequence? Will the indices then be converted to values? I'm not at all certain that such dwim would be good.
The add_all() method also brings in a number of other issues. How are duplicates handled? How is the order defined, if there is any? Why handle that specific merge method and not the others (intersection, set difference etc)? If you should include this in your design (which I'd appreciate) then I think you should handle the generic merge case. There is of course a lot of already implemented behavior to take into account regarding the different merge operations (`+, `-, `|, `& and `^).
'Return true if the two iterators is pointing to the same item in the same collection.' Since you can't rust that a collection has a valid indexing and you can't always compare the values of the iterators since a collection can contain the same item twice.
Ok. Good point. Come to think about it, iterators in ordered collections ought to have `< and `> too.
· Unique vs multiple indices.
I guess you mean set/multiset. The multiset concept is a possible sub-concept to the set concept
No, that's just a special case. The same applies to Map too. It's actually not very clear what principal difference you use to tell a Map from a Set. It's probably that Sets lacks values, but other reasonable divisions are whether there can/can't be multiple indices, or whether there's a well defined order.
I've come to the conclusion that among those differences the order is the most fundamental, and that is also what I plan to make evident wrt mappings when I add the high level interface to the new multiset implementation.
Whether there can be values attached or not is just a minor difference that only affects what you can do with a specific element once it's allocated. It's easy to e.g. define a behavior for _values() when there are no values in the collection.
The difference between unique and duplicate indices is somewhat more profound, but I still think it's secondary. Handling of duplicate indices only implies that API must be extended a bit: The operations that locates a single element using a key must be extended to use some method of choice if several elements matches(*), the iterators must be able to keep their position among identical indices, and there should preferably be some operations to handle (the values of) all the matching elements as a group.
An alternative more concrete view of this issue is to consider the current mapping implementation, which is a hash table: It would be comparatively easy to extend it both to make the values optional and to allow duplicate indices, but imposing a well defined order would practically imply a rewrite from scratch, and it's not clear whether a hash table would still be useful then.
*) The new low level multiset implementation defines this to access the last element according to the order. If the mapping implementation would be extended to handle duplicate indices, it would reasonably define it to be random (i.e. take whatever element happens to be closest to the top in the hash bucket).
/ Martin Stjernholm, Roxen IS
Realised something similar when I was testing different algorithms yesterday. A lot of them were only useful on ordered collections. I shall try out a structure like this.
Collection |-----------> Ordered Collection | | | |--------> Sequence | |--------> Ordered Set <------| | |--------> Ordered Map <--| | |----> Map --------------------------------------| | |----> Set ------------------------------------------|
And two different iterators, the basic iterator and the ordered iterator where you are guaranteed to get the same order each time and index(), '<' and '>' make sense.
/ Peta, jo det är jag
Previous text:
2002-10-29 20:46: Subject: Proposal for new ADT interface
As you say, it feels a bit weird to use m_delete to remove from a set, have to think about that.
We could perhaps rename the lfun to _delete and add an _insert. Compatibility measures can always be devised.
The reason why I did not place add in the collection concept is that all the three sub concepts has a different syntax for add.
Ok, but note that you will have essentially the same problems in add_all wrt differences in index/value sets. For instance, what do you plan to do if a Set is added to a Sequence? Will the indices then be converted to values? I'm not at all certain that such dwim would be good.
The add_all() method also brings in a number of other issues. How are duplicates handled? How is the order defined, if there is any? Why handle that specific merge method and not the others (intersection, set difference etc)? If you should include this in your design (which I'd appreciate) then I think you should handle the generic merge case. There is of course a lot of already implemented behavior to take into account regarding the different merge operations (`+, `-, `|, `& and `^).
'Return true if the two iterators is pointing to the same item in the same collection.' Since you can't rust that a collection has a valid indexing and you can't always compare the values of the iterators since a collection can contain the same item twice.
Ok. Good point. Come to think about it, iterators in ordered collections ought to have `< and `> too.
· Unique vs multiple indices.
I guess you mean set/multiset. The multiset concept is a possible sub-concept to the set concept
No, that's just a special case. The same applies to Map too. It's actually not very clear what principal difference you use to tell a Map from a Set. It's probably that Sets lacks values, but other reasonable divisions are whether there can/can't be multiple indices, or whether there's a well defined order.
I've come to the conclusion that among those differences the order is the most fundamental, and that is also what I plan to make evident wrt mappings when I add the high level interface to the new multiset implementation.
Whether there can be values attached or not is just a minor difference that only affects what you can do with a specific element once it's allocated. It's easy to e.g. define a behavior for _values() when there are no values in the collection.
The difference between unique and duplicate indices is somewhat more profound, but I still think it's secondary. Handling of duplicate indices only implies that API must be extended a bit: The operations that locates a single element using a key must be extended to use some method of choice if several elements matches(*), the iterators must be able to keep their position among identical indices, and there should preferably be some operations to handle (the values of) all the matching elements as a group.
An alternative more concrete view of this issue is to consider the current mapping implementation, which is a hash table: It would be comparatively easy to extend it both to make the values optional and to allow duplicate indices, but imposing a well defined order would practically imply a rewrite from scratch, and it's not clear whether a hash table would still be useful then.
*) The new low level multiset implementation defines this to access the last element according to the order. If the mapping implementation would be extended to handle duplicate indices, it would reasonably define it to be random (i.e. take whatever element happens to be closest to the top in the hash bucket).
/ Martin Stjernholm, Roxen IS
pike-devel@lists.lysator.liu.se