since this is moving into speculation now, i am moving this topic to the devel forum...
On Tue, Sep 08, 2009 at 06:44:16PM +0200, Martin Stjernholm wrote:
what is the reason for this difference?
Historical, I think. Once upon a time multisets were implemented as sorted arrays.
hmm? i don't see how that explains why multisets are binary trees. the explanation you give below makes much more sense.
But there would be little use to have hash tables in two forms. If you want a hash table, use a mapping.
i would like an optimized multiset.
at this point the only difference between a mapping and a multiset is the fact that i can write it without values: (< "key", "key" >) vs: ([ "key":value ]), and that iteration over a multiset includes keys not just once.
i always saw multisets as syntactic sugar for a special form of mappings. were it not for the future plans...
Rather, the intention is at some point to extend multisets to be able to take values, just like mappings, and to take a custom sort function, so that their binary tree structure is fully usable from the pike level. Most low-level support for that is already in place.
sounds great. binary trees are a useful thing to have, although i wonder how they would relate to current use of multisets. aren't they a whole different thing alltogether?
and what about the necessary syntax change? we would get (< key, ...>) and (< key:value, ...>), at which point we might as well allow: ([ key,... ]) (ie: a hashtable based set implementation, aka syntax sugar for mapping(mixed:int(0..1))
and, hmm here is some wood for the fire of the discussion for tuple syntax. how about: tuple(mixed:mixed:mixed) foo = a:b:c;
then we can write arrays of tuples like this: ({ a:b:c, ... })
and the key:value pairs of mappings could be treated as tuples as well, same with multisets which would be generalized to: multiset(tuple(mixed:mixed)) leading naturally to multisets with values and mappings without values.
at least on the syntax level. in the implementation the problem would be that the first element of a tuple would have to be the key, and not the whole tuple. and nested tuples would not be possible with this syntax either.
greetings, martin.
/.../ Once upon a time multisets were implemented as sorted arrays.
hmm? i don't see how that explains why multisets are binary trees. the explanation you give below makes much more sense.
To make lookups reasonably efficient, the arrays inside multisets were kept sorted, so that binary search could be used to find elements. I.e. they were essentially binary trees but in an inefficient storage format.
sounds great. binary trees are a useful thing to have, although i wonder how they would relate to current use of multisets. aren't they a whole different thing alltogether?
Not really. You just need to add values to them by extending the syntax to (<key:value, ...>) and types to multiset(string:int). That's not a big issue.
I'd preferably make the non-value form just an alias for the one with values, i.e. multiset(string) would be the same as multiset(string:int(1..1)), and (<1,2>) would be the same as (<1:1,2:1>).
But unfortunately that introduces a compat problem with the syntax to remove elements. Elements are removed from a multiset using by assigning a zero:
my_multiset[x] = 0;
corresponds to m_delete(my_mapping, x). In a multiset with values, that should assign the value zero to the key x, just like for mappings. To keep compatibility with this, it's necessary to make multisets with values inherently different from those without values ("old-style" multisets, so to say). The low-level implementation supports that too, but it complicates the type checker and the runtime type system.
For instance, would (<>) create a multiset with or without values? For compatibility, it would create one without values. Then how would one create a multiset with values? A function could be added for it of course, but it wouldn't be very pretty.
/.../ at which point we might as well allow: ([ key,... ]) (ie: a hashtable based set implementation, aka syntax sugar for mapping(mixed:int(0..1))
Yes, that's the logical extension to it (although it would be int(1..1) for the value type rather than int(0..1)). Here too the value/valueless property of multisets could complicate matters: If there are two different sorts of multisets then for symmetry there ought to be two different sorts of mappings as well - with and without values, and for a mapping without values you could delete items through
my_mapping[x] = 0;
That can be done, of course, but imo it'd complicate the mapping implementation for very little gain.
and, hmm here is some wood for the fire of the discussion for tuple syntax. how about: tuple(mixed:mixed:mixed) foo = a:b:c;
I don't think one should mix tuples with the index/value pairs, since the index is treated inherently differently from the value, both in mappings and (future) multisets.
pike-devel@lists.lysator.liu.se