Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Pointers are worse. They can contain all sorts of periodicity. An
it's different for each and every one of them). A prime modulo is very effective to put all such concerns to rest. Maybe there are other
Is Pike doing a lot of pointer hashing? Incidentally, using a prime module to hash a pointer sounds fine to me, it just means that the modulo is part of the hash function. The only thing I find "bad design" is if the hash-table lookup forces a prime modulo, even if a good hash function has already been used. We should focus on improving the hash functions in general.
One more concern is also how it behaves for the ultimately bad hash function, i.e. one that returns the same value all the time. In that case the hash table is of course O(n) on lookup and store, but it shouldn't try to grow the table all the time due to a reprobe limit. I don't know if he found a clever way to avoid that.
I don't think that was discussed during the talk. However, pondering the problem for a brief moment, I can imagine something like the following:
Keep a relative live count per thread (to avoid thread contention). Increment it when inserting a new item, decrement it when deleting an item (Tombstone). Then whenever it seems necessary to grow the hashtable, quickly sum up all the live counters and you'll have a fairly accurate population count of the total hashtable. If the ratio is below a certain threshold, do not grow the table.