Arne Goedeke wrote:
On Thu, 29 Mar 2012, Stephen R. van den Berg wrote:
Because I'm contemplating an optimisation which would involve making the string duplication avoidance opportunistic instead of mandatory.
I guess the point here is to skip the hashing in cases where the strings are large, come from the network/disk, and/or are very unlikely to exist twice. Would it not make more sense to allow for using unhashed strings explicitly instead? In that case I think it would be better to have a separate class for them; otherwise all kinds of code would become much more complex. Think of mapping lookups and similar places where the string's pointer is used.
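To make that dependency concrete, this is roughly the assumption such code relies on (an illustration only, not the actual stralloc or mapping code; the struct layout and names are made up):

#include <stddef.h>
#include <string.h>

/* Illustration only -- not Pike's actual string struct. */
struct str {
    size_t len;
    unsigned int hval;   /* content hash, computed when the string is built */
    char data[];
};

/* With mandatory sharing, equal contents always yield the same pointer,
 * so key comparison in a mapping can be a single address compare: */
static int str_eq_shared(const struct str *a, const struct str *b)
{
    return a == b;
}

/* If some strings may skip the shared table, every such comparison has
 * to fall back to comparing the actual contents: */
static int str_eq_opportunistic(const struct str *a, const struct str *b)
{
    if (a == b)
        return 1;
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}

Allowing unhashed strings means every such comparison and every mapping lookup needs the slow path.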
That's exactly what I'm asking... How many places are there where we explicitly depend on the address being usable to define uniqueness? I'm trying to assess whether it is doable to fix those.
Then, of course, all kinds of other places need to be changed to support the "new" unhashed strings, otherwise they would be quite useless, except for very special situations where some cycles can be saved.
Quite. The reasoning has come full circle, which is why I'm trying to see if it can be retrofitted in a way that makes the speedup available to all unaltered Pike programs.
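One possible shape of such a retrofit, sketched very roughly (this is not a patch against the real stralloc code; the actual hash table, refcounting and locking are left out, and all names are made up): strings are created unshared, and are only folded into the shared table the first time their identity matters.

#include <stdlib.h>
#include <string.h>

struct str {
    size_t len;
    int shared;            /* nonzero once the string has been interned */
    char data[];
};

/* A tiny linear-scan table stands in for the real shared-string table. */
#define TABLE_MAX 1024
static struct str *table[TABLE_MAX];
static size_t table_used;

/* Cheap creation path: no hashing, no dedup. */
static struct str *make_str(const char *p, size_t len)
{
    struct str *s = malloc(sizeof *s + len);
    if (!s)
        abort();
    s->len = len;
    s->shared = 0;
    memcpy(s->data, p, len);
    return s;
}

/* Called lazily before any code that relies on pointer identity
 * (mapping keys, == on strings, etc.). */
static struct str *ensure_shared(struct str *s)
{
    if (s->shared)
        return s;
    for (size_t i = 0; i < table_used; i++) {
        struct str *t = table[i];
        if (t->len == s->len && memcmp(t->data, s->data, s->len) == 0) {
            free(s);            /* an equal string already exists: reuse it */
            return t;
        }
    }
    if (table_used == TABLE_MAX)
        abort();                /* sketch only: the real table would grow */
    s->shared = 1;
    table[table_used++] = s;    /* first of its kind: becomes the canonical copy */
    return s;
}

The open question is exactly the one above: finding every place that would need an ensure_shared() call (or a content-comparison fallback) because it currently assumes the address alone defines uniqueness.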