Does anyone know how often in the code we actually depend on the fact that the same string will be at the same address in memory?
Because I'm contemplating an optimisation which would involve making the string duplication avoidance opportunistic instead of mandatory.
I.e. something along these lines: all strings shorter than stringmin are always folded into a single shared instance, while longer strings *might* exist as more than one copy in memory. Long strings would not be fully hashed every time, which avoids the overhead of repeatedly rehashing large strings when juggling lots of them.
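To make the idea concrete, here is a rough sketch in C of what I have in mind. It is not the existing string code: `STRINGMIN`, `TABLE_SIZE`, `struct str`, `make_string()` and `str_equal()` are all made-up names for illustration, the threshold value is arbitrary, and reference counting and freeing are left out. As a simplification, long strings are never shared at all here, whereas the real thing could still share them when it is cheap to do so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STRINGMIN  16   /* hypothetical threshold, to be chosen by profiling */
#define TABLE_SIZE 256  /* number of hash buckets, must be a power of two */

struct str {
    struct str *next;   /* hash chain link, only used for interned strings */
    size_t len;
    bool interned;      /* true: guaranteed unique for its contents */
    char data[];        /* string bytes plus a trailing NUL */
};

static struct str *table[TABLE_SIZE];

/* FNV-1a over the bytes; only ever called for short strings. */
static size_t hash_bytes(const char *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)p[i];
        h *= 16777619u;
    }
    return (size_t)h;
}

static struct str *alloc_str(const char *data, size_t len, bool interned)
{
    struct str *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->next = NULL;
    s->len = len;
    s->interned = interned;
    if (len > 0)
        memcpy(s->data, data, len);
    s->data[len] = '\0';
    return s;
}

/* Short strings are always deduplicated; long ones skip hashing entirely. */
struct str *make_string(const char *data, size_t len)
{
    assert(data != NULL || len == 0);

    if (len >= STRINGMIN)
        return alloc_str(data, len, false);

    size_t bucket = hash_bytes(data, len) & (TABLE_SIZE - 1);
    for (struct str *s = table[bucket]; s != NULL; s = s->next) {
        if (s->len == len && (len == 0 || memcmp(s->data, data, len) == 0))
            return s;   /* existing instance, same address as before */
    }

    struct str *s = alloc_str(data, len, true);
    if (s != NULL) {
        s->next = table[bucket];
        table[bucket] = s;
    }
    return s;
}

/* Replacement for every place that currently compares string pointers. */
bool str_equal(const struct str *a, const struct str *b)
{
    if (a == b)
        return true;    /* same address is still a valid fast path */
    if (a->interned && b->interned)
        return false;   /* both unique, so different address means different */
    if (a->len != b->len)
        return false;
    return memcmp(a->data, b->data, a->len) == 0;
}
```

With this, two calls to `make_string("abc", 3)` return the same pointer, while two calls with the same 40-byte text return different pointers that `str_equal()` still reports as equal. The cost moves from hashing at creation time to a `memcmp()` at comparison time, and only for long strings.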
All the places that depend on same string = same address would need to be patched. Also, to determine stringmin, some profiling of existing apps would be interesting. Is that statistic available for, say, Roxen, i.e. the distribution of string lengths and reference counts in a running application?