In Unicode 5.1.0 there is an upper/lower case pair whose code points are further apart than fits in a short... Kind of a design flaw in Unicode.
You should be on the Unicode commission, if there is one. Any idea of how to position oneself there? The usual corporate $$$ bribery thing, as for bulky standardization bodies, or something more meritocratic, like the IETF?
I hope you did not bypass the opportunity to sign it with some impressive-and-influential-sounding title. ;-)
For the totally clueless, could you elaborate on what the effect of this is, and why it is bad?
greetings, martin.
It means that the code we have for upper/lower case in Pike needs to be rewritten to handle Unicode 5.1.0, or else ignore the specific character pair in question, which is what Ken Whistler (co-author of Unicode) suggested when I asked about this.
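To make the problem concrete, here is a minimal sketch in C, assuming the case tables store each mapping as a signed 16-bit offset between the two code points. The struct and function names are hypothetical and not taken from Pike's source. The thread does not name the pair; U+A77D / U+1D79 (LATIN CAPITAL/SMALL LETTER INSULAR G) is used here only as an example of a pair that far apart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table entry: the case mapping is stored as a signed
 * 16-bit offset to add to the code point (an assumption, not Pike's
 * actual layout). */
struct case_entry {
    uint32_t code_point;  /* upper-case code point */
    int16_t  to_lower;    /* offset that yields the lower-case form */
};

/* Signed distance from the upper-case to the lower-case code point. */
static int32_t case_delta(uint32_t upper, uint32_t lower)
{
    assert(upper <= 0x10FFFF && lower <= 0x10FFFF);
    return (int32_t)lower - (int32_t)upper;
}

/* Returns 1 if the distance can be stored in an int16_t, 0 otherwise. */
static int delta_fits_in_short(uint32_t upper, uint32_t lower)
{
    int32_t delta = case_delta(upper, lower);
    return delta >= INT16_MIN && delta <= INT16_MAX;
}
```

With these helpers, 'A' (U+0041) to 'a' (U+0061) gives an offset of 32, which fits, while U+A77D to U+1D79 gives -35332, which is outside the int16_t range of -32768..32767.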
Doubtful if it's worth the overhead in practice, even though complete correctness is not to be frowned upon. Hmm. Did strings end up being type annotated with a [lower..upper] bound without extra work penalty? And what code points are we talking about, by the way?
If it can be done inexpensively with a pre- or post-process pass (not changing the complexity ("ordo") of the functions in the general case, which IMO ought to be taken to mean "when this code point pair is not present in the string" here), I'm all for it. :-)
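A post-process pass along those lines might look roughly like the sketch below. It is only an illustration under stated assumptions: general_to_lower() is an ASCII-only stand-in for the existing table-driven conversion, max_cp stands in for a hypothetical per-string upper bound on code points, and U+A77D / U+1D79 is an assumed example pair, since the thread does not name the one in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed example pair whose distance exceeds a signed 16-bit offset. */
#define EXC_UPPER 0xA77Du
#define EXC_LOWER 0x1D79u

/* Stand-in for the existing conversion (ASCII only, for illustration). */
static uint32_t general_to_lower(uint32_t c)
{
    return (c >= 'A' && c <= 'Z') ? c + 32 : c;
}

/* Lower-cases s[0..len) in place. max_cp is a known upper bound on the
 * code points in s (hypothetical range annotation). */
static void lower_case(uint32_t *s, size_t len, uint32_t max_cp)
{
    size_t i;
    assert(s != NULL || len == 0);

    /* General case: unchanged single pass over the string. */
    for (i = 0; i < len; i++)
        s[i] = general_to_lower(s[i]);

    /* The exceptional code point cannot occur: skip the extra pass. */
    if (max_cp < EXC_UPPER)
        return;

    /* Post-process pass: patch only the exceptional code point. */
    for (i = 0; i < len; i++)
        if (s[i] == EXC_UPPER)
            s[i] = EXC_LOWER;
}
```

Strings whose upper bound lies below the exceptional code point pay nothing extra, and the others get one more linear scan, so the overall complexity stays linear.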
pike-devel@lists.lysator.liu.se