On Wed, Nov 03, 2004 at 11:15:02AM +0100, Mirar @ Pike developers forum wrote:
Not at all. UTF-8 was made to encode 8-bit characters as well as 16-bit.
There is little (if any) sense in encoding 8-bit values into another 8-bit representation, expanding the string size along the way, don't you think?
There is no way to distinguish between an 8-bit wide string and a UTF-8-encoded string.
That's why the decision about conversion should be left to the application/user.
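To illustrate the ambiguity, a minimal sketch (the literal values are made up for the example):

  // Both of these are plain 8-bit strings as far as Pike is concerned:
  string raw = "\xe1\x88\xb4";            // three arbitrary 8-bit characters
  string enc = string_to_utf8("\x1234");  // UTF-8 encoding of U+1234
  write("%d %d\n", String.width(raw), String.width(enc));  // prints "8 8"
  write("%O\n", raw == enc);              // prints 1 - byte-for-byte identical

Nothing in the string itself tells you whether it should be decoded or left alone.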
Note that the following must *always* be true:
| str == utf8_to_string(string_to_utf8(str));
... unless str is _already_ UTF-8 encoded and contains character codes > 0x7f.
string_to_utf8() assumes that either a) str is 16- or 32-bit wide, or
b) it contains 7-bit characters only; if not, it won't work as expected/intended.
Try:
  str = string_to_utf8("\x1234\x1234");
  str = utf8_to_string(string_to_utf8(str));

What will be in str? "\x1234\x1234"? Wrong. Try it :) That's exactly what is happening in SQLite, BTW.
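For the record, the same round trip with the intermediate values spelled out (the byte values simply follow from the standard UTF-8 encoding of U+1234):

  str = string_to_utf8("\x1234\x1234");
  // str is now the 8-bit string "\xe1\x88\xb4\xe1\x88\xb4".
  str = utf8_to_string(string_to_utf8(str));
  // The inner string_to_utf8() re-encodes every byte > 0x7f as two bytes,
  // and utf8_to_string() only undoes that second encoding, so str is still
  // "\xe1\x88\xb4\xe1\x88\xb4" - not the original "\x1234\x1234".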
If Sqlite doesn't work, fix Sqlite or the glue to it.
It does work - as advertised. Sqlite just assumes that _any_ string is (probably) UTF-8, i.e. it performs no conversions itself, so it makes little sense (and even creates problems) when the conversion is done implicitly.
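For illustration, this is roughly what leaving the conversion to the application would look like (a minimal sketch; `db' is a hypothetical Sql connection object whose glue passes strings through unmodified, and the table/column names are made up):

  string wide = "\x1234\x1234";
  // The application encodes exactly once, right before handing data to SQLite:
  db->query("INSERT INTO t (v) VALUES (%s)", string_to_utf8(wide));
  // ... and decodes exactly once when reading it back:
  array rows = db->query("SELECT v FROM t");
  string back = utf8_to_string(rows[0]->v);
  // back == wide, and nothing was encoded twice.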
Fixing the glue is not a problem - but before I commit the changes I would like to be sure that nobody will be hurt, and I would like to understand why it is done the way it is now (so far it seems to me that it was a mistake or a misunderstanding of the documentation).
Regards, /Al