On Wed, Nov 03, 2004 at 10:40:05AM +0100, Mirar @ Pike developers forum wrote:
> But that sounds like a problem with sqllite (was it?), not string_to_utf8.
Not really; the SQLite module merely happens to use string_to_utf8() implicitly. Note that MySQL also supports UTF-8, but no implicit conversion is done there: strings are passed as is.
And the (current) behavior of string_to_utf8() may still cause problems once it is used somewhere else. AFAIK, UTF-8 encoding was not intended to encode 8-bit wide characters (this simply makes no sense), so when the argument is an 8-bit wide string, nothing should be done (at most, check that the input is a valid UTF-8 stream). That seems logical, doesn't it?
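To make the suggestion concrete, here is a rough sketch in Python rather than Pike. Pike strings are modeled as Python str (an "8-bit wide" string is one whose characters are all below 256), and the name proposed_string_to_utf8 is hypothetical, not an existing function:

```python
def proposed_string_to_utf8(s: str) -> bytes:
    """Hypothetical variant: encode only wide strings, pass 8-bit ones through."""
    if all(ord(ch) < 256 for ch in s):
        # 8-bit wide string: assume it already holds UTF-8 bytes.
        raw = s.encode("latin-1")   # one character -> one byte, nothing changes
        raw.decode("utf-8")         # validity check only; raises on a bad stream
        return raw
    # 16- or 32-bit wide string: a real conversion is needed.
    return s.encode("utf-8")
```

With this variant, an 8-bit string that already carries UTF-8 bytes comes out unchanged, a wide string is encoded as usual, and an 8-bit string that is not valid UTF-8 is rejected.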
> If a particular library throws around the conversion, you just have to document what exactly it does, if you can't generalize it...
The problem (in the particular case of the SQLite module) is that with the implicit conversion in place it is not possible to use encodings other than UTF-8, although the library allows it.
What is worse, 16- or 32-bit wide strings are required to store a UTF-8 string in the database. That is, any external UTF-8 string (user input, for instance) that is to be passed to SQLite must first be converted to a 16- or 32-bit Pike string and only then passed to the SQLite functions (where it is converted to UTF-8 again); otherwise the conversion will scramble it.
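A small illustration of the scrambling and of the workaround, again in Python as a stand-in for Pike. The helper current_string_to_utf8 is only a model of the behavior described here (every character gets encoded, whatever the string width), not the real implementation:

```python
def current_string_to_utf8(s: str) -> bytes:
    """Model of the current behavior: encode every character, 8-bit or not."""
    return s.encode("utf-8")

# External UTF-8 input (the euro sign, 3 bytes) held as an 8-bit string:
# one character per byte.
external = "\u20ac".encode("utf-8").decode("latin-1")

# Passed as is, the implicit conversion encodes those bytes a second time.
scrambled = current_string_to_utf8(external)

# Workaround: decode to a wide string first, so that the implicit
# conversion reproduces the original bytes.
wide = external.encode("latin-1").decode("utf-8")
stored = current_string_to_utf8(wide)

print(scrambled)  # b'\xc3\xa2\xc2\x82\xc2\xac'
print(stored)     # b'\xe2\x82\xac'
```

The three input bytes become six when passed directly, and survive intact only after the extra decoding step.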
... I just grepped through the sources: there are very few places where UTF-8 conversions are performed, namely SQLite, PCRE, some XML stuff and (naturally) the charset handling modules.
I am not against conversion, but I strongly believe that any conversion should be controlled by the user (application). Implicit conversion (unless it is unobtrusive, which is not the case here) is a Very Bad Thing (tm)...
NB: This is not just theory. I have a real application which cannot use SQLite "as is", i.e. with this conversion. I can use MySQL or Informix without any problems, though; I just wanted to get rid of it...
Regards, /Al