It doesn't check the validity of encoding nor makes any conversions internally.
AFAICS it does, both when necessary in communication with clients and when collation etc. calls for it. It's clear as day that it has Unicode written all over it, and the fact that it's strictly possible to ignore that doesn't detract from this.
Why would anyone want to store invalid UTF-8 strings in TEXT fields when BLOB fields are available? Other than to prove some kind of point by doing it just because it can be done?
If opposition is so strong - OK, I'll leave Nillson's module (in CVS) as is and use modified version,
You seem to ignore that, as the discussion has progressed, no one has opposed adding a flag to turn it off. Isn't that enough for you? Or do you just continue this kind of sulky the-world-against-me attitude for the sake of it?
- Already prepared UTF-8 strings cannot be used directly;
This point can be reduced to (3) by just decoding the strings before entry. In other words, it's not a matter of versatility but one of performance.
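To make the "decode before entry" step concrete, here is a minimal sketch. It uses Python's standard sqlite3 module as a stand-in for the module under discussion (an assumption, the behaviour of the module in CVS may differ in detail), and the table and variable names are made up for illustration. The only extra work is one decode call, which is why this is a performance question rather than a versatility one.

```python
import sqlite3

# Bytes that are already UTF-8 encoded (the "already prepared" string).
prepared = "sm\u00f6rg\u00e5s".encode("utf-8")

# The extra step: decode to a wide string before handing it to the binding,
# which then re-encodes it to UTF-8 on the way into the database.
text = prepared.decode("utf-8")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (s TEXT)")   # hypothetical table for the demo
conn.execute("INSERT INTO t VALUES (?)", (text,))
stored = conn.execute("SELECT s FROM t").fetchone()[0]
```

The value that comes back is the same wide string, and encoding it again yields the original prepared bytes.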
- Anything but UTF-8 cannot be used while sqlite allows this;
I wouldn't say it's allowed just because it doesn't check for invalid strings. Everywhere I've looked in the docs, it says UTF-8 or UTF-16, period. Is there any guarantee that they won't add a validity checker at some point?
- Enforced conversion add additional overhead - it doesn't matter how small it is, but it is there, while can be avoided.
Valid point, although it would still be nice to see how much overhead the extra conversion actually incurs.
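One rough way to get a number is to time the conversion on its own. The sketch below does this in Python with the standard timeit module; it measures Python's encoder, not the module discussed here, so it only shows the method. The function name and sample string are invented for the example.

```python
import timeit

def utf8_encode_cost(text: str, repeats: int = 10000) -> float:
    """Return the average time in seconds for one UTF-8 encode of text."""
    total = timeit.timeit(lambda: text.encode("utf-8"), number=repeats)
    return total / repeats

# A sample with a few non-ASCII characters, repeated to a realistic length.
sample = "r\u00e4ksm\u00f6rg\u00e5s " * 100
per_call = utf8_encode_cost(sample)
```

Comparing per_call against the time for a complete insert would show whether the conversion is noticeable at all.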
There is alternative, though - don't make any conversion if string is 8-bit wide (my initial proposal) - this won't hurt anybody, and those who will (because nobody does right now) use 16- or 32-bit strings will see no difference.
Oh my, will this hurt! This is definitely the one thing I absolutely and utterly oppose. How do you know whether the string is to be UTF-8/UTF-16 decoded when you get it back? By some kind of DWIM that tries to decode it and just passes it through if that fails? Then there's always the possibility that it'll decode raw eight-bit strings that just happen to be valid UTF-8. What if you want to use the sqlite collation functions etc. on those strings? They sure as hell won't work correctly on unencoded eight-bit chars.
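The ambiguity is easy to reproduce. The sketch below is in Python and is not taken from any existing module: dwim_decode is a hypothetical helper that models the "try to decode, pass through on failure" strategy. It shows a raw eight-bit string whose bytes happen to form valid UTF-8 and therefore comes back as a different string.

```python
def dwim_decode(raw: bytes) -> str:
    """Hypothetical DWIM reader: decode as UTF-8 if possible, else pass
    the bytes through unchanged as eight-bit (Latin-1) characters."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

# A two-character eight-bit string: U+00C3 followed by U+00A9.
original = "\u00c3\u00a9"

# Stored without conversion, one byte per character: 0xC3 0xA9.
stored = original.encode("latin-1")

# Those two bytes are also the valid UTF-8 encoding of U+00E9,
# so the DWIM reader decodes them instead of passing them through.
roundtrip = dwim_decode(stored)

# A string whose raw bytes are not valid UTF-8 survives the round trip.
lucky = dwim_decode("caf\u00e9".encode("latin-1"))
```

Here roundtrip is the single character U+00E9 rather than the two characters that were stored, while lucky comes back intact only because its bytes fail UTF-8 decoding.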