Re: string_to_utf8() behavior on non-wide strings

4 Nov 2004


On Thu, Nov 04, 2004 at 07:23:26PM +0100, Peter J. Holzer wrote:
...
Pike strings are defined as strings of 32-bit values.
Could you please provide the source of this information? AFAIK, Pike
  strings may hold characters that are 8, 16 or 32 bits wide, according
  to the documentation.
...
was, you have to encode it in some way. Using UTF-8 (which is standardized
and in wide use) seems to be better than inventing the umpteenth
proprietary encoding.
Again... :( Is my English so bad, or is something else wrong? Currently,
  the SQLite module applies string_to_utf8() to _any_ string
  that is passed to big_query(), except when bindings are in use and
  the string is 8-bit wide.
See what happens:
1) I supply big_query() with a UTF8 encoded string.
  2) The SQLite module converts it (again) to UTF8, which scrambles the Unicode.
     Note that this conversion is implicit and cannot be turned off, unless
     I (manually) apply utf8_to_string() beforehand and pass its result to
     big_query().
  3) Any external application which expects UTF8 encoded Unicode characters
     in the sqlite database will get it wrong. sqlite by itself (with caseless
     comparison) will be unable to handle it right as well.
Don't you see the problem?
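
To make the three steps above concrete, here is a minimal sketch in Python rather than Pike. The two helper functions are only models that borrow the names string_to_utf8() and utf8_to_string(); they are not Pike's implementation. They assume that an 8-bit string is treated as a sequence of code points 0-255, which is why Latin-1 is used to map bytes to characters and back.

```python
# Model only: these helpers mimic the described behavior, they are not Pike code.

def string_to_utf8(s: str) -> str:
    """Encode s as UTF-8 and return the bytes as an 8-bit string
    (one character per byte, code points 0-255)."""
    return s.encode("utf-8").decode("latin-1")


def utf8_to_string(s: str) -> str:
    """Inverse of the model above: read an 8-bit string as UTF-8 bytes."""
    return s.encode("latin-1").decode("utf-8")


text = "\u00e9"                # one character, e with acute accent

# Step 1: the application encodes the string itself before calling big_query().
once = string_to_utf8(text)    # two bytes: C3 A9

# Step 2: an implicit second conversion encodes each of those bytes again.
twice = string_to_utf8(once)   # four bytes: C3 83 C2 A9

# Step 3: a reader that expects plain UTF-8 decodes once and gets mojibake.
seen_by_other_app = utf8_to_string(twice)

# Workaround from step 2: decode manually first, so the implicit
# conversion only restores the original UTF-8 bytes.
stored_with_workaround = string_to_utf8(utf8_to_string(once))

print([hex(ord(c)) for c in once])
print([hex(ord(c)) for c in twice])
print(seen_by_other_app == text, stored_with_workaround == once)
```

In this model the doubly encoded value only round-trips if every reader decodes it twice, which an external application reading the database has no reason to do.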
...
I haven't checked if there is really no other DB module which does this,
I did. Only SQLite does this.
...
module didn't think about what would happen if you tried to store a wide
string in a database. Or they did think about it and decided that that
should be decided by the application.
It should be decided by the application, not the SQL module, _always_.
  That's what I am trying to say here.
...
I don't understand this. If they are against your proposal why shouldn't
they argue their point? Why "shut up, we don't like it" better than "we
don't like it, because ..."?
Because there is no "because". I say that implicit conversion breaks
  things (see above); they tell me that "it won't hurt, just decode it twice".
Regards,
/Al
