Re: string_to_utf8() behavior on non-wide strings

5 Nov 2004


      On Fri, Nov 05, 2004 at 12:35:39PM +0100, Henrik Grubbström (Lysator) @ Pike (-) developers forum wrote:
...
http://www.sqlite.org/capi3ref.html and it seems it just about
everywhere expects data to be entered/extracted as either UTF-8 or
It neither checks the validity of the encoding nor makes any conversions
  internally.
...
UTF-16, and since Nilsson decided to use the UTF-8 variants of the
API I also agree with the "implicit" UTF-8 conversions (the only
Again, this prevents direct use of UTF-8 encoded strings, because they
  would then be encoded twice. Yes, we already discussed that "one should
  not use UTF-8 when working with SQLite module", but I strongly disagree
  with this policy - I have already explained why, many times.
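
To make the "encoded twice" problem concrete, here is a small sketch. It is
written in Python purely as an illustration and is not the Pike SQLite
module; the sample word and the variable names are my own. It models an
already-encoded 8-bit string as bytes and shows what happens when an
implicit UTF-8 conversion is applied to it again.

```python
# Illustration only: Python stands in for the behaviour under discussion.
text = "na\u00efve"                      # one non-ASCII character (i with diaeresis)

# Step 1: the application prepares UTF-8 itself.
once = text.encode("utf-8")              # 6 bytes

# Step 2: a layer that treats every byte as a character and converts to
# UTF-8 again (the implicit conversion) encodes the data a second time.
twice = once.decode("latin-1").encode("utf-8")   # 8 bytes

# Reading the stored bytes back as UTF-8 no longer gives the original text.
garbled = twice.decode("utf-8")
```

Only strings that contain non-ASCII characters are affected; pure ASCII
data survives the second pass unchanged.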
...
Note that BLOB fields naturally should not be converted.
The current SQLite module implementation assumes that a field is a BLOB if
  (and only if) it is an 8-bit wide string passed to the statement using
  bindings. This way, a UTF-8 encoded string cannot be stored as a text-type
  string using bindings.
If the opposition is so strong - OK, I'll leave Nilsson's module (in CVS) as
  is and use a modified version, which will simply be a wrapper, not any kind
  of "intellectual decision machine knowing what to do better than the user"
  (sorry, but currently it is one - any uncontrollable implicit behavior will
  be like this).
After all, it seems that I am the only real user of sqlite in Pike, or at
  least the only one who intends to use it in production mode, and the
  current implementation is too restrictive because of this implicit behavior.
It is one thing to implement something just as a "proof of concept", or
  "to declare that it exists", but quite another to actually use it...
Just to summarize why the current SQLite module is restrictive:
  1) Already prepared UTF-8 strings cannot be used directly;
  2) Anything but UTF-8 cannot be used, although sqlite allows this;
  3) Enforced conversion adds overhead - it doesn't matter how small
     it is, it is there, and it can be avoided.
While (2) and (3) are not very important (at this stage), (1) is
  _extremely_ important (in my case, at least). No, I don't want
  to call utf8_to_string() on my strings first, before passing them
  to SQLite, just because of this implicit conversion. There is an
  alternative, though - don't make any conversion if the string is
  8-bit wide (my initial proposal). This won't hurt anybody, and those
  who will use 16- or 32-bit strings (nobody does right now) will see
  no difference.
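
A minimal sketch of that rule, again in Python for illustration only. The
helper name encode_for_sqlite is hypothetical and is not part of any
module; a Python str whose code points all fit in 8 bits stands in for a
Pike 8-bit wide string.

```python
def encode_for_sqlite(s: str) -> bytes:
    """Hypothetical helper modelling the proposed rule (not a real API)."""
    if all(ord(ch) < 256 for ch in s):
        # 8-bit wide string: hand the bytes over untouched, so data that
        # is already UTF-8 is not encoded a second time.
        return s.encode("latin-1")
    # Wide (16- or 32-bit) string: convert to UTF-8 as is done today.
    return s.encode("utf-8")


# An already prepared UTF-8 string, held as an 8-bit string.
prepared = "na\u00efve".encode("utf-8").decode("latin-1")
passed_through = encode_for_sqlite(prepared)

# A wide string still gets the usual conversion.
converted = encode_for_sqlite("\u0416")
```

Under this rule the caller stays responsible for what an 8-bit string
contains; the wrapper makes no decision on the caller's behalf.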
Regards,
/Al
