Re: string_to_utf8() behavior on non-wide strings

4 Nov 2004


On Thu, Nov 04, 2004 at 08:05:03PM +0100, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
...
Saying that a "string is UTF-16 encoded" implies that characters
outside the Basic Multilingual Plane are encoded using surrogates
(pairs of two 16-bit values).
"In computing, UTF-16 is a 16-bit Unicode Transformation Format, a
  character encoding form that provides a way to represent a series of
  abstract characters from Unicode and ISO/IEC 10646 as a series of 16-bit
  words suitable for storage or transmission via data networks."
And from the RFC: "In the UTF-16 encoding, characters are represented
  using either one or two unsigned 16-bit integers, depending on the
  character value." (http://www.ietf.org/rfc/rfc2781.txt)
As I said, UTF-16 implies 16-bit wide characters and hence 16-bit
  wide strings in Pike, which is why I use this term.
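
To make the "one or two unsigned 16-bit integers" wording from the RFC concrete, here is a minimal sketch of the RFC 2781 encoding rule. It is written in Python rather than Pike, it is not how string_to_utf8() or any Pike function is implemented, and the name utf16_code_units is made up purely for illustration.

```python
def utf16_code_units(cp):
    """Return the UTF-16 code units (16-bit integers) for one code point.

    Hypothetical helper for illustration; follows the rule in RFC 2781.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if cp < 0x10000:
        # Inside the Basic Multilingual Plane: one 16-bit value.
        return [cp]
    # Outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = cp - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1D11E):
        units = utf16_code_units(cp)
        print("U+%04X -> %s" % (cp, " ".join("%04X" % u for u in units)))
```

A character inside the BMP comes out as a single 16-bit value, while U+1D11E comes out as the pair D834 DD1E, so a string of 16-bit values is only "UTF-16 encoded" in the strict sense if characters outside the BMP are stored this way.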
Regards,
/Al
