decoder for utf-8

6 Mar 2003

I'd be interested to hear why the charset module is treating imperfect
input forgivingly. I can easily see cases where that is very useful,
but it does not strike me as a typical Comstedt design choice, when
there are rigid rules or standards on offer. Best practice recommended
by RFC 1345 (which I have hardly read at all)?
/ Johan Sundström (folkskådare)
Previous text:
...
2003-03-06 10:28:
Subject: decoder for utf-8

Locale.Charset.decoder never throws errors (except for internal error
conditions).  Instead, it makes a best-effort interpretation of the
data.  In this case, you have something that is almost a valid
two-byte encoding of '?' (\xc0\xbf), but the continuation byte has
been increased by one, making it an illegal sequence.  Well, if it
_had_ been legal to increase the continuation byte by one, it would of
course have meant that the character code should be increased by one
(giving '@') since this is the last continuation byte, so that's how
it is interpreted.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
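
To make the arithmetic above concrete, here is a minimal sketch in Python (not Pike, and not the actual Locale.Charset code). The helper names lenient_two_byte and strict_decodes are hypothetical, and the sketch assumes the forgiving behaviour amounts to subtracting the continuation marker 0x80 rather than masking and validating the byte. Under that assumption, \xc0\xbf gives '?' and \xc0\xc0 carries over to '@', while a strict decoder rejects both sequences.

```python
def lenient_two_byte(lead: int, cont: int) -> str:
    """Hypothetical helper: decode a two-byte sequence without validating it."""
    # A two-byte lead byte has the form 110xxxxx; its payload is the low five bits.
    high = lead & 0x1F
    # Assumption: subtract the continuation marker (0x80) instead of masking
    # with 0x3F, so a byte one past 0xBF simply adds one to the code point.
    low = cont - 0x80
    return chr((high << 6) + low)


def strict_decodes(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(repr(lenient_two_byte(0xC0, 0xBF)))  # overlong form of '?'
    print(repr(lenient_two_byte(0xC0, 0xC0)))  # continuation byte increased by one
    print(strict_decodes(b"\xc0\xbf"), strict_decodes(b"\xc0\xc0"))
```

The strict check is only there for contrast: it shows what a decoder that follows the standard to the letter does with the same bytes.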
