I'd be interested to hear why the charset module treats imperfect input so forgivingly. I can easily see cases where that is very useful, but it does not strike me as a typical Comstedt design choice when there are rigid rules or standards on offer. Is it best practice recommended by RFC 1345 (which I have hardly read at all)?
/ Johan Sundström (folkskådare)
Previous text:
2003-03-06 10:28: Subject: decoder for utf-8
Locale.Charset.decoder never throws errors (except for internal error conditions). Instead, it makes a best-effort interpretation of the data. In this case, you have something that is almost a valid two-byte encoding of '?' (\xc0\xbf), but the continuation byte has been increased by one (\xc0\xc0), making it an illegal sequence. Well, if it _had_ been legal to increase the continuation byte by one, it would of course have meant that the character code should be increased by one (giving '@'), since this is the last continuation byte, so that's how it is interpreted.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
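The arithmetic behind the '@' reading can be sketched as follows. This is a hypothetical Python illustration, not Locale.Charset's actual code: it assumes the lenient decoder takes the continuation byte's payload as (cont - 0x80) instead of masking with (cont & 0x3F). With that assumption, an overshooting continuation byte such as 0xC0 carries into the next code point rather than being rejected. Python's own strict UTF-8 decoder is shown for contrast; it refuses both sequences, since 0xC0 is never a valid lead byte in strict UTF-8.

```python
def lenient_decode_2byte(lead: int, cont: int) -> str:
    """Best-effort decode of a two-byte UTF-8-like sequence (hypothetical sketch).

    The continuation payload is computed as (cont - 0x80), so a byte above
    0xBF is not rejected: it simply adds to the character code.
    """
    if (lead & 0xE0) != 0xC0:
        raise ValueError("not a two-byte lead byte")
    if cont < 0x80:
        raise ValueError("not a continuation-like byte")
    # 5 payload bits from the lead byte, then the (unmasked) continuation offset.
    return chr(((lead & 0x1F) << 6) + (cont - 0x80))


def strict_rejects(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder refuses the data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    print(lenient_decode_2byte(0xC0, 0xBF))  # overlong '?'
    print(lenient_decode_2byte(0xC0, 0xC0))  # continuation byte + 1, read as '@'
    print(strict_rejects(b"\xc0\xbf"), strict_rejects(b"\xc0\xc0"))
```

For \xc0\xbf the lead byte contributes 0 and the continuation contributes 0xBF - 0x80 = 0x3F ('?'); bumping the continuation byte to 0xC0 yields 0x40 ('@'), which matches the interpretation described in the message above.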