Well, Locale.Charset.decoder does at least throw when fed an encoding name it can't recognize:
Locale.Charset.decoder("foo");
Unknown character encoding foo
/usr/local/pike/7.4.13/lib/modules/_Charset.pmod:214: Locale.Charset->decoder("foo")
and that certainly is a Good Thing. The current behavior on "utf-8" unfortunately rules out using the decoder in an XML parser that wants to make a best effort to comply with the spec (even if full compliance isn't a realistic goal, in view of the bloated overengineered specification, *sigh*). That of course can be worked around by special-casing "utf-8" to use utf8_to_string, which seems to be more strict. But who knows what traps lurk in the handling of other encodings...
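A minimal sketch of that workaround, assuming utf8_to_string really does throw on malformed input (it seems to, but I haven't checked every case). The helper name decode_strict is made up for illustration; only "utf-8" and "utf8" get the strict path, and every other encoding still goes through the lenient decoder:

```pike
// Hypothetical helper: strict for UTF-8, lenient for everything else.
string decode_strict(string encoding, string data)
{
  string name = lower_case(encoding);
  if (name == "utf-8" || name == "utf8")
    return utf8_to_string(data);   // assumed to throw on invalid sequences
  // Throws "Unknown character encoding" for names it does not recognize.
  return Locale.Charset.decoder(encoding)->feed(data)->drain();
}

int main()
{
  // Well-formed two-byte sequence for U+00E5.
  write(sprintf("%O\n", decode_strict("utf-8", "\xc3\xa5")));

  // Lead byte followed by a byte outside the continuation range.
  mixed err = catch { decode_strict("utf-8", "\xc0\xc0"); };
  write(err ? "rejected\n" : "accepted\n");
  return 0;
}
```

An XML parser could treat a caught error here as a fatal encoding error, instead of passing the reinterpreted character along.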
Wishful thinking: perhaps someday the Charset module might support a "strict mode", where it refuses to swallow sequences that are invalid in the given encoding?
/ rjb
Previous text:
2003-03-06 10:28: Subject: decoder for utf-8
Locale.Charset.decoder never throws errors (except for internal error conditions). Instead, it makes a best-effort interpretation of the data. In this case, you have something that is almost a valid two-byte encoding of '?' (\xc0\xbf), but the continuation byte has been increased by one, making it an illegal sequence. Well, if it _had_ been legal to increase the continuation byte by one, it would of course have meant that the character code should be increased by one (giving '@') since this is the last continuation byte, so that's how it is interpreted.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
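The arithmetic behind that interpretation can be sketched as follows. This only illustrates the reasoning in the quoted text and is not the actual code in the Charset module; the function name lenient_two_byte is hypothetical, and it assumes the decoder effectively subtracts 0x80 from the continuation byte without checking that the byte lies in the range 0x80 to 0xbf:

```pike
// Hypothetical illustration, not the Charset module's real implementation.
int lenient_two_byte(int lead, int cont)
{
  // The low five bits of the lead byte form the high bits of the code;
  // the continuation byte contributes (cont - 0x80), unchecked.
  return ((lead & 0x1f) << 6) + (cont - 0x80);
}

int main()
{
  write(sprintf("%c\n", lenient_two_byte(0xc0, 0xbf)));  // 0x3f, '?'
  write(sprintf("%c\n", lenient_two_byte(0xc0, 0xc0)));  // 0x40, '@'
  return 0;
}
```

A strict decoder would reject the second call's input before doing any arithmetic, since 0xc0 is not a continuation byte.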