Locale.Charset.decoder never throws errors (except for internal error conditions). Instead, it makes a best-effort interpretation of the data. In this case, you have something that is almost a valid two-byte encoding of '?' (\xc0\xbf), but the continuation byte has been increased by one (\xbf to \xc0), making it an illegal sequence. Well, if it _had_ been legal to increase the continuation byte by one, it would of course have meant that the character code should be increased by one (from 0x3f to 0x40, giving '@'), since this is the last continuation byte, so that's how it is interpreted.
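As a rough illustration of that arithmetic only (written in Python rather than Pike, and not the actual Locale.Charset code; the helper name lenient_two_byte is made up for this sketch), a decoder that skips the range check on the continuation byte might behave like this:

```python
def lenient_two_byte(data: bytes) -> str:
    """Decode a two-byte sequence without validating the continuation byte.

    Hypothetical helper for illustration; it assumes the decoder simply adds
    the continuation byte's offset from 0x80 to the bits from the lead byte.
    """
    lead, cont = data[0], data[1]
    # Strict UTF-8 requires 0x80 <= cont <= 0xBF. That check is skipped here,
    # so a continuation byte of 0xC0 contributes 0x40 instead of being rejected.
    code_point = ((lead & 0x1F) << 6) + (cont - 0x80)
    return chr(code_point)


print(repr(lenient_two_byte(b"\xc0\xbf")))  # 0x3f, i.e. '?'
print(repr(lenient_two_byte(b"\xc0\xc0")))  # 0x40, i.e. '@'

# A strict decoder rejects the same input, much as utf8_to_string does.
try:
    b"\xc0\xc0".decode("utf-8")
except UnicodeDecodeError as err:
    print("strict decoder refused:", err.reason)
```

The point of the sketch is just that "one more than \xbf" falls out as "one more than '?'" once the range check is dropped; the real decoder may well arrive at the same result by a different route.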
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2003-03-05 23:01: Subject: decoder for utf-8
How is the decoder returned by Locale.Charset.decoder("utf-8") supposed to behave when fed a bytestream that is not valid UTF-8? It seems to return some peculiar results, instead of throwing an error (like utf8_to_string quite correctly does):
object dec = Locale.Charset.decoder("utf8"); dec->feed("\xc0\xc0")->drain();
(62) Result: "@"
utf8_to_string("\xc0\xc0");
utf8_to_string(): Expected continuation character at index 1 (got 0xc0).
/ rjb