Yes, _instantiating_ a Locale.Charset.decoder can throw an error.
Using utf8_to_string for UTF-8 wouldn't make the parser strict, though; you still wouldn't catch errors in other encodings.
You are welcome to implement such a strict mode. In addition to detecting illegal sequences in UTF-* and ISO-2022, it should do range checking on "normal" character encodings, so that you can't use \x7f in US-ASCII for example.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2003-03-06 11:03: Subject: decoder for utf-8
Well, Locale.Charset.decoder does at least throw when fed an encoding name it can't recognize:
Locale.Charset.decoder("foo");
Unknown character encoding foo
/usr/local/pike/7.4.13/lib/modules/_Charset.pmod:214: Locale.Charset->decoder("foo")
and that certainly is a Good Thing. The current behavior on "utf-8" unfortunately rules out using the decoder in an XML parser that wants to make a best effort to comply with the spec (even if full compliance isn't a realistic goal, in view of the bloated, overengineered specification, *sigh*). That of course can be worked around by special-casing "utf-8" to use utf8_to_string, which seems to be more strict. But who knows what traps lurk in the handling of other encodings...
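For illustration, the special-casing workaround might look roughly like the sketch below. decode_strictish is a made-up helper name, the two spellings of the encoding name it accepts are an assumption, and the sketch relies on utf8_to_string throwing on malformed input and on feed() returning the decoder so that the calls can be chained.

```pike
// Hypothetical helper, not part of Pike: decode `data` from `encoding`,
// special-casing UTF-8 so that utf8_to_string does the decoding.
string decode_strictish(string data, string encoding)
{
  string enc = lower_case(encoding);

  // Assumption: only these two spellings of the name are handled here.
  if (enc == "utf-8" || enc == "utf8")
    return utf8_to_string(data);

  // Every other encoding still goes through the ordinary decoder, which
  // throws on an unrecognized name but may swallow invalid sequences.
  return Locale.Charset.decoder(encoding)->feed(data)->drain();
}

int main()
{
  // catch evaluates to a non-zero value if the block throws.
  mixed err = catch { decode_strictish("\xff", "utf-8"); };
  write("invalid UTF-8 rejected: %s\n", err ? "yes" : "no");
  return 0;
}
```

This only tightens the UTF-8 path; input in any other encoding is decoded exactly as before.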
Wishful thinking: perhaps someday the Charset module might support a "strict mode", where it refuses to swallow sequences that are invalid in the given encoding?
/ rjb