And to reiterate, I think this whole line of discussion isn't particularly important compared to the simple argument that the encoding called "utf-8" in the Charset module should comply with the UTF-8 standard.
As long as it correctly decodes text that complies with the UTF-8 standard (and always generates standard-compliant output when encoding, of course), I don't see any particular problem with disregarding other parts of the UTF-8 standard. It's not like we need to go through some kind of UTF-8 certification or anything.
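
To make that distinction concrete, here is a rough sketch in Python rather than in terms of the Charset module, whose API isn't shown in this thread. The helper names `lenient_decode` and `strict_encode` are made up for illustration, and using U+FFFD substitution as the lenient behaviour is just one assumption about what "disregarding other parts" could look like in practice.

```python
def lenient_decode(data: bytes) -> str:
    # Hypothetical lenient decoder: decodes every well-formed UTF-8 input
    # correctly, but substitutes U+FFFD for ill-formed sequences instead of
    # rejecting the input outright.
    return data.decode("utf-8", errors="replace")


def strict_encode(text: str) -> bytes:
    # Python's default (strict) encoder refuses lone surrogates, so anything
    # it returns is well-formed UTF-8.
    return text.encode("utf-8")


# Requirement 1: standard-compliant input is decoded correctly.
samples = ["plain ASCII", "na\u00efve", "\u65e5\u672c\u8a9e", "\U0001F600"]
for s in samples:
    encoded = strict_encode(s)
    assert lenient_decode(encoded) == s

# Requirement 2: the encoder never emits non-compliant output.
try:
    strict_encode("\ud800")  # lone surrogate, not encodable as UTF-8
    encoder_rejected_surrogate = False
except UnicodeEncodeError:
    encoder_rejected_surrogate = True

# The part being disregarded: ill-formed input (here an overlong encoding
# of "/") is tolerated by the lenient decoder rather than raising an error.
overlong = b"\xc0\xaf"
decoded_overlong = lenient_decode(overlong)
```

Note that even this lenient version never turns the overlong bytes into an actual "/", which is the case people usually worry about; it only declines to fail on them.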