[...] If the unicode sequence "Ã¥" does happen to occur (uncommon as that might be), then it should still be intact on the other side.
Um, yes? It would be encoded as %c3%83%c2%a5, which would then be decoded as "Ã¥" by the decode function. That's pretty intact, no?
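To make the round trip above concrete, here is a small check written in Python rather than Pike, purely as an illustration. It assumes UTF-8 is the charset used before percent-encoding, and it uses Python's urllib.parse, not the Standards.URI class under discussion.

```python
from urllib.parse import quote, unquote

# The two-character sequence "Ã¥" (U+00C3 followed by U+00A5).
text = "\u00c3\u00a5"

# UTF-8 encode the string, then percent-encode every resulting byte.
encoded = quote(text)

# Decode the lowercase form quoted in the thread; hex digits are case-insensitive.
decoded = unquote("%c3%83%c2%a5")

print(encoded)
print(decoded == text)
```

Each of the two characters becomes two UTF-8 bytes, which is why four escapes come out.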
No. By "the other side" I meant the formatted URI, not what you get when it has been picked apart into its components again by another object. I.e. something like this:
object o = Standards.URI("http://x.com/"); o->path = "recept/räksmörgås.html"; (string) o;
Result: "http://x.com/recept/r%C3%A4ksm%C3%B6rg%C3%A5s.html"
This is a perfectly acceptable IRI that can be put into an iso-8859-1 document. The same applies when the URI/IRI is parsed, of course. That's the reason it can be useful to skip the encoding of chars outside US-ASCII.
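A rough sketch of what an optional non-ASCII encoding step could look like, written in Python only for illustration. The function name encode_path and its encode_non_ascii flag are hypothetical, not part of Standards.URI, and UTF-8 is assumed as the charset for the escaped form.

```python
from urllib.parse import quote


def encode_path(path: str, encode_non_ascii: bool = True) -> str:
    """Percent-encode a path, optionally leaving non-ASCII chars untouched.

    Hypothetical helper; "/" is always kept as the segment separator.
    """
    if encode_non_ascii:
        # URI form: every non-ASCII char becomes UTF-8 percent escapes.
        return quote(path, safe="/")
    # IRI-like form: escape only the ASCII chars that need it and pass
    # everything outside US-ASCII through unchanged.
    return "".join(
        ch if ord(ch) > 127 else quote(ch, safe="/") for ch in path
    )


uri_form = encode_path("recept/räksmörgås.html")
iri_form = encode_path("recept/räksmörgås med Ã¥.html", encode_non_ascii=False)
print(uri_form)
print(iri_form)
```

With the flag off, a sequence such as "Ã¥" comes through in the formatted string exactly as it went in, while ASCII characters like the space are still escaped.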
/.../ So rather than having a property, I think we should have a decode function, to which the strings can be passed after the user code separates them on "/" or whatever URI syntax still remains in the string.
Sure, why not? Maybe it could take a charset too, so that it knows how to handle the 8-bit chars. If the extra encoding likewise becomes optional, it gets more symmetric and also works in the use case I've been trying to describe.
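As a sketch of the idea, a decode function with a charset argument might look like the following in Python. decode_component is a hypothetical name, and the strict error handling is an assumption about the desired behaviour, not something settled in this thread.

```python
from urllib.parse import unquote


def decode_component(component: str, charset: str = "utf-8") -> str:
    """Decode percent escapes in one already-split URI component.

    The charset says how the escaped bytes should be interpreted.
    Invalid byte sequences raise UnicodeDecodeError (assumed policy).
    """
    return unquote(component, encoding=charset, errors="strict")


# The caller splits on "/" first, then decodes each piece.
segments = [
    decode_component(s) for s in "recept/r%C3%A4ksm%C3%B6rg%C3%A5s.html".split("/")
]
latin1 = decode_component("r%E4ksm%F6rg%E5s", charset="iso-8859-1")
print(segments)
print(latin1)
```

Splitting before decoding matters: an escaped "%2F" inside a segment turns into "/" only after the split, so it cannot be mistaken for a separator.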
(In retrospect, it would be better if the URI class actually parsed all the URI syntax, rather than returning something half parsed. That would mean path being array(string) instead of string. /.../
I'm not so sure; a path in array form gets unbearably cumbersome to handle compared to the standard string form. An alternative is to decode only as much as possible, i.e. leave only %2F (for "/") and %25 (for "%") encoded. That's a consistent encoding too, and it can be decoded the same way after path splitting, if the user wants to. It's a bit unfortunate that the "%" chars have to be left encoded too, though.
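The partial decoding suggested above could be sketched roughly like this, again in Python for illustration only. The names partial_decode and split_and_decode are hypothetical, and the sketch assumes that the escaped bytes form valid UTF-8.

```python
import re
from urllib.parse import unquote

# Escapes that must survive: "/" would add a bogus path separator,
# and "%" would make the remaining escapes ambiguous.
_KEEP = {"2F", "25"}


def partial_decode(path: str) -> str:
    """Decode every percent escape except %2F and %25."""

    def repl(match):
        hexpair = match.group(1).decode("ascii").upper()
        if hexpair in _KEEP:
            return b"%" + hexpair.encode("ascii")
        return bytes([int(hexpair, 16)])

    # Work on bytes so that multi-byte UTF-8 sequences are reassembled.
    raw = re.sub(rb"%([0-9A-Fa-f]{2})", repl, path.encode("utf-8"))
    return raw.decode("utf-8")


def split_and_decode(path: str) -> list:
    """Split the partially decoded path on "/" and finish the decoding."""
    return [unquote(segment) for segment in partial_decode(path).split("/")]


half = partial_decode("a%2Fb/r%C3%A4k%2525")
full = split_and_decode("a%2Fb/r%C3%A4k%2525")
print(half)
print(full)
```

Because "%25" is never decoded in the first pass, every "%" left in the intermediate string still starts a real escape, which is what lets the second pass use an ordinary decoder.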