How so? If there is only one, then that one is always the right one.
It's not quite so simple when communicating with the outside world, which doesn't unify the two concepts.
The IETF chose to make IRI a separate standard (RFC 3987) instead of extending URI. They've obviously pondered that approach at length, so I guess they did it with good reason.
Many users probably don't realize that they actually aren't using URIs anymore once they go outside US-ASCII; that seems to be a widespread misconception.
Since IRIs can be mapped to URIs, they can be made to actually use URIs without having to realize it.
The receiving side might not perform the same transformation in reverse. E.g. when the URI is passed as a URL over HTTP, there is no obligation - not even in the latest standards - to apply the reverse URI-to-IRI transformation to the URL after receiving the request. So it's good to be aware of what is happening, to better judge how the receiver might (mis)behave.
When must one transform from a URI, then?
Huh? To process it, of course. E.g. Unicode data sent in a web form, where the de-facto behavior of modern browsers is to do an IRI-to-URI transformation first. It'd be nice to have that decoding built into the class.
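A minimal sketch of that de-facto browser step (using Python's urllib rather than the class under discussion, so the function names here are not the proposed API): non-ASCII form data is encoded as UTF-8 and the resulting octets are percent-escaped.

```python
from urllib.parse import quote, unquote

# What a browser effectively does with non-ASCII form data:
# encode the text as UTF-8, then percent-escape the octets.
encoded = quote("søk", safe="")
print(encoded)           # -> s%C3%B8k

# The decoding one would want built into the class is the reverse step:
print(unquote(encoded))  # -> søk
```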
Using the URI representation seems more powerful since it can represent both URIs and IRIs.
The problem is that a URI can't fully represent an IRI. It can only contain a (transformed) IRI, just like an octet string can contain a URI.
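To illustrate why a URI can only contain, rather than fully represent, an IRI: a simplified sketch of the RFC 3987 IRI-to-URI mapping (the helper below is hypothetical, not part of any discussed class) sends both a raw non-ASCII character and its pre-escaped form to the same URI, so the mapping loses information.

```python
def iri_to_uri(iri: str) -> str:
    # Simplified sketch of the RFC 3987 IRI-to-URI mapping:
    # percent-encode the UTF-8 octets of each non-US-ASCII character
    # and pass everything else (including existing %XX escapes)
    # through untouched.
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )

# Two different IRIs end up as the same URI, so the URI alone
# can't say which one it came from:
print(iri_to_uri("/æ"))       # -> /%C3%A6
print(iri_to_uri("/%C3%A6"))  # -> /%C3%A6
```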
Of course, having a function to decode the UTF-8 sequences is something we want, but this should be possible (and done in the same way) regardless of whether you start with an IRI or an IRI mapped into a URI, IMO.
Perhaps, but not if you start with a URI that isn't a transformed IRI. Or are you suggesting that the URI class should just try to decode it as an IRI and silently continue without the UTF-8 decode if that fails?
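One way such a "try, and fall back silently" decode could look (a sketch using Python's urllib; the helper name is made up for illustration): interpret the %XX escapes strictly as UTF-8, and leave the URI untouched when the octets aren't valid UTF-8, i.e. when it evidently isn't a transformed IRI.

```python
from urllib.parse import unquote

def try_uri_to_iri(uri: str) -> str:
    # Hypothetical helper: interpret %XX escapes as UTF-8 and map
    # the URI back to an IRI; if the octets don't form valid UTF-8,
    # the URI was never a transformed IRI, so return it unchanged.
    try:
        return unquote(uri, errors="strict")
    except UnicodeDecodeError:
        return uri

print(try_uri_to_iri("/d%C3%A6mon"))  # -> /dæmon
print(try_uri_to_iri("/%FF"))         # not valid UTF-8, left as-is: /%FF
```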
/.../ I don't think the performance issue warrants a confusing split in the namespace.
I didn't say it was a performance issue either, rather one of functionality. An IRI can e.g. contain "ôôÌ" in a Unicode context without any encoding whatsoever, whereas a URI can't. When writing documents containing IRIs in a Unicode environment, it is of course nice to see and handle the real glyphs directly. Hence the class should be able to both produce and parse IRIs without escaping the non-US-ASCII chars.
Whether the "picked apart" pieces contain wider chars or not seems irrelevant, since you need to decode them anyway (%25, %2f).
When used as I described above, the wider chars wouldn't be encoded to begin with.
But besides that, more functionality to relieve the user of decoding %XX escapes manually is in order.
The decoding should give you wide chars regardless of whether you start with an IRI or an IRI mapped into an IRI (see above).
I assume at least one of the "IRI" there should be "URI". Decoding a URI in general can't produce wide chars since it can't assume that the URI is a transformed IRI.
Footnote: Now my pike discussion quota is used up for at least today.