I take it that you mean that it would _only_ encode non-ASCII characters then? Because the way the API looks now, we have no way of knowing which characters are meant to be metacharacters unless we require that all metacharacters are already encoded in the input. /.../
My intention was that it would return a thoroughly and correctly encoded URL. That means that it would encode the characters that are neither in the reserved set (i.e. what you call metacharacters) nor in the unreserved set (i.e. US-ASCII letters, digits and a few other chars).
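To make that concrete, here's a rough sketch of such an encoder as a standalone function. It is not the Standards.URI code. The name encode_other is made up, the character sets are copied from RFC 3986 sections 2.2 and 2.3, and I'm assuming UTF-8 as the target encoding and that any "%" in the input already starts a valid escape:

```pike
// RFC 3986 section 2.2 (reserved) and 2.3 (unreserved) character sets.
constant reserved = ":/?#[]@!$&'()*+,;=";
constant unreserved =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~";

// Hypothetical helper: percent-encode every UTF-8 byte that is neither
// reserved nor unreserved. "%" is passed through untouched, on the
// assumption that it introduces an escape that is already correct.
string encode_other(string s)
{
  string out = "";
  foreach (string_to_utf8(s); ; int c) {
    if (c == '%' || has_value(reserved, c) || has_value(unreserved, c))
      out += sprintf("%c", c);
    else
      out += sprintf("%%%02X", c);
  }
  return out;
}

int main()
{
  // \xe4, \xf6 and \xe5 are a-umlaut, o-umlaut and a-ring.
  write("%s\n", encode_other("/odd%2Fpath/r\xe4ksm\xf6rg\xe5s.html"));
  // Prints: /odd%2Fpath/r%C3%A4ksm%C3%B6rg%C3%A5s.html
  return 0;
}
```

Note that "/" and the existing "%2F" both come out unchanged, which is exactly the assumption under discussion: reserved chars in the input keep whatever meaning they had.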
You're right that the current implementation apparently assumes that any reserved chars occurring in the component variables should retain their metameaning (e.g. "/" in path). This should be stated more clearly to avoid confusion, but it's not necessarily a problem: It'd be easy to add more functions to build the components from parts where every char is taken as data.
To take the path as an example, one could create a function build_path:
void build_path (array(string) path_segments)
{
  path = map (path_segments, encode_reserved) * "/";
}
where encode_reserved only encodes the reserved chars and "%". Then encoded_uri would do the rest of the job. For example:
object uri = Standards.URI("http://foo.com");
uri->build_path (({"odd/path%", "räksmörgås.html"}));
uri->path;
Result: "odd%2Fpath%25/räksmörgås.html"
uri->encoded_uri ("iso-8859-1");
Result: "http://foo.com/odd%2Fpath%25/r%C3%A4ksm%C3%B6rg%C3%A5s.html"
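For reference, encode_reserved could look something like the sketch below. This is only my guess at a minimal version, written as free-standing functions so it can be tried without touching Standards.URI. Here build_path returns the string instead of assigning to the path variable, and the reserved set is the one from RFC 3986 section 2.2:

```pike
// Hypothetical helper: escape "%" and the RFC 3986 reserved characters.
// Everything else, including non-ASCII chars, is left for a later pass.
string encode_reserved(string segment)
{
  string out = "";
  foreach (segment; ; int c)
    out += has_value("%:/?#[]@!$&'()*+,;=", c) ?
      sprintf("%%%02X", c) : sprintf("%c", c);
  return out;
}

// Free-standing variant of build_path: every char in every segment is
// taken as data, and only the "/" added by the join is a delimiter.
string build_path(array(string) path_segments)
{
  return map(path_segments, encode_reserved) * "/";
}

int main()
{
  // \xe4, \xf6 and \xe5 are a-umlaut, o-umlaut and a-ring.
  string p = build_path(({ "odd/path%", "r\xe4ksm\xf6rg\xe5s.html" }));
  // p is "odd%2Fpath%25/" followed by the unencoded file name.
  write("%d\n", has_prefix(p, "odd%2Fpath%25/"));
  return 0;
}
```

Since "%" itself is escaped here, the later pass that handles the remaining chars can safely treat every "%" it sees as the start of a valid escape.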
It'd be neat to use the getters and setters for this, so that we get a virtual variable called "split_path" or something.
Apart from the fact that you miss out on "%", x->path*"/" isn't all that inconvenient IMO. And it would allow you to use x->path*"\" on NT if you really want to. :)
I don't understand what you mean by missing out on "%" there. If you're suggesting that the user should simply join a fully decoded and split path using path*"/", then the only effect is that the user reintroduces, in his/her own code, the ambiguity we're trying to avoid. That's not a solution.
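A small sketch of what goes wrong, using nothing but the built-in join and split operators (rejoin_and_split is a made-up name for the illustration):

```pike
// Join fully decoded segments with "/" and split the result again,
// the way any later parser of the path would have to.
array(string) rejoin_and_split(array(string) decoded_segments)
{
  return (decoded_segments * "/") / "/";
}

int main()
{
  // The first segment contains a literal "/" and a literal "%" as data.
  array(string) segments = ({ "odd/path%", "index.html" });
  array(string) reparsed = rejoin_and_split(segments);
  write("%d segments in, %d segments out\n",
        sizeof(segments), sizeof(reparsed));
  // Prints: 2 segments in, 3 segments out
  return 0;
}
```

The "/" that was data is now indistinguishable from a delimiter, and the bare "%" can be mistaken for the start of an escape.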
Btw, after reading RFC 3986 section 2.2 more carefully, it's clear that no reserved character can be decoded in the path component (or in any other component for that matter), since even if some reserved chars have no metameaning for a component in the standard, they can still have scheme-specific or implementation-specific metameaning if they occur literally.
I.e. even a function that returns the path segments in an array can't decode the reserved chars, unless it assumes that the implementation - i.e. the caller - doesn't differentiate between the meta and data meanings of those chars. In most cases that'd be a user-friendly assumption, but still one that should be stated.