It seems Standards.URI does no encoding or decoding according to RFC 2396:
  Pike v7.7 release 3 running Hilfe v3.5 (Incremental Pike Frontend)
  object uri = Standards.URI("http://foo/a%3Fb?c"); uri->path;
  (1) Result: "/a%3Fb"
  _Roxen.http_decode_string (uri->path);
  (2) Result: "/a?b"
  uri->path = _;
  (3) Result: "/a?b"
  (string) uri;
  (4) Result: "http://foo/a?b?c"
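The same round trip can be reproduced outside Pike. The following is a minimal sketch using Python's urllib.parse, not Standards.URI. It shows that a decoded path has to be re-quoted before the URI is put back together; otherwise the decoded "?" is read as the start of the query.

```python
from urllib.parse import urlsplit, urlunsplit, quote, unquote

raw = "http://foo/a%3Fb?c"
parts = urlsplit(raw)            # parts.path == "/a%3Fb", parts.query == "c"
decoded = unquote(parts.path)    # "/a?b"

# Putting the decoded path back as-is: the "?" now reads as the start of the query.
naive = urlunsplit(parts._replace(path=decoded))    # "http://foo/a?b?c"

# Re-quoting the path first (keeping "/" as the segment separator) restores it.
fixed = urlunsplit(parts._replace(path=quote(decoded, safe="/")))  # "http://foo/a%3Fb?c"
```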
Is this by design? I think it's pretty clear that it should handle this detail too. Unfortunately, it's quite a problem to change that now, since this class is already used extensively, and changes in quoting behavior are among the most notorious generators of compatibility problems.
Does anyone have an idea how common it is for calling code to "work around" this by doing its own encoding and decoding?
Hmm, I just realized that, strictly speaking, it isn't possible to decode the path without splitting it at the same time. Is this true for the other URI parts too?
On the other hand, I reckon that occasions where a user can make a meaningful distinction between a "/" that separates path segments and a quoted "/" (%2F) inside a path segment are extremely rare.
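Here is an illustration of why decoding and splitting go together. Once %2F has been decoded, it can no longer be told apart from a separator. This is a small Python sketch, and the path is a made-up example.

```python
from urllib.parse import unquote

path = "/docs/a%2Fb/c"

# Decoding first: the escaped "/" turns into a separator, adding a segment boundary.
decode_then_split = unquote(path).split("/")   # ["", "docs", "a", "b", "c"]

# Splitting first: each segment is decoded on its own and keeps its "/".
split_then_decode = [unquote(seg) for seg in path.split("/")]   # ["", "docs", "a/b", "c"]
```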
/ Martin Stjernholm, Roxen IS
Previous text:
2004-05-26 16:48: Subject: Quoting in Standards.URI
The case I know of is FTP URLs, where you split on "/" to find the sequence of CD commands to send, but where each CD command may contain URL-encoded slashes (thereby going down several directories in one go, assuming a UNIX-like file system).
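The FTP case can be sketched in Python. The sketch assumes the RFC 1738 layout ftp://host/<cwd1>/.../<cwdN>/<name>[;type=<typecode>]. The function name `ftp_cwd_commands` is hypothetical. The key step is to split on the literal "/" first and decode each part afterwards.

```python
from urllib.parse import urlsplit, unquote

def ftp_cwd_commands(url):
    """Return (CWD commands, file name) for an FTP URL of the RFC 1738 form
    ftp://host/<cwd1>/<cwd2>/.../<cwdN>/<name>[;type=<typecode>]."""
    path, _, _typecode = urlsplit(url).path.partition(";type=")
    if path.startswith("/"):
        path = path[1:]  # the "/" after the host is not part of the url-path
    # Split on literal "/" first, then decode each part separately, so that an
    # encoded %2F stays inside a single CWD argument.
    *dirs, name = [unquote(part) for part in path.split("/")]
    return ["CWD " + d for d in dirs], name
```

For example, ftp://host/pub%2Fgnu/pike/file.tar.gz yields the commands "CWD pub/gnu" and "CWD pike", followed by the file name file.tar.gz.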
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2004-05-26 17:00: Subject: Quoting in Standards.URI
> It seems Standards.URI does no encoding or decoding according to RFC 2396:
True; it is kind of dumb that way. It just gives you the tools to take a URI apart and put it together again. We never got as far as designing a higher-level API for handling quoting, encoding and decoding; that was left out as a possible future improvement.
> I think it's pretty clear that it should handle this detail too.
Agreed. I don't think adding something now to do the trick would cause any compatibility problems, though; then again, I also don't think encoding and decoding should happen automagically. Adding some additional primitives to the class should pose little problem, IMO.
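Below is one possible shape for such opt-in primitives, sketched in Python. The class and the methods `get_decoded_path` and `set_decoded_path` are hypothetical and are not part of Standards.URI. In this sketch, `path` stays verbatim, and decoding or encoding only happens when explicitly requested.

```python
from urllib.parse import urlsplit, urlunsplit, quote, unquote

class URI:
    """Toy URI wrapper: `path` is stored and returned verbatim (escaped);
    decoding/encoding only happens through the explicit accessors."""

    def __init__(self, text):
        self._parts = urlsplit(text)

    @property
    def path(self):
        return self._parts.path

    @path.setter
    def path(self, value):
        self._parts = self._parts._replace(path=value)

    def get_decoded_path(self):
        return unquote(self._parts.path)

    def set_decoded_path(self, value):
        # Quote everything except "/", which is kept as the segment separator.
        self.path = quote(value, safe="/")

    def __str__(self):
        return urlunsplit(self._parts)

u = URI("http://foo/a%3Fb?c")
u.set_decoded_path(u.get_decoded_path())  # round trip through the decoded form
```

Unlike the Hilfe session at the top of the thread, this round trip leaves the URI intact: str(u) is still http://foo/a%3Fb?c.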
> Hmm, I just realized that, strictly speaking, it isn't possible to decode the path without splitting it at the same time. Is this true for the other URI parts too?
I'm not sure I followed that, but Standards.URI does all the splitting it can do based on its understanding of the URI format, and all the parts you get from it can then be decoded using tools elsewhere. The query part, for instance, cannot contain an unquoted # character, nor (assuming HTTP GET variable semantics) can HTTP URL variable names contain an unquoted = character, which I believe might answer your question.
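The same split-then-decode order applies to the query. The Python sketch below assumes HTTP GET / HTML form semantics, where "&" separates pairs, the first "=" separates name from value, and "+" means a space. `parse_query` is an illustrative name; Python's own `urllib.parse.parse_qsl` does much the same.

```python
from urllib.parse import unquote_plus

def parse_query(query):
    """Split an HTTP GET query into (name, value) pairs, then decode each part.

    Splitting happens on the literal "&" and the first literal "=", so escaped
    %26 and %3D end up inside names and values instead of acting as delimiters.
    """
    pairs = []
    for item in query.split("&"):
        if not item:
            continue
        name, _, value = item.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs
```

For instance, a%3Db=c%26d&x=1+2 gives the name "a=b" with value "c&d", plus the pair ("x", "1 2").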
What are you trying to do? :-)
/ Johan Sundström (Achtung Liebe!)
Previous text:
2004-05-26 16:48: Subject: Quoting in Standards.URI
I think arguments could be made both for and against unescaping URIs in create().
I vaguely recall a design decision (at least on my part) that Standards.URI should not rewrite URIs in any way unless explicitly asked to, in order to facilitate comparison with other URIs.
(Looking at the implementation of java.net.URL, I see the same behavior.)
Maybe adding a normalize method and overloading the comparison operator would be a better way to accomplish this.
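A normalize-then-compare approach can be sketched in Python. The sketch uses the case and percent-encoding normalizations described in the later RFC 3986 (section 6.2.2). `normalize` and `same_uri` are illustrative names, not an existing Pike API. The sketch also assumes the authority carries no case-sensitive userinfo.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters per RFC 3986, section 2.3.
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 "abcdefghijklmnopqrstuvwxyz0123456789-._~")
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def _normalize_escapes(text):
    def fix(match):
        char = chr(int(match.group(1), 16))
        # Decode escaped unreserved characters; upper-case the hex digits of all others.
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return ESCAPE.sub(fix, text)

def normalize(uri):
    """Case- and percent-encoding-normalize a URI.
    Assumes the authority contains no case-sensitive userinfo."""
    p = urlsplit(uri)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(),
                       _normalize_escapes(p.path),
                       _normalize_escapes(p.query),
                       _normalize_escapes(p.fragment)))

def same_uri(a, b):
    return normalize(a) == normalize(b)
```

With this sketch, %7e and ~ compare equal. However, %3F and a literal "?" do not, because escaping a reserved character changes the URI's meaning.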
/ Johan Schön (Firefruit)
Previous text:
2004-05-26 17:20: Subject: Quoting in Standards.URI
As long as url->path returns the unescaped version, I think it's fine to store it internally in whatever format.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
2004-05-26 17:27: Subject: Quoting in Standards.URI
> I'm not sure I followed that, but Standards.URI does all the splitting it can do based on its understanding of the URI format /.../
It could do more splitting according to RFC 2396, namely splitting paths into path segments (and path segment parameters). Not that I'd want it to, since handling an array of path segment strings would be very clunky.
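To show what that extra level of splitting would look like, here is a Python sketch. It follows the RFC 2396 grammar, where a path segment is pchars optionally followed by ";param" parts. `split_path` is a hypothetical helper. Its array-of-tuples result is the kind of structure described above as clunky.

```python
from urllib.parse import unquote

def split_path(path):
    """Split an abs_path into (segment, [params]) pairs, RFC 2396 style.

    Splits on the literal "/" and ";" before decoding, so an escaped %2F or %3B
    stays inside the segment or parameter it belongs to.
    """
    if path.startswith("/"):
        path = path[1:]
    result = []
    for segment in path.split("/"):
        name, *params = segment.split(";")
        result.append((unquote(name), [unquote(p) for p in params]))
    return result
```

For example, /a%2Fb;v=1/c;x;y splits into the segment "a/b" with parameter "v=1", followed by "c" with parameters "x" and "y".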
> What are you trying to do? :-)
I'm trying to figure out at which level a problem with insufficient decoding should be fixed. I think I know now.
/ Martin Stjernholm, Roxen IS
Previous text:
2004-05-26 17:20: Subject: Quoting in Standards.URI
And you arrived at what conclusion?
/ Johan Schön (Firefruit)
Previous text:
2004-05-26 17:43: Subject: Quoting in Standards.URI
pike-devel@lists.lysator.liu.se