How so? If there is only one, then that one is always the right one.
It's not quite so simple when communicating with the outside world, which doesn't unify the two concepts.
The IETF chose to make IRI a separate standard (RFC 3987) instead of extending URI. They've obviously pondered that approach at length, so I guess they did it with good reason.
Many users probably don't realize that they actually aren't using URIs anymore once they go outside US-ASCII; that seems to be a widespread misconception.
Since IRIs can be mapped to URIs, they can be made to actually use URIs without having to realize it.
The receiving side might not perform the same transformation in reverse. E.g. when the URI is passed as a URL over HTTP, there is no obligation - not even in the latest standards - to apply the reverse URI-to-IRI transformation to the URL after receiving the request. So it's good to be aware of what is happening, to better judge how the receiver might (mis)behave.
When must one transform from a URI, then?
Huh? To process it, of course. E.g. Unicode data sent in a web form, where the de-facto behavior of modern browsers is to do an IRI-to-URI transformation first. It'd be nice to have that decoding built into the class.
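A minimal sketch of that de-facto browser step (using Python's urllib rather than the class under discussion, so the function names here are not the proposed API): non-ASCII form data is encoded as UTF-8 and the resulting octets are percent-escaped.

```python
from urllib.parse import quote, unquote

# What a browser effectively does with non-ASCII form data:
# encode the text as UTF-8, then percent-escape the octets.
encoded = quote("søk", safe="")
print(encoded)           # -> s%C3%B8k

# The decoding one would want built into the class is the reverse step:
print(unquote(encoded))  # -> søk
```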
Using the URI representation seems more powerful since it can represent both URIs and IRIs.
The problem is that a URI can't fully represent an IRI. It can only contain a (transformed) IRI, just like an octet string can contain a URI.
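To illustrate why a URI can only contain, rather than fully represent, an IRI: a simplified sketch of the RFC 3987 IRI-to-URI mapping (the helper below is hypothetical, not part of any discussed class) sends both a raw non-ASCII character and its pre-escaped form to the same URI, so the mapping loses information.

```python
def iri_to_uri(iri: str) -> str:
    # Simplified sketch of the RFC 3987 IRI-to-URI mapping:
    # percent-encode the UTF-8 octets of each non-US-ASCII character
    # and pass everything else (including existing %XX escapes)
    # through untouched.
    return "".join(
        ch if ord(ch) < 0x80
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )

# Two different IRIs end up as the same URI, so the URI alone
# can't say which one it came from:
print(iri_to_uri("/æ"))       # -> /%C3%A6
print(iri_to_uri("/%C3%A6"))  # -> /%C3%A6
```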
Of course, having a function to decode the UTF-8 sequences is something we want, but this should be possible (and done in the same way) regardless of whether you start with an IRI or an IRI mapped into a URI, IMO.
Perhaps, but not if you start with a URI that isn't a transformed IRI. Or are you suggesting that the URI class should just try to decode it as an IRI and silently continue without the UTF-8 decode if that fails?
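One way such a "try, and fall back silently" decode could look (a sketch using Python's urllib; the helper name is made up for illustration): interpret the %XX escapes strictly as UTF-8, and leave the URI untouched when the octets aren't valid UTF-8, i.e. when it evidently isn't a transformed IRI.

```python
from urllib.parse import unquote

def try_uri_to_iri(uri: str) -> str:
    # Hypothetical helper: interpret %XX escapes as UTF-8 and map
    # the URI back to an IRI; if the octets don't form valid UTF-8,
    # the URI was never a transformed IRI, so return it unchanged.
    try:
        return unquote(uri, errors="strict")
    except UnicodeDecodeError:
        return uri

print(try_uri_to_iri("/d%C3%A6mon"))  # -> /dæmon
print(try_uri_to_iri("/%FF"))         # not valid UTF-8, left as-is: /%FF
```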
/.../ I don't think the performance issue warrants a confusing split in the namespace.
I didn't say it was a performance issue either, rather one of functionality. An IRI can e.g. contain "ôôÌ" in a Unicode context without any encoding whatsoever, whereas a URI can't. When writing documents containing IRIs in a Unicode environment, it is of course nice to see and handle the real glyphs directly. Hence the class should be able to both produce and parse IRIs without escaping the non-US-ASCII chars.
Whether the "picked apart" pieces contain wider chars or not seems irrelevant, since you need to decode them anyway (%25, %2f).
When used as I described above, the wider chars wouldn't be encoded to begin with.
But besides that, more functionality to relieve the user of decoding %XX escapes manually is in order.
The decoding should give you wide chars regardless of whether you start with an IRI or an IRI mapped into an IRI (see above).
I assume at least one of the "IRI" there should be "URI". Decoding a URI in general can't produce wide chars since it can't assume that the URI is a transformed IRI.
Footnote: Now my pike discussion quota is used up for at least today.