Protocols.HTTP.http_encode_string

21 Jul 2008


      ...
I take it that you mean that it would _only_ encode non-ascii characters
then?  Because the way the API looks now, we have no way of knowing
which characters are meant to be metacharacters unless we require that
all metacharacters are already encoded in the input. /.../
My intention was that it would return a thoroughly and correctly
encoded url. That means that it would encode characters that are
neither in the reserved set (i.e. what you call metacharacters) nor
the unreserved set (i.e. US-ASCII letters, digits and a few other
chars).
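A minimal sketch of that encoding step, assuming UTF-8 is used for
everything outside US-ASCII. The name encode_other and the two constants
are hypothetical and not part of Standards.URI or Protocols.HTTP, and "%"
is passed through on the assumption that escapes already present in the
input are intentional.

```pike
// Hypothetical helper: percent-encode every character that is neither
// reserved nor unreserved according to RFC 3986.
constant unreserved =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~";
constant reserved = ":/?#[]@!$&'()*+,;=";

string encode_other(string s)
{
  string keep = unreserved + reserved + "%";
  // Work on the UTF-8 bytes so that non-ASCII characters become
  // sequences of %XX escapes.
  return map((array) string_to_utf8(s), lambda(int c) {
    return has_value(keep, c) ? sprintf("%c", c) : sprintf("%%%02X", c);
  }) * "";
}

// encode_other("a b/c?d") returns "a%20b/c?d": the space is escaped,
// while "/" and "?" keep their metameaning.
```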
You're right that the current implementation apparently assumes that
any reserved chars occurring in the component variables should retain
their metameaning (e.g. "/" in path). This should be stated more
clearly to avoid confusion, but it's not necessarily a problem: It'd
be easy to add more functions to build the components from parts where
every char is taken as data.
To take the path as an example, one could create a function
build_path:
void build_path (array(string) path_segments)
{
  path = map (path_segments, encode_reserved) * "/";
}
where encode_reserved only encodes the reserved chars and "%". Then
encoded_uri would do the rest of the job. E.g.:
...
object uri = Standards.URI("http://foo.com");
uri->build_path (({"odd/path%", "räksmörgås.html"}));
uri->path;
Result: "odd%2Fpath%25/räksmörgås.html"
...
uri->encoded_uri ("iso-8859-1");
Result: "http://foo.com/odd%2Fpath%25/r%C3%A4ksm%C3%B6rg%C3%A5s.html"
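For illustration, encode_reserved and build_path as described above might
look roughly like this. It is only a sketch, not existing Standards.URI
code: build_path is written here as a free function that returns the path
instead of a method that sets it, and the reserved set is taken from
RFC 3986 section 2.2.

```pike
// Hypothetical sketch. Escapes "%" and the RFC 3986 reserved characters;
// everything else, including non-ASCII characters, is left untouched for
// a later encoding pass.
constant reserved_and_percent = ":/?#[]@!$&'()*+,;=%";

string encode_reserved(string segment)
{
  return map((array) segment, lambda(int c) {
    return has_value(reserved_and_percent, c)
      ? sprintf("%%%02X", c) : sprintf("%c", c);
  }) * "";
}

// Every character in each segment is taken as data; only the "/"
// inserted here acts as a path separator.
string build_path(array(string) path_segments)
{
  return map(path_segments, encode_reserved) * "/";
}

// build_path(({ "odd/path%", "index.html" }))
// returns "odd%2Fpath%25/index.html"
```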
It'd be neat to use the getters and setters for this, so that we get a
virtual variable called "split_path" or something.
...
Apart from the fact that you miss out on "%", x->path*"/" isn't all
that inconvenient IMO.  And it would allow you to use x->path*"\" on
NT if you really want to.  :)
I don't understand what you mean by missing out on "%" there. If
you're suggesting that the user should simply join a fully decoded and
split path using path*"/" then the only effect is that the user
reintroduces, in his/her own code, the ambiguity we're trying to avoid.
That's not a solution.
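A small sketch of that ambiguity, using made-up segment lists: two
different decoded paths join to the same string, so the receiver can no
longer tell which "/" was data.

```pike
// Hypothetical demonstration; returns 1 because both joins are equal.
int joined_paths_collide()
{
  // Two segments, the first containing a literal "/" as data.
  array(string) a = ({ "odd/path", "file" });
  // Three segments with no "/" in any of them.
  array(string) b = ({ "odd", "path", "file" });
  // Both joins yield "odd/path/file".
  return a * "/" == b * "/";
}
```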
Btw, after reading RFC 3986 section 2.2 more carefully, it's clear
that no reserved character can be decoded in the path component (or in
any other component for that matter), since even if some reserved
chars have no metameaning for a component in the standard, they can
still have scheme-specific or implementation-specific metameaning if
they occur literally.
I.e. even a function that returns the path segments in an array can't
decode the reserved chars, unless it assumes that the implementation -
i.e. the caller - doesn't differentiate between the meta- and data
meaning of those chars. That'd in most cases be a user-friendly
assumption, but still one that should be stated.
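As a sketch of the conservative variant, a segment splitter could leave
all escapes in place and let the caller decide what to decode. The name
split_path is hypothetical here, borrowed from the virtual variable
suggested earlier.

```pike
// Hypothetical sketch: split an already encoded path on the literal "/"
// separators only. Escapes such as %2F are deliberately not decoded, so
// reserved characters that were data stay distinguishable.
array(string) split_path(string encoded_path)
{
  return encoded_path / "/";
}

// split_path("odd%2Fpath%25/index.html") yields two segments,
// "odd%2Fpath%25" and "index.html".
```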
