Re: incorrect rfc2047 MIME decoding?

12 Dec 2002


      ...
So your array must end up being encoded as:
"=?us-ascii?q?Hello?= =?iso-8859-1?q?Wor?= =?iso-8859-2?q?ld?= =?us-ascii?q?!?="
That's not correct.  By setting the charset for "Hello" to 0, rather
than "us-ascii", I have requested that the Hello part is encoded
literally, and not as an encoded-word.
...
or, if you choose to "extend" the charset into the adjacent string
(which only works if the charset is a superset of us-ascii):
"=?iso-8859-1?q?HelloWor?= =?iso-8859-2?q?ld!?="
That's not correct either.  Since no charset is provided for the
"Hello" part, I can't assume it's a subset of <whatever the encoding
for "Wor" is>, and I can't even assume that it is a subset of "us-ascii"
as you did in the first suggestion.
/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)
Previous text:
...
2002-12-12 23:46:
Subject: Re: incorrect rfc2047 MIME decoding?

In the last episode (Dec 12), Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum said:
...
Probably the most conformant way is to only eat whitespace between
two encoded words.  Since the RFC doesn't seem to mention any other
kinds of whitespace, the intention might be that they should be left
alone.
It does pose something of a semantic problem for _encode_ though:
Given the input
x = ({ ({ "Hello", 0 }),
       ({ "Wor", "iso-8859-1" }),
       ({ "ld", "iso-8859-2" }),
       ({ "!", 0 }) });
what should MIME.encode_words_text(x, "q") produce?  It is not
possible to put the first encoded word directly after the "o", but if
a space is inserted the resulting string will decode to
Hello World !
and not
HelloWorld!
as intended.  Tricky...
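
To make the whitespace rule and the resulting problem concrete, here is a
minimal sketch of a decoder that only eats whitespace between two
encoded-words.  It is written in Python purely for illustration and is not
Pike's MIME module; the names decode_words and decode_one are hypothetical,
and the matching is more lenient than RFC 2047 (it does not insist that an
encoded-word is delimited by whitespace).

```python
import base64
import re

# An RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def decode_one(match):
    """Decode a single encoded-word match to a string."""
    charset, encoding, payload = match.groups()
    if encoding.lower() == "q":
        # In the Q encoding "_" means space and "=XX" is a hex-encoded byte.
        text = re.sub(r"=([0-9A-Fa-f]{2})",
                      lambda m: chr(int(m.group(1), 16)),
                      payload.replace("_", " "))
        raw = text.encode("latin-1")
    else:
        raw = base64.b64decode(payload)
    return raw.decode(charset)


def decode_words(text):
    """Decode encoded-words, dropping whitespace only between two of them."""
    out = []
    pos = 0
    prev_was_encoded = False
    for match in ENCODED_WORD.finditer(text):
        gap = text[pos:match.start()]
        # Whitespace separating two encoded-words is not part of the text;
        # any other gap (literal text, or whitespace next to it) is kept.
        if not (prev_was_encoded and gap.strip() == ""):
            out.append(gap)
        out.append(decode_one(match))
        pos = match.end()
        prev_was_encoded = True
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(decode_words("Hello =?iso-8859-1?q?Wor?= =?iso-8859-2?q?ld?= !"))
```

With this rule the space between the two encoded-words disappears, but the
spaces next to the literal "Hello" and "!" survive, which is exactly the
"Hello World !" result described above.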
RFC2047 says that "An 'encoded-word' that appears within a 'phrase'
MUST be separated from any adjacent 'word', 'text' or 'special' by
'linear-white-space'".  That means any strings adjacent to a string
that gets encoded must also get encoded, unless they contain a leading
(or trailing) space.  So your array must end up being encoded as:
"=?us-ascii?q?Hello?= =?iso-8859-1?q?Wor?= =?iso-8859-2?q?ld?= =?us-ascii?q?!?="
or, if you choose to "extend" the charset into the adjacent string
(which only works if the charset is a superset of us-ascii):
"=?iso-8859-1?q?HelloWor?= =?iso-8859-2?q?ld!?="
If element 0 was "Hello ", and element 3 was " !", only then could
you leave them unencoded, and the result would be
"Hello =?iso-8859-1?q?Wor?= =?iso-8859-2?q?ld?= !"
/ Brevbäraren
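
The encoding rule proposed in the message above (a string next to an
encoded-word must itself be encoded unless whitespace separates them) can be
sketched roughly as follows.  This is Python for illustration only, not the
behaviour of MIME.encode_words_text; encode_words, q_encode and the
fallback_charset parameter are hypothetical names.  Falling back to
"us-ascii" for strings without a charset is precisely the assumption that
the reply at the top of this thread disputes, so treat it as a stand-in.

```python
def q_encode(text, charset):
    """Q-encode text as one RFC 2047 encoded-word in the given charset."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch == " ":
            out.append("_")
        elif (byte < 128 and ch.isalnum()) or ch in "!*+-/":
            out.append(ch)  # characters that are safe inside a phrase
        else:
            out.append("=%02X" % byte)
    return "=?%s?q?%s?=" % (charset, "".join(out))


def encode_words(pieces, fallback_charset="us-ascii"):
    """pieces is a list of (text, charset) pairs; charset None means literal."""
    n = len(pieces)
    encoded = [charset is not None for _, charset in pieces]
    # A literal that touches an encoded neighbour without whitespace must be
    # encoded too.  Repeat until stable, since this can cascade.
    changed = True
    while changed:
        changed = False
        for i, (text, _) in enumerate(pieces):
            if encoded[i]:
                continue
            touches_prev = i > 0 and encoded[i - 1] and not text[:1].isspace()
            touches_next = (i + 1 < n and encoded[i + 1]
                            and not text[-1:].isspace())
            if touches_prev or touches_next:
                encoded[i] = True
                changed = True
    out = []
    for i, (text, charset) in enumerate(pieces):
        if encoded[i]:
            if i > 0 and encoded[i - 1]:
                out.append(" ")  # separator that a decoder will drop again
            # ASSUMPTION: literals forced into encoding use fallback_charset.
            out.append(q_encode(text, charset or fallback_charset))
        else:
            out.append(text)
    return "".join(out)


if __name__ == "__main__":
    x = [("Hello", None), ("Wor", "iso-8859-1"),
         ("ld", "iso-8859-2"), ("!", None)]
    print(encode_words(x))
```

The sketch implements the first suggested form (each string as its own
encoded-word); it does not attempt the variant that extends a neighbouring
charset over the literal strings.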
