Re: incorrect rfc2047 MIME decoding?

12 Dec 2002


      In the last episode (Dec 12), Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum said:
...
Probably the most conformant way is to only eat whitespace between
two encoded words.  Since the RFC doesn't seem to mention any other
kinds of whitespace, the intention might be that they should be left
alone.
It does pose something of a semantic problem for _encode_ though:
Given the input
x = ({ ({ "Hello", 0 }),
       ({ "Wor", "iso-8859-1" }),
       ({ "ld", "iso-8859-2" }),
       ({ "!", 0 }) });
what should MIME.encode_words_text(x, "q") produce?  It is not
possible to put the first encoded word directly after the "o", but if
a space is inserted the resulting string will decode to
Hello World !
and not
HelloWorld!
as intended.  Tricky...
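
To make the problem concrete, here is a minimal sketch of a decoder that
eats whitespace only between two encoded words. It is written in Python
rather than Pike, it handles only the "q" encoding, and the names
decode_q and decode_words are hypothetical; it is not how the MIME module
is implemented.

```python
import re

# =?charset?q?payload?=  (only the "q" encoding is handled in this sketch)
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[qQ]\?([^?\s]*)\?=")


def decode_q(payload, charset):
    """Decode one Q-encoded payload: "_" is a space, "=XX" is a hex byte."""
    raw = payload.replace("_", " ").encode("ascii")
    raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                 lambda m: bytes([int(m.group(1), 16)]),
                 raw)
    return raw.decode(charset)


def decode_words(header):
    """Decode encoded words, dropping whitespace only between two of them."""
    out = []
    pos = 0
    prev_was_encoded = False
    for m in ENCODED_WORD.finditer(header):
        gap = header[pos:m.start()]
        # Keep the gap unless it is pure whitespace between two encoded words.
        if not (prev_was_encoded and gap.strip() == ""):
            out.append(gap)
        out.append(decode_q(m.group(2), m.group(1)))
        pos = m.end()
        prev_was_encoded = True
    out.append(header[pos:])  # trailing plain text is left alone
    return "".join(out)


# Spaces next to plain text survive; the one between the encoded words does not.
print(decode_words("Hello =?iso-8859-1?q?Wor?= =?iso-8859-2?q?ld?= !"))
# prints: Hello World !
```

The space between "Wor" and "ld" disappears, but the spaces that had to be
inserted next to the unencoded "Hello" and "!" remain in the result.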
RFC2047 says that "An 'encoded-word' that appears within a 'phrase'
MUST be separated from any adjacent 'word', 'text' or 'special' by
'linear-white-space'".  That means any string adjacent to a string
that gets encoded must also get encoded, unless it has a space on the
side that touches the encoded string (trailing if it comes before,
leading if it comes after).  So your array must end up being encoded as:
"=?us-ascii?q?Hello?= =?iso-8859-1?q?Wor?= =?iso-8859-2?q?ld?= =?us-ascii?q?!?="
or, if you choose to "extend" the charset into the adjacent string
(which only works if the charset is a superset of us-ascii):
"=?iso-8859-1?q?HelloWor?= =?iso-8859-2?q?ld!?="
If element 0 was "Hello ", and element 3 was " !", only then could
you leave them unencoded, and the result would be
"Hello =?iso-8859-1?q?Wor?= =?iso-8859-2?q?ld?= !"
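
The rule above can be sketched as follows. This is Python rather than
Pike, and encode_q and encode_words are hypothetical names, not the
MIME module's API. The sketch takes the first option (encode adjacent
strings as us-ascii rather than extending the neighbouring charset),
assumes unencoded strings are plain ASCII, and ignores the 75-character
limit on encoded words.

```python
import string

# Characters that may appear unescaped in a Q-encoded word inside a phrase.
SAFE = string.ascii_letters + string.digits + "!*+-/"


def encode_q(text, charset):
    """Q-encode text in the given charset as a single encoded word."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        elif ch == " ":
            out.append("_")
        else:
            out.append("=%02X" % byte)
    return "=?%s?q?%s?=" % (charset, "".join(out))


def encode_words(parts):
    """parts is a list of (text, charset) pairs; charset None means plain."""
    # Drop empty pieces and merge runs of plain pieces so adjacency is simple.
    merged = []
    for text, charset in parts:
        if not text:
            continue
        if charset is None and merged and merged[-1][1] is None:
            merged[-1] = (merged[-1][0] + text, None)
        else:
            merged.append((text, charset))

    def must_encode(i):
        text, charset = merged[i]
        if charset is not None:
            return True
        prev_enc = i > 0 and merged[i - 1][1] is not None
        next_enc = i + 1 < len(merged) and merged[i + 1][1] is not None
        # A plain piece stays plain only if whitespace separates it from
        # every neighbouring encoded word.
        if prev_enc and not text[0].isspace():
            return True
        if next_enc and not text[-1].isspace():
            return True
        return False

    out = []
    prev_encoded = False
    for i, (text, charset) in enumerate(merged):
        encoded = must_encode(i)
        if encoded:
            if prev_encoded:
                out.append(" ")  # eaten again when the header is decoded
            out.append(encode_q(text, charset or "us-ascii"))
        else:
            out.append(text)
        prev_encoded = encoded
    return "".join(out)


print(encode_words([("Hello ", None), ("Wor", "iso-8859-1"),
                    ("ld", "iso-8859-2"), (" !", None)]))
# prints: Hello =?iso-8859-1?q?Wor?= =?iso-8859-2?q?ld?= !
```

With "Hello" and "!" passed without the surrounding spaces, the same
function falls back to encoding all four pieces.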
