On Sun, Oct 30, 2016 at 9:17 PM, Chris Angelico rosuav@gmail.com wrote:
On Sun, Oct 30, 2016 at 4:17 AM, Martin Karlgren marty@roxen.com wrote:
Adding charset decoding to MIME.Message sounds good to me, perhaps with a flag to enable it on decoding? (A compat problem I can think of is that applications may assume that decoded data is always an 8-bit string and fail to apply proper encoding before writing it to a file, causing an exception.)
I agree about backward compat, and that's a bit problematic. So here's my thinking: MIME.UnicodeMessage will be a subclass of MIME.Message with the express goal of making everything use 21-bit strings. Any time it returns an eight-bit string, that is a bug to be fixed. So future incompatibility won't be a problem, as it's expressly documented that way; and past compatibility is fine, because MIME.Message itself isn't changing. Methods like MIME.Message()->get_filename, which currently do the decoding at that late point, can simply be overridden in UnicodeMessage.
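To make the shape of that proposal concrete, here is a minimal, language-neutral sketch in Python. It is not Pike's MIME module: the constructor, the `charset` attribute, and the decoding logic are assumptions made up for illustration, and only the names Message, UnicodeMessage and get_filename come from the discussion above.

```python
class Message:
    """Hypothetical stand-in for the unchanged base class.

    Accessors hand back raw 8-bit data, exactly as existing callers expect.
    """

    def __init__(self, filename: bytes, charset: str = "utf-8"):
        # Both arguments are illustrative; the real class parses them
        # out of the message headers.
        self._filename = filename
        self.charset = charset

    def get_filename(self) -> bytes:
        return self._filename


class UnicodeMessage(Message):
    """Hypothetical subclass whose accessors always return decoded text."""

    def get_filename(self) -> str:
        # Override only the accessor; the base class stays untouched,
        # so code that uses Message directly sees no change.
        return super().get_filename().decode(self.charset)
```

Existing code that instantiates Message keeps getting bytes; only code that opts in to UnicodeMessage gets decoded strings.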
Does that seem like a reasonable API?
I've pushed a change to 8.1 that ought to be 100% backward compatible. If there's a problem, I can revert it, but there shouldn't be. (Just in case, it's not in 8.0.) The two notable features are:
1) MIME.UnicodeMessage, as described above
2) MIME.parse_headers() now takes an additional parameter 'unicode'.
Everything else should be completely invisible to most programs, and both of these are opt-in, so existing code can ignore them.
ChrisA