I would like to characterize this whole discussion as yet another case where the confusion between byte arrays and strings messes things up. The last time was when PCRE was integrated, unless I remember incorrectly.
There are, by the way, more or less standardized methods for writing and reading Unicode files on some platforms; a way to get a platform-Unicode-to-Pike-string file object would be fairly useful.
Windows has its wide-char type (files are mostly stored as UTF-16LE), and modern Unixen tend to use UTF-8 (mainly for American reasons).
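
To make the idea concrete, here is a rough sketch of the byte-array/string boundary such a file object would enforce. It is written in Python rather than Pike, purely as an illustration. The function names are made up, and the platform-to-encoding mapping is only the rule of thumb from above (UTF-16LE on Windows, UTF-8 elsewhere), not something any real API guarantees.

```python
import sys


def platform_text_encoding(platform: str = sys.platform) -> str:
    """Guess the conventional Unicode file encoding for a platform.

    Assumption: Windows-style platforms use UTF-16LE, everything else UTF-8.
    """
    return "utf-16-le" if platform.startswith("win") else "utf-8"


def decode_platform_bytes(raw: bytes, platform: str = sys.platform) -> str:
    """Turn raw file bytes into a real (wide) string."""
    return raw.decode(platform_text_encoding(platform))


def encode_platform_text(text: str, platform: str = sys.platform) -> bytes:
    """Turn a string back into the bytes the platform expects on disk."""
    return text.encode(platform_text_encoding(platform))
```

The point is only that bytes go in and come out at the edges, and everything in between handles strings; a real file object would wrap read and write with the same two conversions.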