New subject: Clean-room Engine.IO implementation committed to git 8.0/8.1

23 Nov 2016


On Thu, Nov 24, 2016 at 12:20 AM, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum 10353@lyskom.lysator.liu.se wrote:
...
...
\U12345678 possibly should be an error, as it's not valid Unicode.
It's valid Pike.  Pike supports the full ISO/IEC 10646 31-bit range,
plus an equally large negative range.
...
so you could use string(32bit) for those sorts of
non-textual strings.
Not string(31bit)?
...
My statement about Unicode text specifically excludes
anything that isn't valid according to the Unicode standard.
Which makes it even worse since the set of valid characters changes
with each release of the Unicode standard...
AIUI the Unicode Consortium has declared that they will never define
any characters beyond 0x10FFFF, as that would destroy UTF-16 as a
valid encoding. And you're right that "valid according to the Unicode
standard" is a bit too restrictive, but certainly "valid within the
declaration of character range" should be safe.
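A rough illustration of where the 0x10FFFF ceiling comes from, written in Python 3 rather than Pike since the arithmetic is the same. The helper name decode_surrogate_pair is made up for this sketch. It only shows that the largest UTF-16 surrogate pair maps to 0x10FFFF, which fits in 21 bits.

```python
# Sketch only: Python 3 assumed; decode_surrogate_pair is a hypothetical helper.
HIGH_MAX = 0xDBFF  # last high (lead) surrogate
LOW_MAX = 0xDFFF   # last low (trail) surrogate


def decode_surrogate_pair(high, low):
    """Combine one UTF-16 surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest value any surrogate pair can express.
max_code_point = decode_surrogate_pair(HIGH_MAX, LOW_MAX)
print(hex(max_code_point))          # 0x10ffff
print(max_code_point.bit_length())  # 21

# Python itself enforces the same ceiling on its str type.
try:
    chr(0x110000)
    beyond_limit_ok = True
except ValueError:
    beyond_limit_ok = False
print(beyond_limit_ok)              # False
```

Anything above that value has no UTF-16 representation, which is why a 21-bit declared range is a stable bound even as new characters are assigned below it.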
...
...
...
What type would "Foo" have?  How would you specify a UTF-8 encoded
literal?
Now, these are questions that can't truly be answered with the current
system. I would like the former to be string(7bit),
Then you are contradicting yourself, since you claimed that Unicode
text would _always_ be referred to as string(21bit), and "Foo" is
definitely Unicode text (both 'F' and 'o' have been part of the
Unicode standard since the first version).
string(7bit) would be implicitly upcastable to string(21bit), since
ASCII text can be represented validly as either bytes (with the top
bit clear) or as Unicode codepoint sequences.
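To make the upcast argument concrete, here is a small Python 3 sketch (not Pike). The function upcast is a hypothetical stand-in for the implicit conversion being proposed: bytes with the top bit clear map one-to-one onto code points, while anything with the top bit set is refused because its meaning depends on an encoding.

```python
# Sketch only: Python 3 assumed; upcast is a hypothetical helper, not a Pike API.
def upcast(raw: bytes) -> str:
    """Reinterpret 7-bit bytes as Unicode text, value for value."""
    if any(b > 0x7F for b in raw):
        raise ValueError("top bit set: the encoding would have to be known")
    return "".join(chr(b) for b in raw)


ascii_bytes = b"Foo"
text = upcast(ascii_bytes)
# Each byte value equals the corresponding code point.
same_values = [ord(c) for c in text] == list(ascii_bytes)
print(text, same_values)  # Foo True

# UTF-8 encoded non-ASCII text is 8-bit data and cannot be upcast blindly.
utf8_bytes = "\u00e9".encode("utf-8")  # two bytes: 0xC3 0xA9
try:
    upcast(utf8_bytes)
    refused = False
except ValueError:
    refused = True
print(refused)  # True
```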
...
...
and the latter
would be either string(7bit) or string(8bit) depending on whether
there are non-ASCII characters in it.
But how would the compiler know that the characters are UTF-8 encoded,
so that it does not assign a type of string(21bit) instead?
Right, and that's something that can't be done in the current
standard. Hence this entire proposal has to wait until some major
changes can be done.
In Python, it's done with a prefix - u"asdf" is a Unicode string, and
b"asdf" is a byte string. It would need to be something similarly
syntactic, once the two become actually different types. For today's
Pikes, though, that's not possible, so the safest way is to simply
keep track of it in the programmer's mind, without language support.
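For reference, this is what the Python prefixes look like in practice (Python 3 assumed, where the u prefix is accepted but redundant). It is only meant to show the two literals producing distinct types, not to suggest a syntax for Pike.

```python
# Sketch only: Python 3 assumed.
text = u"asdf"  # Unicode string (type str)
data = b"asdf"  # byte string (type bytes)

print(type(text).__name__, type(data).__name__)  # str bytes
print(text == data)                              # False: different types

# Moving between the two requires naming an encoding explicitly.
word = "na\u00efve"            # 5 code points, one of them non-ASCII
encoded = word.encode("utf-8")  # 6 bytes, since U+00EF takes two
print(len(word), len(encoded))  # 5 6
print(encoded.decode("utf-8") == word)  # True
```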
ChrisA
