By "binary data", I mean eight-bit strings of arbitrary bytes - like you'd read from a file or something. Currently, functions like Stdio.read_file simply return "string", but they'll effectively be returning string(8bit).
No, Stdio.read_file currently returns string(8bit). That simply means that each element will be in the range 0-255. If you were to change the meaning to something else, you would create compatibility issues by making some currently valid assignments involving string(8bit) invalid.
\U12345678 possibly should be an error, as it's not valid Unicode.
It's valid Pike. Pike supports the full ISO/IEC 10646 31-bit range, plus an equally large negative range.
so you could use string(32bit) for those sorts of non-textual strings.
Not string(31bit)?
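To make the point about \U12345678 concrete, here is a minimal sketch that inspects such a string at run time. It assumes a reasonably recent Pike (7.8 or later); describe() is a hypothetical helper name used only for this illustration. String.width() reports the storage width of a string, which is 8, 16 or 32.

```pike
// Hypothetical helper: returns ({ length, first character value, storage width }).
array(int) describe(string s)
{
  return ({ sizeof(s), s[0], String.width(s) });
}

int main()
{
  // The literal from the discussion: a single character far outside
  // the Unicode range (which ends at 0x10ffff).
  string s = "\U12345678";

  [int len, int value, int width] = describe(s);
  write("length: %d\n", len);
  write("value:  0x%x\n", value);
  write("width:  %d bits\n", width);
  return 0;
}
```

The storage width comes in steps of 8, 16 and 32 bits, whereas the type annotation describes the range of element values, so the two need not coincide.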
My statement about Unicode text specifically excludes anything that isn't valid according to the Unicode standard.
Which makes it even worse, since the set of valid characters changes with each release of the Unicode standard...
What type would "Foo" have? How would you specify a UTF-8 encoded literal?
Now, these are questions that can't truly be answered with the current system. I would like the former to be string(7bit),
Then you are contradicting yourself, since you claimed that Unicode text would _always_ be referred to as string(21bit), and "Foo" is definitely Unicode text (both 'F' and 'o' have been part of the Unicode standard since the first version).
and the latter would be either string(7bit) or string(8bit) depending on whether there are non-ASCII characters in it.
But how would the compiler know that the characters are UTF-8 encoded, so that it does not assign a type of string(21bit) instead?
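A small sketch of why this is a problem, assuming only the built-in string_to_utf8() and utf8_to_string(): the UTF-8 encoding of one non-ASCII character has exactly the same element values as a two-character Latin-1 string, so nothing in the values themselves says which one was meant.

```pike
int main()
{
  string text    = "\u00e5";              // one character, U+00E5
  string encoded = string_to_utf8(text);  // two bytes: 0xc3 0xa5
  string latin1  = "\u00c3\u00a5";        // two characters, U+00C3 U+00A5

  // The encoded form and the two-character text compare equal,
  // since strings are compared by element values only.
  write("same values: %d\n", encoded == latin1);
  write("lengths: %d vs %d\n", sizeof(text), sizeof(encoded));

  // Decoding restores the original only if we already know it was UTF-8.
  write("round trip ok: %d\n", utf8_to_string(encoded) == text);

  // Pure ASCII text is unchanged by the encoding.
  write("ascii unchanged: %d\n", string_to_utf8("Foo") == "Foo");
  return 0;
}
```

Whether a given literal is "UTF-8 encoded" is therefore information that would have to come from somewhere other than its contents, for example from the declared type or from a distinct literal syntax.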