On Wed, Nov 23, 2016 at 11:10 PM, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum <10353@lyskom.lysator.liu.se> wrote:
I agree, but using string(8bit) to mean "binary data" is something that's 100% backward compatible.
It would not be backwards compatible, since that is not what string(8bit) means today.
By "binary data", I mean eight-bit strings of arbitrary bytes, like you'd read from a file or something. Currently, functions like Stdio.read_file are declared to return plain "string", but in practice they return string(8bit).
Unicode text would always be referred to as string(21bit), even if it happens to contain nothing but Latin-1 characters.
That doesn't really make sense. So you say that "R\xe4ksm\xf6rg\xe5s" would have type string(21bit)? What type would "\U12345678" have?
\U12345678 should possibly be an error, as it's not valid Unicode (it lies beyond U+10FFFF, the highest Unicode code point). Maybe the Pike string type can be used for other things, but they're not Unicode text, so you could use string(32bit) for those sorts of non-textual strings. (I don't know of any use cases, so I can't say beyond that.) My statement about Unicode text specifically excludes anything that isn't valid according to the Unicode standard.
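To make the width classes being argued about concrete, here is a minimal Python sketch. It is not Pike's type checker. It assumes a string is modelled as a sequence of non-negative integer code points, because Python's own str cannot hold values above 0x10FFFF, so "\U12345678" cannot be represented directly. The function `narrowest_width` is a hypothetical helper. It returns the smallest of the proposed types whose range covers every character, and it treats anything outside the Unicode scalar-value range as the non-textual string(32bit) suggested above. Treating lone surrogates as "not valid Unicode" is an assumption of this sketch.

```python
from typing import Iterable


def narrowest_width(codepoints: Iterable[int]) -> str:
    """Hypothetical helper: classify a Pike-style string (modelled as a
    sequence of non-negative ints) into the width types from this thread."""
    values = list(codepoints)
    top = max(values, default=0)  # an empty string fits the narrowest type
    if top < 0x80:
        return "string(7bit)"     # pure ASCII
    if top < 0x100:
        return "string(8bit)"     # fits in one byte, e.g. Latin-1 text
    # Assumption: "valid Unicode" means Unicode scalar values, i.e.
    # 0..0x10FFFF excluding the UTF-16 surrogate range 0xD800..0xDFFF.
    if top <= 0x10FFFF and not any(0xD800 <= v <= 0xDFFF for v in values):
        return "string(21bit)"
    return "string(32bit)"        # not valid Unicode text


print(narrowest_width(map(ord, "Foo")))                  # string(7bit)
print(narrowest_width(map(ord, "R\xe4ksm\xf6rg\xe5s")))  # string(8bit)
print(narrowest_width(map(ord, "\u2603")))               # string(21bit)
print(narrowest_width([0x12345678]))                     # string(32bit)
```

Pike's string(7bit) and string(8bit) are shorthand for the ranges string(0..127) and string(0..255), and each narrower range is contained in the wider ones. A value classified as string(7bit) or string(8bit) here would therefore still be acceptable wherever a wider string type is expected.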
What type would "Foo" have? How would you specify a UTF-8 encoded literal?
Now, these are questions that can't truly be answered with the current system. I would like the former to be string(7bit), and the latter would be either string(7bit) or string(8bit) depending on whether there are non-ASCII characters in it. But they're probably both just type 'string' at the moment.
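The UTF-8 case can be sketched the same way in Python, assuming the encoded literal is treated as a sequence of byte values. Encoding with `str.encode("utf-8")` always yields values in 0..255, and UTF-8 uses bytes at or above 0x80 only for non-ASCII characters. That is exactly the 7bit/8bit split described above. `utf8_literal_width` is a hypothetical name used only for this illustration.

```python
def utf8_literal_width(text: str) -> str:
    """Hypothetical helper: the narrowest proposed type for the UTF-8
    encoding of `text`."""
    encoded = text.encode("utf-8")  # bytes: every value is in 0..255
    # Every byte of a multi-byte UTF-8 sequence is >= 0x80, so the encoding
    # stays below 0x80 only when the input is pure ASCII.
    return "string(7bit)" if all(b < 0x80 for b in encoded) else "string(8bit)"


print(utf8_literal_width("Foo"))                    # string(7bit)
print(utf8_literal_width("R\xe4ksm\xf6rg\xe5s"))    # string(8bit)
print(len("R\xe4ksm\xf6rg\xe5s".encode("utf-8")))   # 13: each of the 3 non-ASCII letters takes 2 bytes
```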
ChrisA