On Thu, Nov 24, 2016 at 12:40 AM, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum 10353@lyskom.lysator.liu.se wrote:
In Python, it's done with a prefix - u"asdf" is a Unicode string, and b"asdf" is a byte string.
Since strings are now nominally Unicode strings (with the extended ISO 10646 range), I think "asdf" can be left as the syntax for that, and we only need a new syntax for the byte string ("buffer") type. We can also look at Java, which has byte[] as the type for byte strings, requiring literals like {'a','s','d','f'}, but I would like to see something a bit more convenient to use. :-)
Agreed. In Python 3.0+, that's how it is - an unadorned string is Unicode, and b"asdf" is a byte string. (Python 2.7 has it the other way - an unadorned string is bytes, and u"asdf" is Unicode - and because of that, you're allowed to put the prefixes on both types of string. But Pike needn't do that.)
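For reference, a minimal Python 3 sketch of the distinction described above. The variable names are only for illustration, and this shows Python's behaviour, not a proposal for how Pike should behave:

```python
# Python 3: an unadorned literal is text (Unicode); b"..." is a byte string.
text = "asdf"
data = b"asdf"

print(type(text).__name__)  # str
print(type(data).__name__)  # bytes

# The two types never compare equal, even with identical ASCII content.
print(text == data)  # False

# Converting between them requires naming an encoding explicitly.
print(text.encode("utf-8") == data)  # True
print(data.decode("utf-8") == text)  # True

# Indexing shows the difference: a character versus an integer byte value.
print(text[0], data[0])  # a 97
```

The point relevant here is that only the byte string needs a marked literal; the plain "asdf" form stays as the text type.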
ChrisA
I'm not sure I follow. Which problem should this solve: a mark in the string struct saying what type of data the string contains?
The simplest runtime implementation would be a non-zero subtype on the string, to mark that it is binary data and not Unicode text. (Although that might make the stringp operator a bit ambiguous.) The main benefits are in static typechecking, making sure you don't send unencoded text to I/O functions and suchlike.
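As a rough illustration of that typechecking benefit, here is a sketch in Python rather than Pike. The helper write_bytes is hypothetical, not an existing API, and it says nothing about how Pike would implement the check. It only shows the kind of mistake a separate byte string type lets the tooling catch:

```python
import io

def write_bytes(stream: io.BufferedIOBase, payload: bytes) -> int:
    """Hypothetical I/O helper that accepts only already-encoded data."""
    if not isinstance(payload, bytes):
        # Runtime counterpart of the static check: refuse unencoded text.
        raise TypeError("expected bytes, got " + type(payload).__name__)
    return stream.write(payload)

buf = io.BytesIO()

# Correct use: the text is encoded explicitly before it reaches the I/O layer.
written = write_bytes(buf, "café".encode("utf-8"))

# Incorrect use: passing text directly. A static checker such as mypy
# should flag this call from the annotation alone; here it fails at runtime.
try:
    write_bytes(buf, "café")
    rejected = False
except TypeError:
    rejected = True
```

With a single string type, both calls would look identical to the type checker, and the missing encoding step would only show up as garbled output or a runtime error further down the line.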
pike-devel@lists.lysator.liu.se