Re: Clean-room Engine.IO implementation committed to git 8.0/8.1

23 Nov 2016


On Wed, Nov 23, 2016 at 10:30 PM, Marcus Comstedt (ACROSS) (Hail
Ilpalazzo!) @ Pike (-) developers forum 10353@lyskom.lysator.liu.se
wrote:
...
> I think you are conflating range with interpretation.  Both a
> Latin1 string and a UTF-8 encoded one are 8-bit strings (with a 0-255
> range).  What would be useful is a datatype that declares that the
> elements are not Unicode characters (as they are in the Latin1 string
> case) but some raw binary encoding (as they are in the UTF-8 case),
> optionally also specifying which encoding.  This has been suggested
> before (with "buffer" as a suggestion for the name of the new
> datatype), but it has never been implemented due to the difficulty of
> introducing such a datatype in a consistent way while still retaining
> backward compatibility.

I agree, but using string(8bit) to mean "binary data" is something
that's 100% backward compatible. Unicode text would always be referred
to as string(21bit), even if it happens to contain nothing but Latin-1
characters.

FWIW, I would support an actual division of data types, such that you
cannot concatenate one onto the other. But having seen what happened
with Python 2 -> Python 3, I would expect this to be a fairly
significant backward compatibility break. It'd probably be something
for Pike 9.0 or even 10.0. There would be an opportunity to learn from
Python here, though, and maybe do things more smoothly. In any case,
the first step is to *right now* think about binary data and textual
data as different things, and distinguish them and convert them as
appropriate.

ChrisA
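
For readers who have not hit the Python 3 split mentioned above, here is
a small sketch of what "a division of data types, such that you cannot
concatenate one onto the other" looks like in practice. It is Python 3,
not Pike, and only illustrates the idea; the helper names
join_text_and_bytes and append_text are made up for this example and do
not come from the thread or from any library.

```python
# Text (str) and binary data (bytes) as distinct types in Python 3.
text = "na\u00efve caf\u00e9"      # Unicode text: "naive cafe" with i-diaeresis and e-acute

latin1 = text.encode("latin-1")    # bytes, one byte per character here
utf8 = text.encode("utf-8")        # bytes, two bytes for each non-ASCII character

# Range vs interpretation: both byte strings hold values in 0-255 ...
assert max(latin1) <= 255 and max(utf8) <= 255

# ... but only the Latin-1 bytes coincide with the Unicode code points.
code_points = [ord(c) for c in text]
same_as_codepoints = list(latin1) == code_points
utf8_differs = list(utf8) != code_points


def join_text_and_bytes(t, b):
    """Hypothetical helper: try to concatenate text and bytes directly."""
    try:
        return t + b
    except TypeError:
        # Python 3 refuses to mix the two types implicitly.
        return None


def append_text(buf: bytes, t: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: convert explicitly, then concatenate."""
    return buf + t.encode(encoding)
```

The explicit encode step in append_text is the "distinguish them and
convert them as appropriate" habit; it works the same way whether or not
the language enforces the split.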
