+  if(String.width(out)>8)
+    out=string_to_utf8(out);
Ok, seriously, I think for every proper use of String.width (are there any?) there are at least two improper ones.
If the string is supposed to be transport encoded with UTF-8, you need to do it for _all_ strings, otherwise you will get invalid UTF-8 if the string contains characters > 127 but < 256.
But who says that a "tabular" should be UTF-8 encoded anyway? For a format that has no builtin encoding indicator, transport en/decoding should be the task of the caller.
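The failure mode described above can be reproduced with a short Pike sketch. The helper name `conditional_encode()` and the sample strings are hypothetical, chosen only to mirror the conditional in the patch; this is an illustration, not the code under review.

```pike
// Hypothetical helper mirroring the conditional from the patch:
// only strings wider than 8 bits are UTF-8 encoded.
string conditional_encode(string out)
{
  if (String.width(out) > 8)
    out = string_to_utf8(out);
  return out;
}

int main()
{
  string narrow = "caf\xe9";   // contains U+00E9 only, so String.width() is 8
  string wide = "\x20ac 5";    // contains U+20AC, so String.width() is 16

  foreach (({ narrow, wide }), string s) {
    string bytes = conditional_encode(s);
    // utf8_to_string() throws on malformed input, so catch tells us
    // whether the resulting bytes form valid UTF-8.
    mixed err = catch { utf8_to_string(bytes); };
    write("width %d: %s\n", String.width(s),
          err ? "invalid UTF-8" : "valid UTF-8");
  }
  return 0;
}
```

The narrow string passes through untouched, so its lone 0xE9 byte reaches the output unencoded, while the wide string is encoded properly. A consumer that expects UTF-8 would therefore receive a mix of both.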
Pretty much nothing should be UTF-8 encoded unless there is some standard involved, so yea, sounds like it should just be removed.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> if(String.width(out)>8)
> out=string_to_utf8(out);
> If the string is supposed to be transport encoded with UTF-8, you need to do it for _all_ strings, otherwise you will get invalid UTF-8 if the string contains characters > 127 but < 256.
> But who says that a "tabular" should be UTF-8 encoded anyway? For a format that has no builtin encoding indicator, transport en/decoding should be the task of the caller.
True. But this "conversion" is for debugging output only. This is *not* in the production path.
Be that as it may, output with a nondeterministic character encoding isn't really what you want, even when debugging.
Frankly, it seems to me that werror() ought to encode the message using the environment character encoding, and escape any unencodable character as \uxxxx or suchlike. But until that happens, may I suggest printing with %q instead?
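As a stopgap along these lines, a debug print using `%q` might look like the following sketch. The variable name `out` is taken from the patch; the sample value is invented for illustration.

```pike
int main()
{
  // Invented sample value containing U+00EF (8-bit) and U+20AC (wide).
  string out = "na\xefve \x20ac";

  // %q formats the string as a quoted Pike string literal, escaping
  // characters that are not plain printable text, so the debug output
  // does not depend on how the terminal decodes bytes.
  werror("out = %q\n", out);
  return 0;
}
```

The output is less pleasant to read than real characters, but it is the same regardless of the string's width or the terminal's charset.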
werror() (and probably write()) really should encode using the charset of the terminal, yes.
write() is a little tricky because it can be used to write binary data by filters. It boils down yet again to the lack of a datatype for octet sequences...
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> Be that as it may, output with a nondeterministic character encoding isn't really what you want, even when debugging.
> Frankly, it seems to me that werror() ought to encode the message using the environment character encoding, and escape any unencodable character as \uxxxx or suchlike. But until that happens, may I suggest printing with %q instead?
Technically, you're right, and for now the safe option would be to use something like %q instead. However, considering that most display devices/environments are moving towards UTF-8 as the (de facto) default these days, having it in UTF-8 directly allows for more practical/usable debugging output. And even if your particular terminal doesn't support UTF-8 but uses something else, there are standard converters from UTF-8 to whichever encoding you are using; you'd be hard-pressed to find a (standard) converter from the output %q generates to whichever encoding your terminal is using.
Even if I were to buy that, you still would need to make sure that _all_ your output is in UTF-8, not a mix of UTF-8 and ISO-8859-1 like it is now.
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
> Even if I were to buy that, you still would need to make sure that _all_ your output is in UTF-8, not a mix of UTF-8 and ISO-8859-1 like it is now.
Hmmm, you have a point there. It would mean I'd take out the conditional, and just convert always. I think I'd prefer that then.
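A minimal sketch of what dropping the conditional would amount to, assuming a hypothetical helper `always_encode()` and made-up sample strings:

```pike
// Hypothetical helper: encode unconditionally, regardless of width.
string always_encode(string out)
{
  return string_to_utf8(out);
}

int main()
{
  foreach (({ "plain", "caf\xe9", "\x20ac 5" }), string s) {
    string bytes = always_encode(s);
    // Every result decodes back to the original string, so the output
    // is consistently UTF-8 for ASCII, 8-bit and wide input alike.
    if (utf8_to_string(bytes) != s)
      error("Round trip failed.\n");
    write("input width %d -> %d bytes of UTF-8\n",
          String.width(s), sizeof(bytes));
  }
  return 0;
}
```

Pure ASCII strings come out byte-for-byte unchanged, so only strings with characters above 127 are affected by removing the conditional.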
pike-devel@lists.lysator.liu.se