Tabular.pike 1.8

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

4 Jan 2010

4:25 p.m.

+ if(String.width(out)>8)
+   out=string_to_utf8(out);

Ok, seriously, I think for every proper use of String.width (are there any?) there are at least two improper ones.

If the string is supposed to be transport encoded with UTF-8, you need to do it for _all_ strings, otherwise you will get invalid UTF-8 if the string contains characters > 127 but < 256.
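A minimal sketch of the problem described above, assuming two made-up helpers (`encode_if_wide` and `encode_always` are hypothetical names, not part of Tabular.pike). It only relies on `String.width()` and `string_to_utf8()`, and contrasts the conditional from the patch with unconditional encoding:

```pike
// Hypothetical helper mirroring the conditional from the patch.
string encode_if_wide(string s)
{
  return String.width(s) > 8 ? string_to_utf8(s) : s;
}

// Hypothetical alternative: encode every string, whatever its width.
string encode_always(string s)
{
  return string_to_utf8(s);
}

int main()
{
  string narrow = "caf\u00e9";   // U+00E9 fits in 8 bits, so the width is 8
  string wide = "\u20ac";        // U+20AC needs 16 bits

  // Conditional version: the wide string becomes UTF-8, while the narrow
  // one is passed through with a raw 0xE9 byte (ISO-8859-1).
  write("%O %O\n", encode_if_wide(narrow), encode_if_wide(wide));

  // Unconditional version: both strings are UTF-8 encoded.
  write("%O %O\n", encode_always(narrow), encode_always(wide));
  return 0;
}
```

With the conditional helper, the two results end up in different encodings even though both are 8-bit strings, which is the mix a consumer cannot reliably decode.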

But who says that a "tabular" should be UTF-8 encoded anyway? For a format that has no builtin encoding indicator, transport en/decoding should be the task of the caller.

Peter Bortas ＠ Pike developers forum

4 Jan

6:40 p.m.

Pretty much nothing should be UTF-8 encoded unless there is some standard involved, so yea, sounds like it should just be removed.

Stephen R. van den Berg

5 Jan

12:42 a.m.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:

...

```
if(String.width(out)>8)
  out=string_to_utf8(out);
```

...

If the string is supposed to be transport encoded with UTF-8, you need to do it for _all_ strings, otherwise you will get invalid UTF-8 if the string contains characters > 127 but < 256.

...

But who says that a "tabular" should be UTF-8 encoded anyway? For a format that has no builtin encoding indicator, transport en/decoding should be the task of the caller.

True. But this "conversion" is for debugging output only. This is *not* in the production path.

-- Sincerely, Stephen R. van den Berg. "Papers in string theory are published at a rate above the speed of light. This is no problem since no information is being transmitted." -- H. Kleinert

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

9:30 a.m.

Be that as it may, output with a nondeterministic character encoding isn't really what you want even when debugging.

Frankly, it seems to me that werror() ought to encode the message using the environment character encoding, and escape any unencodable character as \uxxxx or suchlike. But until that happens, may I suggest printing with %q instead?
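A small sketch of the `%q` suggestion, assuming a made-up helper name (`debug_quote` is hypothetical). As far as I understand the `sprintf()` documentation, `%q` produces a quoted string in which control characters and characters wider than 8 bits are escaped, so the result should never need more than 8 bits per character; characters in the 128 to 255 range may still appear unescaped:

```pike
// Hypothetical helper: render a value for debug output via %q.
string debug_quote(string s)
{
  return sprintf("%q", s);
}

int main()
{
  string msg = "caf\u00e9 \u20ac";

  // Wide characters come out as escape sequences rather than raw code
  // points, so the quoted form can be handed to werror() as-is.
  string quoted = debug_quote(msg);
  werror("out=%s (width %d)\n", quoted, String.width(quoted));
  return 0;
}
```

The trade-off is readability: the escapes are unambiguous, but they are not something a terminal will render as the original characters.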

Per Hedbor () ＠ Pike (-) developers forum

9:50 a.m.

werror() (and probably write()) really should encode using the charset of the terminal, yes.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

10:05 a.m.

write() is a little tricky because it can be used to write binary data by filters. It boils down yet again to the lack of a datatype for octet sequences...

Per Hedbor () ＠ Pike (-) developers forum

10:10 a.m.

Indeed.

Stephen R. van den Berg

10:16 a.m.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:

...

Be that as it may, output with a nondeterministic character encoding isn't really what you want even when debugging.

...

Frankly, it seems to me that werror() ought to encode the message using the environment character encoding, and escape any unencodable character as \uxxxx or suchlike. But until that happens, may I suggest printing with %q instead?

Technically, you're right, and for now the safe option would be to use something like %q instead. However, considering that most display devices/environments are moving towards UTF-8 as the (de facto) default these days, having it in UTF-8 directly allows for more practical/usable debugging output. And even if your particular terminal doesn't support UTF-8 but uses something else, there are standard converters from UTF-8 to whichever encoding you are using now; but you'd be hard-pressed to find a (standard) converter from the output %q generates to whichever encoding your terminal is using.

-- Sincerely, Stephen R. van den Berg. E-mails should be like a lady's skirt: Long enough to cover the subject, and short enough to be interesting.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

10:25 a.m.

Even if I were to buy that, you still would need to make sure that _all_ your output is in UTF-8, not a mix of UTF-8 and ISO-8859-1 like it is now.

Stephen R. van den Berg

10:29 a.m.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:

...

Even if I were to buy that, you still would need to make sure that _all_ your output is in UTF-8, not a mix of UTF-8 and ISO-8859-1 like it is now.

Hmmm, you have a point there. It would mean I'd take out the conditional, and just convert always. I think I'd prefer that then.

pike-devel@lists.lysator.liu.se
