string(0..255)


Martin Stjernholm, Roxen IS ＠ Pike developers forum

7 Aug 2008

5:45 a.m.

#pragma strict_types
int main (int argc, array(string) argv)
{
  string s = argv[0];
  Stdio.File ("test", "cw")->write (s);
}

produces these warnings for the write call:

foo.pike:5: Warning: Type mismatch in argument 1 to write.
foo.pike:5: Warning: Expected: array(string(0..255)) | string(0..255) | __attribute__("sprintf_format", string(0..255)).
foo.pike:5: Warning: Got     : string.

This is strictly speaking correct and in the long run it's probably a good idea to restrict I/O functions to octet strings. But in practice this warning is very cumbersome and will probably cause widespread proliferation of the fairly clumsy type "string(0..255)" if one tries to fix it. My view:

o There should be a better type name than "string(0..255)" for octet strings. (That type cannot be a typedef since typedef'ed types still don't work very well.)

o If we're to separate the types for octet and char strings, I think it should be done also when char strings happen to contain only narrow 8-bit chars. I.e. the string width is not really the important characteristic, rather it's whether the string contains characters or binary octets.

o Fixing the types for the I/O functions requires discussion (see e.g. points above). It ought to be done consistently over all I/O subsystems. Compat and utility functions might be required. This is not something to embark on at this time, I think.

To sum up, for 7.8 I think it's better to not introduce string(0..255) in write() etc. Opinions?


Martin Nilsson (Opera Mini - AFK!) ＠ Pike (-) developers forum

7 Aug

7:05 a.m.

It wouldn't be as bad if all the interfaces were typed correctly, but as we don't have time for that for this release, I think only typing return values as restricted strings in Pike 7.8 is a reasonable way to go initially.

Peter Bortas ＠ Pike developers forum

7:05 a.m.

> o There should be a better type name than "string(0..255)" for octet strings. (That type cannot be a typedef since typedef'ed types still don't work very well.)

Yes, we already agreed to add a binary type in 7.9.

> o If we're to separate the types for octet and char strings, I think it should be done also when char strings happen to contain only narrow 8-bit chars. I.e. the string width is not really the important characteristic, rather it's whether the string contains characters or binary octets.

Yes, already agreed upon.

> o Fixing the types for the I/O functions requires discussion (see e.g. points above). It ought to be done consistently over all I/O subsystems. Compat and utility functions might be required. This is not something to embark on at this time, I think.

Yes. That's the prime reason for obsoleting the second argument to read_file.

> To sum up, for 7.8 I think it's better to not introduce string(0..255) in write() etc. Opinions?

I think the warning is correct. You are probably doing something wrong when writing a wide string like that.

Peter Bortas ＠ Pike developers forum

7:15 a.m.

>> To sum up, for 7.8 I think it's better to not introduce string(0..255) in write() etc. Opinions?

> I think the warning is correct. You are probably doing something wrong when writing a wide string like that.

On the other hand things are obviously not typed well enough internally yet. I'd agree with disabling the warning for now then.

Martin Stjernholm, Roxen IS ＠ Pike developers forum

7:30 a.m.

> I think the warning is correct. You are probably doing something wrong when writing a wide string like that.

Apparently you missed my point. The string isn't really wide. The problem is trying to correct the type: First s has to be changed to string(0..255), then argv has to be changed to array(string(0..255)). In real code the proliferation of the string(0..255) type can go very far.
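A minimal sketch of that chain, assuming a hypothetical helper save() sits between main() and write() (the helper is an illustration, not something from the thread). Each declaration the value passes through has to carry the narrowed type before the warning goes away:

```pike
#pragma strict_types

// Hypothetical helper, not from the thread: once write() demands
// string(0..255), every function on the way to it must declare it too.
int save(string(0..255) data)
{
  // write() returns the number of bytes written.
  return Stdio.File("test", "cw")->write(data);
}

int main(int argc, array(string(0..255)) argv)  // was: array(string) argv
{
  string(0..255) s = argv[0];                   // was: string s
  save(s);
  return 0;
}
```

With only one intermediate function the change is small, but the same edit has to be repeated in every caller and every variable that feeds save(), which is the proliferation described above.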

(People familiar with C and C++ are perhaps familiar with the similar "const plague": You start with adding an innocent "const" to a function argument, and before you know it you've ended up fighting with about half of the declarations throughout the whole project to put in those pesky consts everywhere, and even so you have to add casts to get rid of the warnings from all the imported libs.)

IMO all that effort is only worthwhile if the other conditions I mentioned are addressed. It's not only a matter of missing internal typing, it's also that both the name and the semantics of the type aren't really good.

Johan Sundström (Achtung Liebe!) ＠ Pike (-) developers forum

11:50 a.m.

>> o There should be a better type name than "string(0..255)" for octet strings. (That type cannot be a typedef since typedef'ed types still don't work very well.)

> Yes, we already agreed to add a binary type in 7.9.

Under the name "buffer", I hope?

Peter Bortas ＠ Pike developers forum

12:15 p.m.

I seem to remember that was the to candidate, yes.

Peter Bortas ＠ Pike developers forum

12:20 p.m.

("p" is giving up on my keyboard. Insert "p"'s where apropriate or funny.)

Martin Bähr

12:54 p.m.

funny can do:

("p" pis giving pup on my kepybopard. inspert "p"'s where papropripatep por funnpy.)

Martin Stjernholm, Roxen IS ＠ Pike developers forum

12:45 p.m.

Hmm, that was a somewhat unexpected choice. What's the rationale?

Johan Sundström (Achtung Liebe!) ＠ Pike (-) developers forum

12:55 p.m.

While I think it was not discussed, some personal opinion/bias: It is short. It is not "string". "binary", to me, reads a little as "packed array of bits", rather than "packed array of bytes", as is intended.

Martin Stjernholm, Roxen IS ＠ Pike developers forum

12:55 p.m.

What about "bytes" or "octets"?

Johan Sundström (Achtung Liebe!) ＠ Pike (-) developers forum

1:05 p.m.

Would work too, without implying purpose, as I would presume you might feel about "buffer"? (I'm not a polar negative to "binary" either, but it conveys incorrect semantics to me in a way the other three do not.)

Martin Stjernholm, Roxen IS ＠ Pike developers forum

1:15 p.m.

> Would work too, without implying purpose, as I would presume you might feel about "buffer"?

Yes, precisely.

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

2:15 p.m.

"buffer" was the name of this datatype in the MudOS dialect of LPC.

Peter Bortas ＠ Pike developers forum

5:20 p.m.

That never came up as a rationale though. "buffer" just feels right.

Martin Bähr

8:59 p.m.

doesn't feel right to me. it sounds like something for temporary storage before i send it on the way. yes, in most cases what i send should be 8bit wide, but not necessarily always.

bytes has a strange feeling because of the plural. it would be like naming a string chars, or array(int) as ints. bytearray would be better.

how about string(byte)?

byte could be a shortcut for int(0..255) as well

greetings, martin.

Johan Sundström (Achtung Liebe!) ＠ Pike (-) developers forum

9:10 p.m.

Rapidly approaching http://www.todaysbigthing.com/2008/07/23 now... :)

Johan Sundström (Achtung Liebe!) ＠ Pike (-) developers forum

11:45 p.m.

Unfair comparison hereby retracted. :-)

Mirar ＠ Pike developers forum

10:50 p.m.

How is the problem with old code still working if "buffer" will be used as a datatype? There's bound to be lots of

string buffer;

around.

Stephen R. van den Berg

11:32 p.m.

Mirar @ Pike developers forum wrote:

> How is the problem with old code still working if "buffer" will be used as a datatype? There's bound to be lots of
>
> string buffer;

Seems like a minor disaster.

string(byte) doesn't sound so bad. bytearray or bytestring would be good runners up, IMO.

-- Sincerely, Stephen R. van den Berg. "Tomorrow will be cancelled due to lack of interest."

Johan Sundström (Achtung Liebe!) ＠ Pike (-) developers forum

11:45 p.m.

Ah; using a non-identifier for the type is actually a rather big plus. Whether coupled with reserving "byte" in itself for int(0..255) (thus annulling that particular feature) or not, I'm warming up to the idea too.

I was under the initial assumption that buffers (under whichever name) would be a rather common data type, so much that a long name for it would be silliness, in the order of the most basic and common building block of the function of the browser object model being 23 characters long (document.getElementById). That would hardly be the case, though.

Mirar ＠ Pike developers forum

8 Aug

2:10 a.m.

Well, I don't know if types actually require keywords anymore. If they do, it's a minor disaster. If they don't, there's no problem - unless you want to use "string buffer;" and "buffer string;" in the same scope. :)

Martin Bähr

23 Aug

12:21 a.m.

just stumbled over this description of the Buf type in perl 6. may be of interest to someone

http://perlcabal.org/syn/S02.html#Built-In_Data_Types

A Buf is a stringish view of an array of integers, and has no Unicode or character properties without explicit conversion to some kind of Str. (A buf is the native counterpart.) Typically it's an array of bytes serving as a buffer. Bitwise operations on a Buf treat the entire buffer as a single large integer. Bitwise operations on a Str generally fail unless the Str in question can provide an abstract Buf interface somehow. Coercion to Buf should generally invalidate the Str interface. As a generic type Buf may be instantiated as (or bound to) any of buf8, buf16, or buf32 (or to any type that provides the appropriate Buf interface), but when used to create a buffer Buf defaults to buf8.

Unlike Str types, Buf types prefer to deal with integer string positions, and map these directly to the underlying compact array as indices. That is, these are not necessarily byte positions--an integer position just counts over the number of underlying positions, where one position means one cell of the underlying integer type. Builtin string operations on Buf types return integers and expect integers when dealing with positions. As a limiting case, buf8 is just an old-school byte string, and the positions are byte positions. Note, though, that if you remap a section of buf32 memory to be buf8, you'll have to multiply all your positions by 4.

greetings, martin.


pike-devel@lists.lysator.liu.se
