Support for array(string|array(int))*string


Stephen R. van den Berg

6 Feb 2009

12:45 a.m.

Pike 7.8 currently does:

...

({"abc",({65,65,66}),"def"})*"x";

(1) Result: "abcxdef"

I have created a patch that makes it do:

...

({"abc",({65,65,66}),"def"})*"x";

(1) Result: "abcxAABxdef"

instead, and which also throws an error if any other types are inside the array (the old code merely skipped them).

I'd consider the second behaviour (after my patch) more desirable, and in light of actually throwing errors on unsupported types, it really is a bugfix.

Any objections to this behaviour?

-- Sincerely, Stephen R. van den Berg. Auto repair rates: basic labor $40/hour; if you wait, $60; if you watch, $80; if you ask questions, $100; if you help, $120; if you laugh, $140.


Martin Stjernholm, Roxen IS ＠ Pike developers forum

8 Feb

1:30 p.m.

Just make sure it continues to work with zero elements:

...

({"a", 0, "b"}) * ",";

(1) Result: "a,b"

There's code that counts on this. It's quirky, but convenient on occasion.

...

I have created a patch that makes it do:

...
({"abc",({65,65,66}),"def"})*"x";

(1) Result: "abcxAABxdef"

To me it also seems a bit quirky that there would be an implicit cast from array(int) to string in this particular situation, when the normal approach is to cast only explicitly. But I don't really mind that much.
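For readers who want to try the semantics discussed so far without the patch, here is a minimal sketch in plain Pike. The helper name join_mixed() is hypothetical and not part of Pike; it assumes the behaviour described above: array(int) elements become strings of those characters, zero elements are skipped, and anything else is an error. It is slower than a C-level implementation in `*` would be and is only meant to illustrate the intended result.

```pike
// Hypothetical helper (not part of Pike): joins strings and array(int)
// character runs with a separator, as the patched `*` is described to do.
string join_mixed(array(string|array(int)|int) parts, string sep)
{
  array(string) out = ({});
  foreach (parts, mixed part) {
    if (stringp(part))
      out += ({ part });
    else if (arrayp(part))
      out += ({ (string)part });  // array(int) -> string of those characters
    else if (!part)
      continue;                   // zero elements are skipped
    else
      error("join_mixed: unsupported element %O\n", part);
  }
  return out * sep;
}

int main()
{
  write("%s\n", join_mixed(({ "abc", ({ 65, 65, 66 }), "def" }), "x"));
  write("%s\n", join_mixed(({ "a", 0, "b" }), ","));
  return 0;
}
```

With the inputs from the posts above, the two calls are expected to print "abcxAABxdef" and "a,b".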

Peter Bortas ＠ Pike developers forum

9 Feb

9:25 a.m.

It's so quirky that we might want to consider a warning for that.

Stephen R. van den Berg

3:07 p.m.

Peter Bortas @ Pike developers forum wrote:

...

It's so quirky that we might want to consider a warning for that.

What about this:

(string)({65,66,"abc",67,"def",68})

Can we turn it into "ABabcCdefD"? Currently this doesn't work (obviously). The reason I would like to have this is that in order to parse strings and accumulate results, in Pike I tend to either pick the characters apart as integers, or use strings of characters whenever the parsing allows me to go faster.

The result is that I would like to accumulate the results piecemeal in an array. This array gets filled like in the example above. Then I'd like to return a single string to the function caller, in which case I could cast it to string (as done above).

...

From a language purity standpoint I realise that implementing the cast like this is not quite proper. Then again, I can't think of any other reasonable result the cast should have (except throw an error). Thus in essence it's a DWIM (Do What I Mean) behaviour and conforms to the principle of least surprise.

Comments?

-- Sincerely, Stephen R. van den Berg. "I don't have to take this abuse from you -- I've got hundreds of people waiting to abuse me."

Per Hedbor () ＠ Pike (-) developers forum

3:15 p.m.

How about having a specific function for that instead of (string)?

The original magic '*' operator can be implemented using

array magic = ({"abc",({65,65,66}),"def"});

((array(string))magic)*"x"

without special code in '*', btw.

Stephen R. van den Berg

3:36 p.m.

Per Hedbor () @ Pike (-) developers forum wrote:

...

How about having a specific function for that instead of (string)?

...

From a performance standpoint, that shouldn't be a problem.

Any suggestions as to the name of the function?

I'd say something like:

gather(({"abc",({65,65,66}),"def",65})) => "abcAABdefA"

...

The original magic '*' operator can be implemented using

...

array magic = ({"abc",({65,65,66}),"def"});

...

((array(string))magic)*"x"

Yes, but that kills performance, since it creates at least one temporary string per subarray in the process.

...

without special code in '*', btw.

If gather() (as above) is allowed in, then I can take out the changes to "*" (except perhaps the tighter error checks which seem prudent).

-- Sincerely, Stephen R. van den Berg. "I don't have to take this abuse from you -- I've got hundreds of people waiting to abuse me."

Marc Dirix

3:47 p.m.

...

gather(({"abc",({65,65,66}),"def",65})) => "abcAABdefA"

Maybe a more general "to_string()", which IIRC is also the name some modules use for outputting their contents in string form? It could take array(int), array(int|string) and also int and ...?

Marc

Stephen R. van den Berg

3:58 p.m.

Marc Dirix wrote:

...

...
gather(({"abc",({65,65,66}),"def",65})) => "abcAABdefA"

...

Maybe a more general "to_string()", which IIRC is also the name some modules use for outputting their contents in string form?

Well, actually, that would be confusing, since a generic array_to_string() or to_string() raises the question why it isn't equivalent to a mere cast to string, which would be the "natural" way of doing this. So in order to set it apart from the mere cast, the name should be more than a mere "to_string".

-- Sincerely, Stephen R. van den Berg. "I don't have to take this abuse from you -- I've got hundreds of people waiting to abuse me."

Mirar ＠ Pike developers forum

3:55 p.m.

String.collect String.gather

maybe? (I haven't checked if they are in use.)

Marc Dirix

3:57 p.m.

Mirar @ Pike developers forum wrote:

...

String.collect String.gather

maybe? (I haven't checked if they are in use.)

There already exists String.int2char(), maybe expand it to enable eating array(int|string) ?

Mirar ＠ Pike developers forum

4 p.m.

Nah, rather

String.array2string

then.

Stephen R. van den Berg

4 p.m.

Mirar @ Pike developers forum wrote:

...

String.collect String.gather

...

maybe? (I haven't checked if they are in use.)

I'd have a slight preference for String.gather then. Any other votes?

-- Sincerely, Stephen R. van den Berg. "I don't have to take this abuse from you -- I've got hundreds of people waiting to abuse me."

Peter Bortas ＠ Pike developers forum

4:20 p.m.

-2

Stephen R. van den Berg

5:41 p.m.

Peter Bortas @ Pike developers forum wrote:

...

-2

I gather (pun intended) that is a strong no vote, any alternate ideas on how to implement the desired functionality? Or preferably no implementation at all?

-- Sincerely, Stephen R. van den Berg. "I don't have to take this abuse from you -- I've got hundreds of people waiting to abuse me."

Peter Bortas

10:09 p.m.

On Mon, Feb 9, 2009 at 6:41 PM, Stephen R. van den Berg srb@cuci.nl wrote:

...

Peter Bortas @ Pike developers forum wrote:

...
-2

I gather (pun intended) that is a strong no vote, any alternate ideas on how to implement the desired functionality? Or preferably no implementation at all?

It was a -1 for each name. None of them seem to be obvious pointers to what they are supposed to do.

-- Peter

Martin Stjernholm, Roxen IS ＠ Pike developers forum

10:40 p.m.

...

The reason I would like to have this is that in order to parse strings and accumulate results, in Pike I tend to either pick the characters apart as integers, or use strings of characters whenever the parsing allows me to go faster.

Hmm, I almost never pick strings apart into individual characters since I believe it's slow (although I really haven't measured). I usually manage to do it in ways where the individual char picking is kept at the C level (sscanf ftw).

...

The result is that I would like to accumulate the results piecemeal in an array. This array gets filled like in the example above. Then I'd like to return a single string to the function caller, in which case I could cast it to string (as done above).

And you can't use String.Buffer for it? It has add() for strings and putchar() for chars.
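As an illustration of this suggestion, the piecemeal accumulation described earlier in the thread could be sketched with String.Buffer as follows. The function name gather() is borrowed from the naming discussion and is hypothetical; only String.Buffer's add(), putchar() and get() are used. This is a sketch under those assumptions, not the implementation anyone in the thread committed.

```pike
// Hypothetical helper: concatenates strings, single characters (ints)
// and array(int) character runs into one string via String.Buffer.
string gather(array(string|int|array(int)) parts)
{
  String.Buffer buf = String.Buffer();
  foreach (parts, mixed part) {
    if (stringp(part))
      buf->add(part);            // append a whole string
    else if (intp(part))
      buf->putchar(part);        // append one character code
    else if (arrayp(part))
      buf->add((string)part);    // array(int) -> string of those characters
    else
      error("gather: unsupported element %O\n", part);
  }
  return buf->get();
}

int main()
{
  write("%s\n", gather(({ 65, 66, "abc", 67, "def", 68 })));
  return 0;
}
```

In a real parser the intermediate array would not be needed at all: the parsing loop can call add() and putchar() on the buffer directly as it goes.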

Stephen R. van den Berg

10 Feb

7:33 a.m.

Martin Stjernholm, Roxen IS @ Pike developers forum wrote:

...

Hmm, I almost never pick strings apart into individual characters since I believe it's slow (although I really haven't measured). I usually manage to do it in ways where the individual char picking is kept at the C level (sscanf ftw).

When parsing structures which can extend over newlines and/or support weird quoting rules (like csv), it's almost inevitable.

...

...
The result is that I would like to accumulate the results piecemeal in an array. This array gets filled like in the example above. Then I'd like to return a single string to the function caller, in which case I could cast it to string (as done above).

...

And you can't use String.Buffer for it? It has add() for strings and putchar() for chars.

I didn't know String.Buffer existed. Looks interesting (and useful).

-- Sincerely, Stephen R. van den Berg. Several ways to kill a programmer: kill -15, fair trial; kill -1, death by hanging; kill -2, suicide; kill -3, euthanasia; kill -9, execution.

Martin Stjernholm, Roxen IS ＠ Pike developers forum

11 Feb

4:20 p.m.

...

When parsing structures which can extend over newlines and/or support weird quoting rules (like csv), it's almost inevitable.

When I need to parse, say, backslash escapes (and sscanf's %O doesn't do the right thing), I start by splitting on \ and then investigate the pieces. Or alternatively I use sscanf in more or less the same way, i.e. scan data up to a character, do stuff with it, then loop. E.g.:

  String.Buffer out = String.Buffer();
  while (1) {
    // Read up to the next backslash, then the escaped char, then the rest.
    int res = sscanf (in, "%[^\\]\\%c%s", string pre, int esc, in);
    out->add (pre);
    if (res != 3) break;
    switch (esc) {
      case '\n': case '"': out->putchar (esc); break;
      // ...
    }
  }

I almost always use tricks like that to avoid single stepping chars on the pike level. Even though the loop above actually is O(n^2) I believe it's faster on moderately small strings.

But then again, I also got integer chars in my example above, so maybe we're talking about the same approach after all.
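The splitting alternative mentioned at the start of this message might look roughly like the sketch below. The function name unescape() and the particular escapes handled (\n, \t, and "any other character stands for itself") are assumptions made for illustration; the thread does not specify an escape table.

```pike
// Hypothetical sketch of the "split on backslash, then inspect the
// pieces" approach. Every piece after the first starts with the
// character that followed a backslash in the input.
string unescape(string in)
{
  array(string) pieces = in / "\\";
  String.Buffer out = String.Buffer();
  out->add(pieces[0]);                 // text before the first backslash
  for (int i = 1; i < sizeof(pieces); i++) {
    string p = pieces[i];
    if (p == "") {
      // Two adjacent backslashes: emit one backslash; the piece that
      // follows is plain text, not an escape.
      out->putchar('\\');
      if (++i < sizeof(pieces)) out->add(pieces[i]);
      continue;
    }
    switch (p[0]) {
      case 'n': out->putchar('\n'); break;
      case 't': out->putchar('\t'); break;
      default:  out->putchar(p[0]);    // e.g. \" becomes "
    }
    out->add(p[1..]);                  // remainder of the piece is literal
  }
  return out->get();
}

int main()
{
  write("%O\n", unescape("say \\\"hi\\\"\\n"));
  return 0;
}
```

Because the input is split once up front, the remaining input is not copied on every iteration the way it is when the sscanf loop reassigns `in`.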

Jonas Walldén ＠ Pike developers forum

4:35 p.m.

That code will copy and rehash data in your "in" string every iteration. At some point I implemented %*4711s in sscanf() just to be able to avoid that bottleneck and it gave significant savings in my test case (the XML markup repair code in Roxen CMS).

Martin Stjernholm, Roxen IS ＠ Pike developers forum

5:05 p.m.

I know, that's why it's O(n^2). As I said, even so, I believe it's faster on moderately small strings. But if the input is long then an approach using splitting ought to be used instead.

It'd be nice to have something like String.Buffer for parsing too. Maybe it could be extended with functions like get_char, has_prefix, get_prefix, sscanf, etc.

Stephen R. van den Berg

7:32 p.m.

Jonas Walldén @ Pike developers forum wrote:

...

iteration. At some point I implemented %*4711s in sscanf() just to be

I'll bite: what does %*4711s do in sscanf()?

-- Sincerely, Stephen R. van den Berg. I'm sorry. The number you have reached is imaginary. Please rotate your phone 90 degrees and try again.

Mirar ＠ Pike developers forum

7:55 p.m.

Same as %4711s, but doesn't write it down in the argument list.

Jonas Walldén ＠ Pike developers forum

10:10 p.m.

Exactly, it allows you to start sscanf() at a known offset in your input string without creating intermediate strings like data[pos..] would do.

Mirar ＠ Pike developers forum

10:35 p.m.

It's a pity you'd have to put "4711" in your format string though...

Jonas Walldén ＠ Pike developers forum

10:40 p.m.

Well, not too bad:

sscanf(data, "%*" + pos + "s...", ...);

It's not impossible to think of an optimization that uses that format when the programmer writes

sscanf(data[pos..], "...", ...);

as long as the sscanf() return value is adjusted. Maybe Grubba fixed that already?
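A small sketch of the two forms being compared, for readers unfamiliar with the trick. The function names int_at() and int_at_slice() and the sample data are made up for illustration. Both variants are expected to read the same value; the difference described above is only that the second creates the intermediate string data[pos..].

```pike
// Hypothetical helpers: read a decimal integer starting at offset pos.

// Skips pos characters inside sscanf() itself using a suppressed,
// fixed-width %s, so no substring of data is created first.
int int_at(string data, int pos)
{
  int n;
  sscanf(data, "%*" + pos + "s%d", n);
  return n;
}

// Same result, but data[pos..] creates an intermediate string.
int int_at_slice(string data, int pos)
{
  int n;
  sscanf(data[pos..], "%d", n);
  return n;
}

int main()
{
  string data = "header:12345 rest";
  write("%d %d\n", int_at(data, 7), int_at_slice(data, 7));
  return 0;
}
```

The sketch deliberately ignores the return value of sscanf(), since, as noted above, it would need adjusting if one form were silently rewritten into the other.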

Martin Stjernholm, Roxen IS ＠ Pike developers forum

11:15 p.m.

Hmm, noticed this is already checked in.

Well, in any case, let me remind you that the refdoc for `* still isn't updated. It's in operators.c.

Stephen R. van den Berg

12 Feb

12:33 a.m.

Martin Stjernholm, Roxen IS @ Pike developers forum wrote:

...

Hmm, noticed this is already checked in.

...

Well, in any case, let me remind you that the refdoc for `* still isn't updated. It's in operators.c.

Well, I'm going to take it out again, please wait till tomorrow. Taking it out will leave the tighter error checks.

-- Sincerely, Stephen R. van den Berg. I'm sorry. The number you have reached is imaginary. Please rotate your phone 90 degrees and try again.

pike-devel@lists.lysator.liu.se