Pike 7.8 currently does:
({"abc",({65,65,66}),"def"})*"x";
(1) Result: "abcxdef"
I have created a patch that makes it do:
({"abc",({65,65,66}),"def"})*"x";
(1) Result: "abcxAABxdef"
instead. The patch also throws an error if the array contains elements of any other type (the old code silently skipped them).
I'd consider the second behaviour (after my patch) more desirable. Since it now throws errors on unsupported types, it really is a bugfix.
Any objections to this behaviour?
Just make sure it continues to work with zero elements:
({"a", 0, "b"}) * ",";
(1) Result: "a,b"
There's code that counts on this. It's quirky, but convenient on occasion.
I have created a patch that makes it do:
({"abc",({65,65,66}),"def"})*"x";
(1) Result: "abcxAABxdef"
To me it also seems a bit quirky that there would be an implicit cast from array(int) to string in this particular situation, when the normal approach is to cast only explicitly. But I don't really mind that much.
Peter Bortas @ Pike developers forum wrote:
It's so quirky that we might want to consider a warning for that.
What about this:
(string)({65,66,"abc",67,"def",68})
Can we turn it into: "ABabcCdefD" ? Currently this doesn't work (obviously). The reason I would like to have this is that in order to parse strings and accumulate results, in Pike I tend to either pick the characters apart as integers, or use strings of characters whenever the parsing allows me to go faster.
The result is that I would like to accumulate the results piecemeal in an array. This array gets filled like in the example above. Then I'd like to return a single string to the function caller, in which case I could cast it to string (as done above).
From a language purity standpoint, I realise that implementing the cast like this is not quite proper. Then again, I can't think of any other reasonable result the cast could have (except throwing an error). So in essence it's DWIM (Do What I Mean) behaviour, and it conforms to the principle of least surprise.
Comments?
How about having a specific function for that instead of (string)?
The original magic '*' operator can be implemented using
array magic = ({"abc",({65,65,66}),"def"});
((array(string))magic)*"x"
without special code in '*', btw.
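Here is a minimal Pike sketch of the cast-then-join route. It also keeps the zero-skipping convention mentioned earlier in the thread. The helper name join_mixed is hypothetical. Note that (string)0 yields "0", so zeros are removed before the cast. Also, a bare int element such as 65 would be cast to "65", not "A", so this helper only covers strings and arrays of character codes.

```pike
// Hypothetical helper, not an existing Pike function.
// Drops zero elements (the old `* skipped them), casts each remaining
// element to string ((string)({65,65,66}) gives "AAB"), then joins.
string join_mixed(array parts, string sep)
{
  return ((array(string))(parts - ({0}))) * sep;
}
```

Each array element still becomes a temporary string before the join, which is the performance concern raised below.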
Per Hedbor @ Pike developers forum wrote:
How about having a specific function for that instead of (string)?
From a performance standpoint, that shouldn't be a problem.
Any suggestions as to the name of the function?
I'd say something like:
gather(({"abc",({65,65,66}),"def",65})) => "abcAABdefA"
The original magic '*' operator can be implemented using
array magic = ({"abc",({65,65,66}),"def"});
((array(string))magic)*"x"
Yes, but that kills performance, since it creates at least one temporary string per subarray in the process.
without special code in '*', btw.
If gather() (as above) is allowed in, then I can take out the changes to "*" (except perhaps the tighter error checks which seem prudent).
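Here is a minimal sketch of what the proposed gather() could look like at the Pike level. It is built on String.Buffer, so arrays of character codes are appended one character at a time without creating temporary strings. gather() is hypothetical, since it was only proposed in this thread. Raising an error on other element types follows the tighter checks in the `* patch.

```pike
// Hypothetical: gather() was only proposed in this thread.
// Strings are appended as-is, ints are treated as character codes,
// and arrays are treated as arrays of character codes.
string gather(array parts)
{
  String.Buffer buf = String.Buffer();
  foreach (parts, mixed part) {
    if (stringp(part)) {
      buf->add(part);
    } else if (intp(part)) {
      buf->putchar(part);
    } else if (arrayp(part)) {
      foreach (part, int c)
        buf->putchar(c);
    } else {
      // Reject anything else instead of silently skipping it.
      error("gather: unsupported element %O\n", part);
    }
  }
  return buf->get();
}
```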
Marc Dirix wrote:
gather(({"abc",({65,65,66}),"def",65})) => "abcAABdefA"
Maybe a more general "to_string()", which IIRC is also the name some modules use for converting their contents to string form?
Well, actually, that would be confusing. A generic array_to_string() or to_string() raises the question of why it isn't equivalent to a plain cast to string, which would be the "natural" way of doing this. To set it apart from the cast, the name should be something more specific than "to_string".
Mirar @ Pike developers forum wrote:
String.collect or String.gather, maybe? (I haven't checked if they are in use.)
I'd have a slight preference for String.gather then. Any other votes?
Peter Bortas @ Pike developers forum wrote:
-2
I gather (pun intended) that is a strong no vote, any alternate ideas on how to implement the desired functionality? Or preferably no implementation at all?
On Mon, Feb 9, 2009 at 6:41 PM, Stephen R. van den Berg srb@cuci.nl wrote:
Peter Bortas @ Pike developers forum wrote:
-2
I gather (pun intended) that is a strong no vote, any alternate ideas on how to implement the desired functionality? Or preferably no implementation at all?
It was a -1 for each name. Neither of them is an obvious pointer to what the function is supposed to do.
The reason I would like to have this is that in order to parse strings and accumulate results, in Pike I tend to either pick the characters apart as integers, or use strings of characters whenever the parsing allows me to go faster.
Hmm, I almost never pick strings apart into individual characters since I believe it's slow (although I really haven't measured). I usually manage to do it in ways where the individual char picking is kept at the C level (sscanf ftw).
The result is that I would like to accumulate the results piecemeal in an array. This array gets filled like in the example above. Then I'd like to return a single string to the function caller, in which case I could cast it to string (as done above).
And you can't use String.Buffer for it? It has add() for strings and putchar() for chars.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Hmm, I almost never pick strings apart into individual characters since I believe it's slow (although I really haven't measured). I usually manage to do it in ways where the individual char picking is kept at the C level (sscanf ftw).
When parsing structures that can extend over newlines and/or support weird quoting rules (like CSV), it's almost inevitable.
The result is that I would like to accumulate the results piecemeal in an array. This array gets filled like in the example above. Then I'd like to return a single string to the function caller, in which case I could cast it to string (as done above).
And you can't use String.Buffer for it? It has add() for strings and putchar() for chars.
I didn't know String.Buffer existed. Looks interesting (and useful).
When parsing structures that can extend over newlines and/or support weird quoting rules (like CSV), it's almost inevitable.
When I need to parse, say, backslash escapes (and sscanf's %O doesn't do the right thing), I start by splitting on \ and then examine the pieces. Alternatively, I use sscanf in more or less the same way: scan data up to a character, do stuff with it, then loop. E.g.:
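String.Buffer out = String.Buffer();
while (1) {
  // Scan up to the next backslash. The unscanned tail is assigned back
  // to "in", which copies it on every pass (hence O(n^2)).
  int res = sscanf (in, "%[^\\]\\%c%s", string pre, int esc, in);
  out->add (pre);
  if (res != 3) break;
  switch (esc) {
    case '\n': case '"':
      out->putchar (esc);
      break;
    // ...
  }
}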
I almost always use tricks like that to avoid stepping through single chars at the Pike level. Even though the loop above is actually O(n^2), I believe it's faster than per-character stepping on moderately small strings.
But then again, I also got integer chars in my example above, so maybe we're talking about the same approach after all.
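The loop above reassigns the unscanned tail to in on every pass. Here is a minimal sketch of a linear-time variant. It keeps an offset instead and uses search() to find the next backslash, so only the literal pieces are sliced out. This is a swapped-in technique, not code from this thread. The escape rule is an assumption for illustration: a backslash makes the next character literal, and a lone trailing backslash is dropped.

```pike
// Sketch under an assumed escape rule: "\x" yields "x", and a lone
// trailing backslash is dropped. The input string is never recopied.
string unescape(string data)
{
  String.Buffer out = String.Buffer();
  int pos = 0;
  while (1) {
    int bs = search(data, "\\", pos);  // next backslash at or after pos
    if (bs < 0) {                      // no more escapes: copy the rest
      out->add(data[pos..]);
      break;
    }
    out->add(data[pos..bs - 1]);       // literal text before the backslash
    if (bs + 1 >= sizeof(data))        // lone trailing backslash
      break;
    out->putchar(data[bs + 1]);        // the escaped character itself
    pos = bs + 2;
  }
  return out->get();
}
```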
That code will copy and rehash data in your "in" string every iteration. At some point I implemented %*4711s in sscanf() just to be able to avoid that bottleneck and it gave significant savings in my test case (the XML markup repair code in Roxen CMS).
I know, that's why it's O(n^2). As I said, it's still faster on short input. But if the input is long, an approach based on splitting ought to be used instead.
It'd be nice to have something like String.Buffer for parsing too. Maybe it could be extended with functions like get_char, has_prefix, get_prefix, sscanf, etc.
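Here is a rough Pike sketch of such a parsing buffer. It keeps a read offset into the original string, so nothing is recopied as input is consumed. The class name ParseBuffer and its methods are hypothetical and are not part of the String module. Only get_char, has_prefix and get_prefix from the wish list are sketched.

```pike
// Hypothetical class, not part of Pike's String module.
class ParseBuffer
{
  string data;
  int pos;  // offset of the first unread character

  void create(string s) { data = s; }

  // Next character as an int, or -1 at end of input.
  int get_char()
  {
    if (pos >= sizeof(data)) return -1;
    return data[pos++];
  }

  // True if the unread part starts with p. Consumes nothing.
  int has_prefix(string p)
  {
    return data[pos..pos + sizeof(p) - 1] == p;
  }

  // Consume p and return 1 if the unread part starts with it, else 0.
  int get_prefix(string p)
  {
    if (!has_prefix(p)) return 0;
    pos += sizeof(p);
    return 1;
  }
}
```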
Jonas Walldén @ Pike developers forum wrote:
iteration. At some point I implemented %*4711s in sscanf() just to be
I'll bite: what does %*4711s do in sscanf()?
Exactly, it allows you to start sscanf() at a known offset in your input string without creating intermediate strings like data[pos..] would do.
Well, not too bad:
sscanf(data, "%*" + pos + "s...", ...);
It's not impossible to think of an optimization that uses that format when the programmer writes
sscanf(data[pos..], "...", ...);
as long as the sscanf() return value is adjusted. Maybe Grubba fixed that already?
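Here is a small sketch of the offset trick from above, wrapped in a hypothetical helper int_at() that parses an integer starting at a given character offset. The %*<pos>s directive skips exactly pos characters without storing them, so no data[pos..] copy is made. The helper assumes pos > 0. The behaviour of a zero width (%*0s) is not covered here.

```pike
// Hypothetical helper: parse an integer that starts at offset pos,
// skipping the first pos characters with %*<pos>s instead of slicing.
// Assumes pos > 0.
int int_at(string data, int pos)
{
  int value;
  sscanf(data, "%*" + pos + "s%d", value);
  return value;
}
```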
Hmm, noticed this is already checked in.
Well, in any case, let me remind that the refdoc for `* still isn't updated. It's in operators.c.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
Hmm, noticed this is already checked in.
Well, in any case, let me remind that the refdoc for `* still isn't updated. It's in operators.c.
Well, I'm going to take it out again; please wait until tomorrow. Taking it out will leave the tighter error checks in place.
pike-devel@lists.lysator.liu.se