I'm going to create the module _PCRE, so it can configure and load nicely without interference from the main Regexp module, and let it be a rather raw glue for a better glue in Regexp.PCRE.
I think it is nicer to put PCRE right in the Regexp module. Someday someone with a big brain has to come up with a way to do away with all _-modules.
/ Martin Nilsson (saturator)
Previous text:
2003-09-20 16:22: Subject: Re: bug: casts to string for long double and long long int are incorrect (7.4.28 rel)
I'm going to create the module _PCRE, so it can configure and load nicely without interference from the main Regexp module, and let it be a rather raw glue for a better glue in Regexp.PCRE.
/ Mirar
I think it would be nice to allow Regexp to be loaded on a system without pcre installed even if you get a binary dist of Pike...
...so until someone thinks of a way to get rid of the _-stuff... :)
/ Mirar
Previous text:
2003-09-20 16:26: Subject: Re: bug: casts to string for long double and long long int are incorrect (7.4.28 rel)
I think it is nicer to put PCRE right in the Regexp module. Someday someone with a big brain has to come up with a way to do away with all _-modules.
/ Martin Nilsson (saturator)
But you don't have to put _PCRE on the top level, do you? But then you would have to fight with the makefiles I guess, so I see why that is unattractive...
/ Martin Nilsson (saturator)
Previous text:
2003-09-20 16:29: Subject: Re: bug: casts to string for long double and long long int are incorrect (7.4.28 rel)
I think it would be nice to allow Regexp to be loaded on a system without pcre installed even if you get a binary dist of Pike...
...so until someone thinks of a way to get rid of the _-stuff... :)
/ Mirar
Just a general thought when it comes to automatic conversion between wide strings and utf-8: If there are match functions that return the position of matches or if there are primitives that only match at a specific position then we might have a problem of converting those positions. Ideally pcre deals with logical character positions instead of bytes in this case, but I don't know.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-09-20 16:22: Subject: Re: bug: casts to string for long double and long long int are incorrect (7.4.28 rel)
I'm going to create the module _PCRE, so it can configure and load nicely without interference from the main Regexp module, and let it be a rather raw glue for a better glue in Regexp.PCRE.
/ Mirar
I was pondering about that. I'm going to investigate...
I was considering having several PCRE Regexp classes, one fast and one that does study, and maybe another set to do automatic widestring <-> UTF-8 conversions.
/ Mirar
Previous text:
2003-09-20 16:35: Subject: Re: bug: casts to string for long double and long long int are incorrect (7.4.28 rel)
Just a general thought when it comes to automatic conversion between wide strings and utf-8: If there are match functions that return the position of matches or if there are primitives that only match at a specific position then we might have a problem of converting those positions. Ideally pcre deals with logical character positions instead of bytes in this case, but I don't know.
/ Martin Stjernholm, Roxen IS
As for UTF-8, here's some results:
object o=_Regexp_PCRE._pcre("b.d",_Regexp_PCRE.OPTION.UTF8); o->exec(string_to_utf8("\34429b\1234d\123132"));
(13) Result: ({ /* 2 elements */ 3, 7 })
map(_Regexp_PCRE.split_subject(string_to_utf8("\34429b\1234d\123132"),o->exec(string_to_utf8("\34429b\1234d\123132"))),utf8_to_string);
(16) Result: ({ /* 1 element */ "b\1234d" })
so it seems it gives indexes to the matching byte offsets, and not character offsets. Is there any convenience function for figuring out real character offsets from byte offsets in an utf8-encoded string?
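A minimal sketch of such a helper (the name is made up for the illustration; it assumes only the standard utf8_to_string() and sizeof(), and that the byte offset lands on a character boundary, which it does for offsets returned in UTF-8 mode):

    int byte_to_char_offset(string utf8_subject, int byte_offset)
    {
        // Decode only the bytes before the offset; the length of the
        // decoded prefix is the character offset.
        return sizeof(utf8_to_string(utf8_subject[..byte_offset-1]));
    }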
/ Mirar
Previous text:
2003-09-20 16:39: Subject: Re: bug: casts to string for long double and long long int are incorrect (7.4.28 rel)
I was pondering about that. I'm going to investigate...
I was considering having several PCRE Regexp classes, one fast and one that does study, and maybe another set to do automatic widestring <-> UTF-8 conversions.
/ Mirar
If I could wish, I think I'd wish for an alternative string syntax in Pike, that doesn't use \ to quote. It's so hairy writing advanced regexps if you have to quote the . :)
Is it a bad idea? It should be quite possible to do, if we could spare some syntax for it (and maybe figure out another quote character).
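As a concrete illustration of the quoting problem (using the _pcre constructor from the session earlier in the thread, and assuming its options argument can be left out): the regexp \., which matches a literal dot, has to be written with a doubled backslash in the Pike string literal, and a regexp for a single literal backslash needs four of them.

    object dot = _Regexp_PCRE._pcre("\\.");   // the regexp \.  - matches a literal "."
    object bsl = _Regexp_PCRE._pcre("\\\\");  // the regexp \\  - matches one literal backslash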
/ Mirar
Previous text:
2003-09-21 12:43: Subject: Re: bug: casts to string for long double and long long int are incorrect (7.4.28 rel)
As for UTF-8, here's some results:
object o=_Regexp_PCRE._pcre("b.d",_Regexp_PCRE.OPTION.UTF8); o->exec(string_to_utf8("\34429b\1234d\123132"));
(13) Result: ({ /* 2 elements */ 3, 7 })
map(_Regexp_PCRE.split_subject(string_to_utf8("\34429b\1234d\123132"),o->exec(string_to_utf8("\34429b\1234d\123132"))),utf8_to_string);
(16) Result: ({ /* 1 element */ "b\1234d" })
so it seems it gives indexes to the matching byte offsets, and not character offsets. Is there any convenience function for figuring out real character offsets from byte offsets in an utf8-encoded string?
/ Mirar
Ah, the leaning toothpick syndrome. At least the most common special characters, such as "(", ")", "|", "[", "]", "^", and "$", don't need backslashes, so it's not as bad as some other well-known examples. If anything should be done about it, I think it could just as well be to introduce a regexp syntax that doesn't use backslashes for quoting.
There was a discussion about other string syntaxes not long ago on the Pike list; search for the subject "multi line strings".
/ Martin Stjernholm, Roxen IS
Previous text:
2003-09-21 13:15: Subject: wish: string with other quoting then \
If I could wish, I think I'd wish for an alternative string syntax in Pike, that doesn't use \ to quote. It's so hairy writing advanced regexps if you have to quote the . :)
Is it a bad idea? It should be quite possible to do, if we could spare some syntax for it (and maybe figure out another quote character).
/ Mirar
Yes, I remember that discussion. I think it's a problem that could do with a good solution in Pike; but I *would* like to see a string syntax that solves three problems:
1) doesn't use \ for in-string quoting
2) doesn't need " or ' quoted, nor any other regular regexp character
3) doesn't need newline quoted and accepts newline in string
These problems are common when using regexps and writing HTML or Pike-writing programs.
If it should be usable for regexps, it should have a fairly short syntax (for instance two leading, two exiting characters), but to be really useful it could be nice with an optional exit string.
I don't think anyone would really like to use a regexp syntax without the \'s; it's too widely used and it takes too much energy to relearn.
/ Mirar
Previous text:
2003-09-21 13:52: Subject: wish: string with other quoting then \
Ah, the leaning toothpick syndrome. At least the most common special characters, such as "(", ")", "|", "[", "]", "^", and "$", don't need backslashes, so it's not as bad as some other well-known examples. If anything should be done about it, I think it could just as well be to introduce a regexp syntax that doesn't use backslashes for quoting.
There was a discussion about other string syntaxes not long ago on the Pike list; search for the subject "multi line strings".
/ Martin Stjernholm, Roxen IS
Maybe like this, for instance?
««string»» as in ««^(.*)[a-z]*$»»
with the optional
«blockstring«string»blockstring»
as in
«FOO«
int main()
{
    werror(««hello, world!»»"\n");
}
»FOO»
...mind that it's probably not a good idea to use 8-bit characters; those above are just an example. Even if I don't have a problem with it, most developers have a hard time typing in « and », so some effort should be put on replacing those with some ASCII solution.
/ Mirar
Previous text:
2003-09-21 13:57: Subject: wish: string with other quoting then \
Yes, I remember that discussion. I think it's a problem that could do with a good solution in Pike; but I *would* like to see a string syntax that solves three problems:
- doesn't use \ for in-string quoting
- doesn't need " or ' quoted nor any other regular regexp character
- doesn't need newline quoted and accepts newline in string
These problems are common when using regexps and writing HTML or Pike-writing programs.
If it should be usable for regexps, it should have a fairly short syntax (for instance two leading, two exiting characters), but to be really useful it could be nice with an optional exit string.
I don't think anyone would really like to use a regexp syntax without the \'s; it's too widely used and it takes too much energy to relearn.
/ Mirar
The main problem with « and » for me is that I need to use
Compose + shift [<,] + shift [<,]
Compose + shift [>.] + shift [/.]
([>.] and [<,] are the keys with these two chars on them)
That's kind of a pain if it was used for normal coding. I vote for this syntax:
write(«-· Hello world! ·-»);
That looks all nice and pretty! And no, I'm obviously not really serious, but I can't think of any good options really. Of course, the '"""' type syntax really isn't such a bad idea, except perhaps in terms of actually parsing it correctly (backwards compatibility with odd code like write("""hello") for example).
write("""This is my funny \ piece of "test text" to show the "useful "syntax"" or so.""");
/ David Hedbor
Previous text:
2003-09-21 14:02: Subject: wish: string with other quoting then \
Maybe like this, for instance?
««string»» as in ««^(.*)[a-z]*$»»
with the optional
«blockstring«string»blockstring»
as in
«FOO«
int main()
{
    werror(««hello, world!»»"\n");
}
»FOO»
...mind that it's probably not a good idea to use 8-bit characters; those above are just an example. Even if I don't have a problem with it, most developers have a hard time typing in « and », so some effort should be put on replacing those with some ASCII solution.
/ Mirar
As if it would be any less learning with a new string syntax? In either case the user has to come to terms with nonstandard (as in non-C and non-regex) syntax.
My personal belief when it comes to regexps is that it's simpler to live with than to try to improve. For very complicated regexps I think the right way is to use something completely different and only use the cryptic strings for smaller parts. E.g:
Rx.Rx syntactic_ws = Rx.Rx (
  Rx.rep (Rx.or ("[ \t\n\r\f\v]",        // Ordinary whitespace.
                 "//[^\n]*\n",           // Line comments.
                 "/\*([^*]|\*[^/])*\*/"  // Block comments.
                 )));
/ Martin Stjernholm, Roxen IS
Previous text:
2003-09-21 13:57: Subject: wish: string with other quoting then \
Yes, I remember that discussion. I think it's a problem that could do with a good solution in Pike; but I *would* like to see a string syntax that solves three problems:
- doesn't use \ for in-string quoting
- doesn't need " or ' quoted nor any other regular regexp character
- doesn't need newline quoted and accepts newline in string
These problems are common when using regexps and writing HTML or Pike-writing programs.
If it should be usable for regexps, it should have a fairly short syntax (for instance two leading, two exiting characters), but to be really useful it could be nice with an optional exit string.
I don't think anyone would really like to use a regexp syntax without the \'s; it's too widely used and it takes too much energy to relearn.
/ Mirar
I think a new string syntax is way easier to learn than some other solution to the backslash problem. Also, it can solve more problems than just regexp issues.
How does Perl solve this?
For the object oriented/functional solution, I'm still waiting for your Regexp system. :)
/ Mirar
Previous text:
2003-09-21 14:49: Subject: wish: string with other quoting then \
As if it would be any less learning with a new string syntax? In either case the user has to come to terms with nonstandard (as in non-C and non-regex) syntax.
My personal belief when it comes to regexps is that it's simpler to live with than to try to improve. For very complicated regexps I think the right way is to use something completely different and only use the cryptic strings for smaller parts. E.g:
Rx.Rx syntactic_ws = Rx.Rx (
  Rx.rep (Rx.or ("[ \t\n\r\f\v]",        // Ordinary whitespace.
                 "//[^\n]*\n",           // Line comments.
                 "/\*([^*]|\*[^/])*\*/"  // Block comments.
                 )));
/ Martin Stjernholm, Roxen IS
Just changing the regexp quote character to something else would make a simple rule.
It'd be very simple to implement a similar object/function interface in your pcre glue. It'd just be a set of functions that internally converts to pcre regexp syntax. I can provide the design I've made for that; it's very straightforward.
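A minimal sketch of what such a set of functions could look like (the names rx_or and rx_rep are made up for this illustration and are not the design referred to above):

    string rx_or(string ... alts)
    {
        // Build a PCRE alternation out of the sub-regexps.
        return "(?:" + alts*"|" + ")";
    }

    string rx_rep(string re, int low, int high)
    {
        // Repetition; a negative high means unbounded, mirroring the
        // docstrings posted later in the thread.
        return sprintf("(?:%s){%d,%s}", re, low, high < 0 ? "" : (string)high);
    }

Something like rx_rep(rx_or("[ \t\n\r\f\v]", "//[^\n]*\n"), 0, -1) then produces a plain PCRE pattern string for the low-level glue to compile.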
/ Martin Stjernholm, Roxen IS
Previous text:
2003-09-21 15:15: Subject: wish: string with other quoting then \
I think a new string syntax is way easier to learn than some other solution to the backslash problem. Also, it can solve more problems than just regexp issues.
How does Perl solve this?
For the object oriented/functional solution, I'm still waiting for your Regexp system. :)
/ Mirar
Just changing the regexp quote character to something else would make a simple rule.
Of course.
It'd be very simple to implement a similar object/function interface in your pcre glue. It'd just be a set of functions that internally converts to pcre regexp syntax. I can provide the design I've made for that; it's very straightforward.
That's true. I'm currently at the point of starting to write the Pike-level glue for Regexp.PCRE... Was there a start of that somewhere? I can't seem to find it.
/ Mirar
Previous text:
2003-09-21 15:24: Subject: wish: string with other quoting then \
Just changing the regexp quote character to something else would make a simple rule.
It'd be very simple to implement a similar object/function interface in your pcre glue. It'd just be a set of functions that internally converts to pcre regexp syntax. I can provide the design I've made for that; it's very straightforward.
/ Martin Stjernholm, Roxen IS
Here are the docstrings for my set of regexp operators. I doubt much else of my code would be of any use here.
//! @decl RxNode any();
//!
//! Matches any symbol.

//! @decl RxNode seq (LaxRxType... regexps);
//!
//! A sequence. If an array is used as a sub-regexp it's converted to
//! this.

//! @decl RxNode seq_or (LaxRxType... regexps);
//!
//! Like @[Rx.or], but keeps the order between the sub-regexps, so
//! that if two or more of them match the same input, it's always
//! the match in the first one that's returned first. If all
//! alternative matches are requested, they're enumerated in the order
//! that the sub-regexps match.
//!
//! @note
//! This is the union variant that most closely resembles the "|"
//! operator in most other regexp engines. However, in some cases it
//! cannot do as good a job to determinize as @[Rx.or], so if the
//! order isn't relevant, use that one instead.

//! @decl RxNode range (Symbol from, Symbol to);
//!
//! A range of all symbols between @[from] and @[to], inclusive.

//! @decl RxNode rep (LaxRxType regexp, void|int low, void|int high);
//!
//! Repetition, which can be upwardly bounded or unbounded. (In the
//! unbounded forms this includes "Kleene star" and "Kleene plus".)
//! The given regexp must match at least @[low] and at most @[high]
//! times. @[low] defaults to zero. There's no upper bound if @[high]
//! is left out or is negative. If @[high] isn't negative but less
//! than @[low], this matches nothing.
//!
//! @note
//! The first returned match is the longest possible one. Therefore
//! this operator is "greedy". There's also a non-greedy variant
//! @[Rx.lrep].
//!
//! Actually the above is not entirely correct; the first returned
//! match is really the first match of @[regexp], repeated as many
//! times as possible.
//!
//! For example, if @[regexp] matches @tt{"aa"@} and @tt{"a"@} in that
//! order, then the first match on @tt{"aaa"@} will have two
//! repetitions where @[regexp] matched @tt{"aa"@} and then @tt{"a"@},
//! and not three repetitions where each matched @tt{"a"@}.
//!
//! Otoh, if @[regexp] is lazy and matches @tt{"a"@} before
//! @tt{"aa"@}, and if the repetition is upwardly bounded to two
//! repetitions, then the first match on @tt{"aaa"@} will be two
//! repetitions where each matched @tt{"a"@}. I.e. the first match is
//! not the longest possible one.

//! @decl RxNode lrep (LaxRxType regexp, void|int low, void|int high);
//!
//! Like @[Rx.rep], but implements laziness: The first returned match
//! repeats the regexp as few times as possible within the limits,
//! whereas @[Rx.rep] repeats it as many times as possible.
//!
//! @note
//! The first returned match is actually the first match of @[regexp],
//! repeated as few times as possible.
//!
//! For example, if @[regexp] matches @tt{"a"@} and @tt{"aa"@} in that
//! order, then the first match on @tt{"aaa"@} will have three
//! repetitions where each @[regexp] matched @tt{"a"@}, and not two
//! repetitions where one of them matched @tt{"aa"@}.
//!
//! Otoh, if @[regexp] is greedy and matches @tt{"aa"@} before
//! @tt{"a"@}, and if the repetition must match at least once, then
//! the first match on @tt{"aaa"@} will be one repetition where
//! @[regexp] matched @tt{"aa"@} and not @tt{"a"@}. I.e. the first
//! match is not the shortest possible one.

//! @decl RxNode opt (LaxRxType regexp);
//!
//! Match the regexp optionally, i.e. like
//! @tt{@[Rx.rep] (@[regexp], 0, 1)@}.
//!
//! @note
//! In the case where it's possible to both match the regexp and not
//! match it, the first returned match will be with the regexp. I.e.
//! this operator is "greedy" just like @[Rx.rep]. There's also a
//! non-greedy variant @[Rx.lopt].

//! @decl RxNode lopt (LaxRxType regexp);
//!
//! Match the regexp optionally and lazily, i.e. like
//! @tt{@[Rx.lrep] (@[regexp], 0, 1)@}. So whenever it's possible to
//! not match the regexp, the first returned match won't match it.

//! @decl RxNode str (string literal);
//!
//! A literal string. If a string is used as a sub-regexp, it's
//! converted to this. Technically this is a syntax parser that treats
//! its whole input as a literal.

//! @decl RxNode set_str (string chars);
//!
//! A set of symbols parsed from a string.

//! @decl RxNode save (LaxRxType regexp, void|string name);
//!
//! Saves the match of @[regexp] for later retrieval. If @[name] is
//! given, it's used as a name to identify the saved submatch,
//! otherwise it's accessed by position.
//!
//! The position is determined by counting the start of each unnamed
//! submatch as they are encountered from left to right, beginning at
//! zero. Note that this might not be well defined if e.g. @tt{(< >)@}
//! or @tt{([ ])@} is used to build the regexp tree.
//!
//! If @[regexp] matches several times (typically when used inside a
//! repetition) every match overwrites the preceding one, so only the
//! last match is available afterwards.

//! @decl RxNode saveall (LaxRxType regexp, void|string name);
//!
//! Like @[Rx.save], but if @[regexp] matches several times (typically
//! when used inside a repetition) then all those matches are saved.
//! The saved value is an array of the matches, in the order they are
//! found.
To put the operators above in some perspective, here are the others that I think would be a bit difficult to include in the pcre glue:
//! @decl RxNode sym (Symbol... symbols);
//!
//! A sequence of symbols. The difference from @[Rx.seq] is that the
//! elements are treated as literal symbols and not regexps. This is
//! only necessary when the symbols are of a type that otherwise would
//! be interpreted as something else, e.g. strings.

//! @decl RxNode pair (Symbol from, Symbol to);
//!
//! The pair @tt{@[from]/@[to]@}, where the symbol @[from] in the
//! input is mapped to @[to] in the output. The result is thus a
//! transducer.

//! @decl RxNode or (LaxRxType... regexps);
//!
//! A union; matches if any of the arguments match. If a multiset is
//! used as a sub-regexp it's converted to this.
//!
//! @note
//! When given no arguments, this doesn't match anything at all.
//!
//! @note
//! This operator tries to get as good determinization as possible by
//! allowing any match order between the alternatives. It's therefore
//! effectively "greedy" to the extent that determinization succeeds,
//! but that can't be counted on since determinization isn't
//! guaranteed to be complete. There's also the @[Rx.seq_or] variant
//! that always matches the alternatives in the order they are given
//! (which most closely resembles the behavior in other common regexp
//! engines).

//! @decl RxNode and (LaxRxType... regexps);
//!
//! Intersection; matches only when all the arguments match.

//! @decl RxNode neg (LaxRxType regexp)
//!
//! Negation; matches everything that @[regexp] doesn't match.

//! @decl RxNode sub (LaxRxType a, LaxRxType b);
//!
//! Subtraction; matches when @[a] but not @[b] matches.

//! @decl RxNode set (Symbol... symbols);
//!
//! A set of symbols. Much like @[Rx.or], but the elements are treated
//! as literal symbols and not regexps.

//! @decl RxNode map (LaxRxType from, LaxRxType to);
//!
//! Maps the regexp @[from] to the regexp @[to]. Both must be
//! recognizers and the result is a transducer. If a mapping with a
//! single element is used as a sub-regexp, it's converted to this (a
//! mapping with more elements becomes the union of the pairs in
//! it).
//!
//! (Technically, this is the cross product of @[from] and @[to], i.e.
//! the set of string pairs @tt{a/b@}, where @tt{a@} matches @[from]
//! and @tt{b@} matches @[to].)

//! @decl RxNode test (function(DataList,void|Rx.Rx.Process:int) func, @
//!                    void|int low, void|int high)
//! @decl RxNode test (function(DataList,void|Rx.Rx.Process:int) func, @
//!                    LaxRxType regexp)
//!
//! Calls @[func] to test whether there's a match at this position.
//!
//! The function will be called with a piece of the input and should
//! return nonzero if the whole piece matches, zero otherwise. The
//! second argument to the function is the current @[Rx.Rx.Process]
//! object. Although it can't be used to reliably look at the input it
//! might be useful to look at flags, e.g. @[Rx.Rx.Process.DEBUG_LOG].
//!
//! If @[low] and/or @[high] is given, they give the lower and upper
//! limit of the length of the string that can possibly be matched by
//! @[func]. @[low] defaults to zero. There's no upper bound if
//! @[high] is left out or is negative.
//!
//! If @[regexp] is given, only input which it matches will be tested
//! with @[func].
//!
//! @note
//! If the possible matches aren't screened with @[regexp] or a narrow
//! @[low]/@[high] interval, it's likely that the test function is
//! called excessively often.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-09-21 15:43: Subject: wish: string with other quoting then \
Just changing the regexp quote character to something else would make a simple rule.
Of course.
It'd be very simple to implement a similar object/function interface in your pcre glue. It'd just be a set of functions that internally converts to pcre regexp syntax. I can provide the design I've made for that; it's very straightforward.
That's true. I'm currently at the point of starting to write the Pike-level glue for Regexp.PCRE... Was there a start of that somewhere? I can't seem to find it.
/ Mirar
How does Perl solve this?
In many ways, as usual.
Non-interpolated string - single quotes: 'string'
Interpolated string - double quotes: "value: $var"
The functional equivalents are q(string) and qq(string). The parentheses may be changed to brackets, curly braces... The pipe is handy for HTML, like in qq|<body onload="func()">|. All of these may be multiline.
And as mentioned earlier:

$txt = <<MARK;
...
...
...
MARK
A matching regexp is usually written as /.../. The explicit form is m/.../, where the slashes can be replaced to prevent the LTS (e.g. m|/path/| or m,xxx,). Even m#...# works, but it tends to upset syntax highlighters, since '#' denotes a comment in Perl.
/ Andreas Lange (tvångshemsidad)
Previous text:
2003-09-21 15:15: Subject: wish: string with other quoting then \
I think a new string syntax is way easier to learn than some other solution to the backslash problem. Also, it can solve more problems than just regexp issues.
How does Perl solve this?
For the object oriented/functional solution, I'm still waiting for your Regexp system. :)
/ Mirar
Why not use operator overloading?
( Rx.Rx("[ \t\n\r\f\v]") | Rx.Rx("//[^\n]*\n") | Rx.Rx("/\*([^*]|\*[^/])*\*/") ) * Rx.inf
Btw, I tried doing this to emulate pipes recently, and found out that I can't make an `< operator which returns an object.. Kind of annoying as I wasn't planning to use `< for comparisons.
I was trying to do something like:
( Cmd("grep foo") < Stdio.File("foo.txt") | Cmd("uniq") > mysocket ) ->run();
Any ideas for how to get around the type restrictions in `< and other lfuns?
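One possible way around it, sketched under the assumption that it's only the comparison lfuns that are restricted to int results: keep `| for the piping, since it may return an object, and fall back to plain named methods for the redirections. The class below is purely illustrative, not an existing API:

    class Cmd
    {
        string command;
        this_program upstream;              // previous command in the pipeline
        Stdio.File source, destination;     // redirections

        void create(string c) { command = c; }

        // `| has no comparison-type restriction, so the pipe part works.
        this_program `|(this_program next) { next->upstream = this_object(); return next; }

        // Named methods sidestep the `< and `> restrictions entirely.
        this_program from(Stdio.File f) { source = f; return this_object(); }
        this_program to(Stdio.File f) { destination = f; return this_object(); }

        void run() { /* walk the upstream chain and spawn the processes */ }
    }

    // (Cmd("grep foo")->from(Stdio.File("foo.txt")) | Cmd("uniq")->to(mysocket))->run();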
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-09-21 14:49: Subject: wish: string with other quoting then \
As if it would be any less learning with a new string syntax? In either case the user has to come to terms with nonstandard (as in non-C and non-regex) syntax.
My personal belief when it comes to regexps is that it's simpler to live with than to try to improve. For very complicated regexps I think the right way is to use something completely different and only use the cryptic strings for smaller parts. E.g:
Rx.Rx syntactic_ws = Rx.Rx (
  Rx.rep (Rx.or ("[ \t\n\r\f\v]",        // Ordinary whitespace.
                 "//[^\n]*\n",           // Line comments.
                 "/\*([^*]|\*[^/])*\*/"  // Block comments.
                 )));
/ Martin Stjernholm, Roxen IS
( Cmd("grep foo") < Stdio.File("foo.txt") | Cmd("uniq") > mysocket ) ->run();
Oh, this is some nice looking syntax. Or, if you wish, it makes Pike look like a shell script. It would probably make some external command calling easier however, and it wouldn't require using actual shell syntax.
/ David Hedbor
Previous text:
2003-09-22 22:53: Subject: wish: string with other quoting then \
Why not use operator overloading?
( Rx.Rx("[ \t\n\r\f\v]") | Rx.Rx("//[^\n]*\n") | Rx.Rx("/\*([^*]|\*[^/])*\*/") ) * Rx.inf
Btw, I tried doing this to emulate pipes recently, and found out that I can't make an `< operator which returns an object.. Kind of annoying as I wasn't planning to use `< for comparisons.
I was trying to do something like:
( Cmd("grep foo") < Stdio.File("foo.txt") | Cmd("uniq") > mysocket ) ->run();
Any ideas for how to get around the type restrictions in `< and other lfuns?
/ Fredrik (Naranek) Hubinette (Real Build Master)
Why not use operator overloading?
( Rx.Rx("[ \t\n\r\f\v]") | Rx.Rx("//[^\n]*\n") | Rx.Rx("/\*([^*]|\*[^/])*\*/") ) * Rx.inf
Well, I think your example shows one reason: It's then necessary to introduce objects on the leaf level instead of higher up. Assuming that there are about as many leaves in the syntax tree as inner nodes, little is gained when it comes to brevity.
Also much is lost in general ease of use; the author would often have to spend time brooding over the operator bindings and the technical details of how magic objects like Rx.inf work. The reader must then do the same. So it's essentially only unnecessarily tricky to write and unnecessarily tricky to read. Better to stick to the plain old boring, easy and obvious function syntax instead.
I abhor overuse of overloaded operators in general. More often than not, it only becomes a mild form of obfuscation, like a little puzzle. It carries the same kind of fascination too: "Hey, look how odd I can make my code look!" - "Whoo, cool!" Certainly fun when making short code snippets to show off with, but not something I'd like to meet in everyday code.
Operator overloading is good when making data types that should work like the builtins in some aspects since it helps keeping a consistent interface, but that's it. It's definitely a feature that the type restrictions limit `< to comparisons like the builtin operator.
/ Martin Stjernholm, Roxen IS
Previous text:
2003-09-22 22:53: Subject: wish: string with other quoting then \
Why not use operator overloading?
( Rx.Rx("[ \t\n\r\f\v]") | Rx.Rx("//[^\n]*\n") | Rx.Rx("/\*([^*]|\*[^/])*\*/") ) * Rx.inf
Btw, I tried doing this to emulate pipes recently, and found out that I can't make an `< operator which returns an object.. Kind of annoying as I wasn't planning to use `< for comparisons.
I was trying to do something like:
( Cmd("grep foo") < Stdio.File("foo.txt") | Cmd("uniq") > mysocket ) ->run();
Any ideas for how to get around the type restrictions in `< and other lfuns?
/ Fredrik (Naranek) Hubinette (Real Build Master)
Or you figure out an alternative regexp syntax, which doesn't use a magic \ ... ;-)
Just like you can configure regexp applications like perl and sed to use an arbitrary regexp terminator, perhaps you could configure the regexp engine to use an arbitrary escape character?
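A hedged sketch of what that could mean in practice (no such option exists in the glue as far as this thread goes); the helper below simply swaps a chosen escape character with the backslash before the pattern reaches PCRE:

    string with_escape(string re, string esc)
    {
        // Swap the roles of esc and "\" in the pattern, so e.g.
        // with_escape("b%.d", "%") yields "b\.d".
        return replace(re, ({ esc, "\\" }), ({ "\\", esc }));
    }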
/ Niels Möller (igelkottsräddare)
Previous text:
2003-09-21 13:15: Subject: wish: string with other quoting then \
If I could wish, I think I'd wish for an alternative string syntax in Pike, that doesn't use \ to quote. It's so hairy writing advanced regexps if you have to quote the . :)
Is it a bad idea? It should be quite possible to do, if we could spare some syntax for it (and maybe figure out another quote character).
/ Mirar
On Mon, Sep 22, 2003 at 11:30:02AM +0200, Niels Möller (igelkottsräddare) @ Pike (-) developers forum wrote:
use an arbitrary regexp terminator, perhaps you could configure the regexp engine to use an arbitrary escape character?
I guess this is a bad idea... It would break compatibility, so REs written with this alternative escape character might not be usable directly somewhere else. Mostly, REs syntax is stable now, so why break it? :)
Regards, /Al
Mostly, REs syntax is stable now
You haven't looked at the plans for Perl 6? At last, they're trying to come up with a sane syntax for regexps.
/ Niels Möller (igelkottsräddare)
Previous text:
2003-09-22 14:03: Subject: Re: wish: string with other quoting then \
On Mon, Sep 22, 2003 at 11:30:02AM +0200, Niels Möller (igelkottsräddare) @ Pike (-) developers forum wrote:
use an arbitrary regexp terminator, perhaps you could configure the regexp engine to use an arbitrary escape character?
I guess this is a bad idea... It would break compatibility, so REs written with this alternative escape character might not be usable directly somewhere else. Mostly, REs syntax is stable now, so why break it? :)
Regards, /Al
/ Brevbäraren
Possibly the only reason Pike doesn't already have such a syntax is that I think regexps are horrible and I consistently refused to make special hacks in the lexer/parser for the purpose of making regexp usage easier.
I still think regexps are horrible, but it's not really up to me anymore. :)
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-09-21 13:15: Subject: wish: string with other quoting then \
If I could wish, I think I'd wish for an alternative string syntax in Pike, that doesn't use \ to quote. It's so hairy writing advanced regexps if you have to quote the . :)
Is it a bad idea? It should be quite possible to do, if we could spare some syntax for it (and maybe figure out another quote character).
/ Mirar
While I agree that regexps are less usable than commonly attributed, I still think that #x...x, where x is a special character, is a rather elegant solution.
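Purely as an illustration (the exact semantics aren't spelled out above): a regexp matching a literal dot could then be written as a literal like #/b\.d/ instead of the string "b\\.d", with only the chosen delimiter needing any special care.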
/ Martin Nilsson (saturator)
Previous text:
2003-09-22 22:45: Subject: wish: string with other quoting then \
Possibly the only reason Pike doesn't already have such a syntax is that I think regexps are horrible and I consistently refused to make special hacks in the lexer/parser for the purpose of making regexp usage easier.
I still think regexps are horrible, but it's not really up to me anymore. :)
/ Fredrik (Naranek) Hubinette (Real Build Master)
I don't think it's an elegant solution, partially because it is hard to make a regexp that parses that syntax... (Emacs highlighting is regexp-driven, is it not?) But then again, regexps are evil, so I guess I don't really care... :)
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2003-09-22 23:06: Subject: wish: string with other quoting then \
While I agree that regexps are less usable than commonly attributed, I still think that #x...x, where x is a special character, is a rather elegant solution.
/ Martin Nilsson (saturator)