Mirar @ Pike developers forum 10353@lyskom.lysator.liu.se wrote:
However, I'd like to know why, when doing this:
----8<----8<----8<----8<----
object r = Regexp.PCRE.Studied("[\W]*$");
r->replace(foo, "");
---->8---->8---->8---->8----
Pike eats all my CPU and the command never finishes.
Good question. The PCRE code should be easy to read though, feel free to investigate? :)
When using n* (zero or more n), Regexp.PCRE._pcre()->exec() returns an array of two identical integers, i.e. an empty match.
----8<----8<----8<----8<----
$ pike
Pike v7.6 release 112 running Hilfe v3.5 (Incremental Pike Frontend)
Regexp.PCRE._pcre("o+")->exec("foobar",0);
(1) Result: ({ /* 2 elements */ 1, 3 })
Regexp.PCRE._pcre("o*")->exec("foobar",0);
(2) Result: ({ /* 2 elements */ 0, 0 })
---->8---->8---->8---->8----
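To make the two offsets concrete, here is a small sketch of how the value returned by exec() can be read. It assumes a PCRE-enabled Pike and that the array starts with the start offset and the exclusive end offset of the whole match, as the session above suggests (({ 1, 3 }) for "oo" in "foobar"). describe_match() is a hypothetical helper written for illustration, not part of the module.

```pike
// Sketch: interpret the int|array returned by _pcre()->exec().
// Assumption: on success the array begins with the start offset and the
// end offset (one past the last matched character) of the whole match;
// on failure exec() returns a negative int (-1 means "no match").
// describe_match() is a hypothetical helper, not part of Regexp.PCRE.
string describe_match(string pattern, string subject)
{
  mixed v = Regexp.PCRE._pcre(pattern)->exec(subject, 0);
  if (intp(v))
    return sprintf("no match (%d)", v);

  // Start == end: the pattern matched the empty string at that offset.
  if (v[0] == v[1])
    return sprintf("empty match at offset %d", v[0]);

  // v[1] is exclusive, so the matched text is subject[v[0]..v[1]-1].
  string matched = subject[v[0]..v[1]-1];
  return sprintf("%O at offsets %d..%d", matched, v[0], v[1] - 1);
}
```

With "o+" this reports "oo" at offsets 1..2, while "o*" reports an empty match at offset 0, which is exactly the case the replace loop trips over.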
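For each replacement, Regexp.PCRE()->replace() runs the regular expression again starting at the end of the previous hit, using the array returned by exec() as the start and end offsets of that hit. With n* the pattern can match the empty string, so exec() returns identical start and end offsets; the next search then starts at the same position, finds the same empty match again, and this results in an infinite loop.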
----8<----8<----8<----8<----
$ pike
Pike v7.6 release 112 running Hilfe v3.5 (Incremental Pike Frontend)
string foo = "foobar";
Regexp.PCRE("o+")->replace(foo,"");
({ /* 2 elements */ 1, 3 })
-1
(1) Result: "fbar"
Regexp.PCRE("o*")->replace(foo,"");
({ /* 2 elements */ 0, 0 })
({ /* 2 elements */ 0, 0 })
({ /* 2 elements */ 0, 0 })
({ /* 2 elements */ 0, 0 })
... infinite loop ...
---->8---->8---->8---->8----
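The loop described above can be sketched as follows, with one extra rule: when a match is empty, copy one character of the subject and step past it, so every iteration makes progress. This is a minimal sketch assuming a PCRE-enabled Pike; replace_all() is a hypothetical name, it is not the module's actual replace(), and unlike the module it simply stops on any int returned by exec() instead of inspecting the error code.

```pike
// Minimal sketch (hypothetical helper, not Regexp.PCRE's own replace()):
// replace every match of `pattern` in `subject` with `with`, making sure
// the loop always advances, even on empty matches.
string replace_all(string pattern, string subject, string with)
{
  object re = Regexp.PCRE._pcre(pattern);
  String.Buffer res = String.Buffer();
  int i = 0;

  while (i <= sizeof(subject)) {
    mixed v = re->exec(subject, i);
    if (intp(v)) break;            // any int (e.g. -1, no match) ends the loop

    res->add(subject[i..v[0]-1]);  // text between the previous hit and this one
    res->add(with);                // the replacement

    if (v[1] > v[0]) {
      i = v[1];                    // non-empty match: resume right after it
    } else {
      // Empty match (start == end): resuming at v[1] would find the same
      // empty match forever, so copy one character and step past it.
      if (v[1] < sizeof(subject)) res->add(subject[v[1]..v[1]]);
      i = v[1] + 1;
    }
  }
  if (i < sizeof(subject)) res->add(subject[i..]);  // unmatched tail
  return res->get();
}
```

With this rule, replace_all("o*", "foobar", "") returns "fbar" instead of spinning forever, and replace_all("o+", "foobar", "") behaves like the original loop.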
Please note that Regexp.PCRE()->matchall() obviously suffers from the same problem:
----8<----8<----8<----8<----
$ pike
Pike v7.6 release 112 running Hilfe v3.5 (Incremental Pike Frontend)
string foo = "foobar";
Regexp.PCRE("o+")->matchall(foo, lambda(mixed s){ werror("%O\n",s); } );
({ /* 1 element */ "oo" })
(1) Result: Regexp.PCRE.StudiedWidestring("o+")
Regexp.PCRE("o*")->matchall(foo, lambda(mixed s){ werror("%O\n",s); } );
({ /* 1 element */ "" })
({ /* 1 element */ "" })
... infinite loop ...
---->8---->8---->8---->8----
I don't know which code should be fixed for now. I also don't understand the documentation of exec() when it returns an array: http://pike.ida.liu.se/generated/manual/modref/ex/predef_3A_3A/Regexp/PCRE/_pcre/exec.html
Thanks everybody for the replies! I also ran into CPU-eating regexps using '*', but I assumed it was somehow my own fault :) But where do I go from here? There's also no trim_right (or trim_left) function; does anybody have something like that lying around? I guess I'll just have to process the string right-to-left, char by char...
Greetings,
Coen
Bertrand LUPART wrote:
Hey,
Thanks everybody for the replies! I also ran into CPU-eating regexps using '*', but I assumed it was somehow my own fault :) But where do I go from here? There's also no trim_right (or trim_left) function; does anybody have something like that lying around? I guess I'll just have to process the string right-to-left, char by char...
It depends on what you're trying to do; that isn't clear to me from your mail.
I was about to suggest String.trim_all_whites(), but it doesn't seem to remove \0 characters. The following does the trick:
----8<----8<----8<----8<----
$ pike
Pike v7.6 release 112 running Hilfe v3.5 (Incremental Pike Frontend)
string foo = " This is some text \0\0\0";
String.trim_all_whites(replace(foo,"\0",""));
(1) Result: "This is some text"
---->8---->8---->8---->8----
But if you have a PCRE-enabled Pike, I guess something like this is closer to what you want:
----8<----8<----8<----8<----
$ pike
Pike v7.6 release 112 running Hilfe v3.5 (Incremental Pike Frontend)
string foo = " This is some text \0\0\0";
Regexp.PCRE.Studied("[\W]+$")->replace(foo, "");
(1) Result: " This is some text"
---->8---->8---->8---->8----
If you don't have a PCRE enabled Pike, then there should be a solution with Regexp(). But it seems like there's a little bug with $ on Regexp(), but one bug at a time :)
Just remember that Regexp.PCRE() (and thus Regexp.PCRE.Studied()) implement Perl Compatible Regular Expressions. They are a totally different beast, and way faster than Regexp() (which is the same as SimpleRegexp()).
I don't know whether something like trim_whites_right() or trim_whites_left() exists, but feel free to look at the code of trim_whites()/trim_all_whites() and contribute a Public.String.trim_whites_[right|left]() to http://modules.gotpike.org/ :)
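For the trim_right/trim_left question, a char-by-char version along the lines Coen suggested could look like the sketch below. trim_right(), trim_left() and default_junk are hypothetical names not taken from any Pike module, and which characters count as junk is an assumption: here space, tab, newline, carriage return and \0.

```pike
// Hypothetical helpers (not part of Pike's String module): strip the
// characters found in `junk` from the right or left end of `s`, scanning
// one character at a time. Assumed default junk: space, tab, newline,
// carriage return and NUL.
multiset(int) default_junk = (< ' ', '\t', '\n', '\r', '\0' >);

string trim_right(string s, void|multiset(int) junk)
{
  if (!junk) junk = default_junk;
  int end = sizeof(s);
  // s[end-1] is the character code of the last character still kept.
  while (end > 0 && junk[s[end-1]]) end--;
  return s[..end-1];              // empty string when everything was junk
}

string trim_left(string s, void|multiset(int) junk)
{
  if (!junk) junk = default_junk;
  int start = 0;
  while (start < sizeof(s) && junk[s[start]]) start++;
  return s[start..];
}
```

For example, trim_right(" This is some text \0\0\0") gives " This is some text", the same result as the PCRE replace above but without needing PCRE; a custom set such as (< '-' >) can be passed as the second argument.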
If you don't have a PCRE enabled Pike, then there should be a solution with Regexp(). But it seems like there's a little bug with $ on Regexp(), but one bug at a time :)
Not if it is the stone-age Regexp(), as that one did/does not handle strings containing null characters (except when you really *want* it to treat \0 as end-of-string).
Aha!
Well, it seems those two depend a little too much on the end of the previous hit. The following patch gives more appropriate behaviour:
Index: module.pmod.in
===================================================================
RCS file: /pike/data/cvsroot/Pike/7.7/src/modules/_Regexp_PCRE/module.pmod.in,v
retrieving revision 1.8
diff -u -r1.8 module.pmod.in
--- module.pmod.in 19 Feb 2004 13:53:38 -0000 1.8
+++ module.pmod.in 16 Apr 2008 13:11:53 -0000
@@ -133,7 +133,7 @@
    if (stringp(with)) res->add(with);
    else res->add(with(subject[v[0]..v[1]-1]));
-   i=v[1];
+   if (i!=v[1]) i=v[1]; else res->add(subject[i..i++]);
  }
  res->add(subject[i..]);
@@ -203,7 +203,7 @@
    if (intp(v) && !handle_exec_error([int]v)) return this;
    callback(split_subject(subject,v),v);
-   i=v[1];
+   if (i!=v[1]) i=v[1]; else i++;
  }
 }
pike-devel@lists.lysator.liu.se