Regexp re = Regexp.PCRE("([a-z]+)");
re->split("123abc123");
error returned from exec: ERROR.BADOPTION
Oops. Good observation.
Also, a side note: the ovector size is defined as 3000. Isn't that overkill for most cases? It consumes 12K of stack space on every call to exec() (at least this space is not allocated dynamically, which would be even worse).
12K isn't much, since the stack is at least 2 MB on every machine I know of. The stack gets eaten up quickly if you use recursive regexps, which PCRE supports.
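To put the 12K figure in context, here is a small C sketch of the arithmetic. It assumes the glue declares a local `int ovector[3000]` (the thread gives only the size) and uses the PCRE rule that the ovector length should be a multiple of 3, with the last third used by pcre_exec() as workspace. The helper names are hypothetical; in real glue code the capture count would come from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed ovector length used by the glue code, per the thread. */
#define GLUE_OVECTOR_LEN ((size_t)3000)

/* Hypothetical helper: bytes of stack a local `int ovector[len]` occupies. */
static size_t ovector_stack_bytes(size_t len)
{
    return len * sizeof(int);
}

/* Hypothetical helper: minimum ovector length for a pattern with
 * `capture_count` capturing groups. One start/end pair per substring
 * (whole match + each group) plus the extra third that pcre_exec()
 * uses as workspace, i.e. (groups + 1) * 3 ints. */
static size_t ovector_len_for(int capture_count)
{
    assert(capture_count >= 0);
    return ((size_t)capture_count + 1) * 3;
}

/* Hypothetical helper: how many capturing groups a fixed ovector of
 * length `len` can report in full (one slot triple goes to the whole match). */
static size_t max_groups_for(size_t len)
{
    assert(len >= 3);
    return len / 3 - 1;
}
```

With 4-byte ints, `ovector_stack_bytes(GLUE_OVECTOR_LEN)` is 12000 bytes, enough to report the whole match plus 999 groups, while a pattern like `([a-z]+)` needs only `ovector_len_for(1)`, i.e. 6 entries. Sizing per pattern would shrink the buffer but would mean allocating it dynamically, which the original message considers worse.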
Everything is generated from the .cmod file, so patch that and see if it gets any better.
/ Mirar
Previous text:
2004-06-27 02:56: Subject: Regexp.PCRE problem
Hi,
There is a problem with Regexp.PCRE when study() is called (or Regexp.PCRE.Studied() is used). The exec() code in the .cmod contains the following:
---snip---
#ifdef PCRE_EXTRA_STUDY_DATA
  if (THIS->extra) opts|=PCRE_EXTRA_STUDY_DATA;
#else
  /* FIXME: Throw an error if THIS->extra is set? */
#endif /* PCRE_EXTRA_STUDY_DATA */
---snip---
and opts is later passed to pcre_exec(). That is incorrect: this flag belongs not in the options passed to pcre_exec(), but in the pcre_extra struct (field "flags").
As a result, any call to exec(), split() etc. on a studied PCRE object fails with an error (with PCRE 4.4):
Regexp re = Regexp.PCRE("([a-z]+)");
re->split("123abc123");
error returned from exec: ERROR.BADOPTION
With PCRE 3.9 it works, but only because the symbol PCRE_EXTRA_STUDY_DATA is not defined there.
Additionally, the PCRE documentation (man pcreapi) says:
---snip---
Other flag bits should be set to zero. The study_data field is set in the pcre_extra block that is returned by pcre_study(), together with the appropriate flag bit. You should not set this yourself, but you can add to the block by setting the other fields.
---snip---
So there is no need at all to pass any options to pcre_exec() when handling exec().
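Below is a self-contained C model of the bug and the proposed fix. It is not code against the real PCRE library: the `m_*` names and the constant values are hypothetical stand-ins for pcre_extra, pcre_study(), pcre_exec(), PCRE_EXTRA_STUDY_DATA and PCRE_ERROR_BADOPTION. It only illustrates why ORing an extra-block flag into the exec options makes a studied pattern fail, while passing the studied block untouched works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; names and values are made up for this model. */
#define M_EXTRA_STUDY_DATA   0x1u  /* a bit in m_extra.flags (like pcre_extra.flags) */
#define M_EXEC_NOTBOL        0x2u  /* a genuine exec-time option */
#define M_EXEC_VALID_OPTIONS (M_EXEC_NOTBOL)
#define M_ERROR_BADOPTION    (-1)

typedef struct {
    unsigned long flags;     /* which optional fields below are valid */
    const void *study_data;  /* filled in by the study step */
} m_extra;

/* Model of pcre_study(): fills in study_data AND sets the flag bit itself. */
static m_extra m_study(const void *data)
{
    m_extra e;
    e.flags = M_EXTRA_STUDY_DATA;
    e.study_data = data;
    return e;
}

/* Model of pcre_exec(): unknown option bits are rejected, and study data
 * is detected via extra->flags, never via `options`.
 * Returns 1 if study data was used, 0 if not, or M_ERROR_BADOPTION. */
static int m_exec(const m_extra *extra, unsigned options)
{
    if (options & ~M_EXEC_VALID_OPTIONS)
        return M_ERROR_BADOPTION;
    return (extra != NULL && (extra->flags & M_EXTRA_STUDY_DATA)) ? 1 : 0;
}

/* Mirrors the quoted glue: ORs the extra-block flag into the exec options. */
static int glue_exec_buggy(const m_extra *extra, unsigned opts)
{
    if (extra != NULL)
        opts |= M_EXTRA_STUDY_DATA;
    return m_exec(extra, opts);
}

/* Proposed fix: leave opts alone; the studied block carries its own flag. */
static int glue_exec_fixed(const m_extra *extra, unsigned opts)
{
    return m_exec(extra, opts);
}
```

In the model, only studied patterns hit the error, which matches the report. In the .cmod, the equivalent change would presumably be to delete the quoted `#ifdef PCRE_EXTRA_STUDY_DATA` block and pass `THIS->extra` as is, since pcre_study() has already set the flag.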
I would commit a fix, but I am not sure how to regenerate pcre_glue.cmod.compiled. Is it enough to change the .cmod, and will the .compiled file then be regenerated automatically?
Also, a side note: the ovector size is defined as 3000. Isn't that overkill for most cases? It consumes 12K of stack space on every call to exec() (at least this space is not allocated dynamically, which would be even worse).
Regards, /Al
/ Brevbäraren