Would replace(string, array(string), array(string), mapping(string:int)) be a useful and convenient interface for the following functionality?
I'd like the fourth-argument mapping to be filled in by replace() with a tally of how many times each corresponding string was found and replaced.
It is difficult at best to obtain that information efficiently otherwise, especially when some of the strings are prefixes of others.
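The proposed interface could behave roughly like this sketch (Python is used here purely for illustration; the `counts` mapping plays the role of the proposed fourth argument, and longest-first matching is one way to keep a prefix from shadowing a longer string):

```python
import re

def replace_with_counts(text, from_strs, to_strs, counts):
    """Replace each from_strs[i] with to_strs[i], tallying hits in counts.

    Longer patterns are tried first, so a pattern that is a prefix of
    another (e.g. ":telno" vs ":telnopriv") cannot shadow the longer one.
    """
    repl = dict(zip(from_strs, to_strs))
    # One alternation over all patterns, longest first.
    pattern = re.compile("|".join(
        re.escape(s) for s in sorted(from_strs, key=len, reverse=True)))

    def sub(m):
        counts[m.group(0)] = counts.get(m.group(0), 0) + 1
        return repl[m.group(0)]

    return pattern.sub(sub, text)
```

With this sketch a pattern that never occurs simply stays absent from the mapping, which is exactly the information needed to drop unused bindings.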
The actual use case for me is the bindings parser for pgsql, which preferably should eliminate any unused variables before constructing the query sent to the database. I'm looking at this because I just ran into bindings named ":telno" and ":telnopriv", where :telno was absent from the query but :telnopriv was present. A quick has_value() check breaks down in this case, because :telno is found as a prefix of the actually occurring :telnopriv, obviously.
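To make the failure mode concrete, here is a minimal illustration (in Python rather than Pike; the plain substring test stands in for has_value(), and the identifier syntax :[A-Za-z_][A-Za-z0-9_]* is an assumption for the sketch):

```python
import re

query = "UPDATE contacts SET telnopriv = :telnopriv"

# A plain substring test reports ":telno" as present even though only
# ":telnopriv" occurs -- the prefix matches inside the longer name.
assert ":telno" in query          # false positive

def binding_used(q, name):
    # Require that the identifier does not continue after the match.
    return re.search(re.escape(name) + r"(?![A-Za-z0-9_])", q) is not None

assert binding_used(query, ":telnopriv")      # really present
assert not binding_used(query, ":telno")      # correctly rejected
```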
Any alternate ideas which solve this problem are welcome, of course.
Sounds to me like you should do proper tokenization instead, perhaps with a helper function written in C if it gets too slow otherwise.
Martin Stjernholm, Roxen IS @ Pike developers forum wrote:
> Sounds to me like you should do proper tokenization instead, perhaps with a helper function written in C if it gets too slow otherwise.
The bindings spec for the current Sql interface doesn't require a tokenizer; that would be overkill. However, when checking the "magic" behind replace(), I find that it does a lot of convoluted things and doesn't seem to be using Boyer-Moore or anything related.
It might make sense to revamp that logic to use a faster search algorithm anyway, possibly even augmented with a regexp engine of mine that outperforms PCRE. I'll revisit this some other time.
By "tokenizer" I mean anything that actually finds the real ends of those binding identifiers, so that you don't stumble on false prefix matches. It doesn't have to do more than that.
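A "tokenizer" in that minimal sense need only scan for binding markers and run each one to the real end of its identifier. A sketch (Python again, assuming the :[A-Za-z_][A-Za-z0-9_]* identifier syntax; a real version would also have to skip quoted literals and, for Postgres, ::type casts):

```python
import re

# One binding marker followed by the full identifier.
BINDING = re.compile(r":[A-Za-z_][A-Za-z0-9_]*")

def bindings_in(query):
    """Return a tally of every binding identifier actually present."""
    counts = {}
    for m in BINDING.finditer(query):
        counts[m.group(0)] = counts.get(m.group(0), 0) + 1
    return counts
```

Because the scan always consumes the whole identifier, ":telnopriv" can never register as an occurrence of ":telno", and any binding missing from the result can be dropped before the query is sent.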
pike-devel@lists.lysator.liu.se