As for UTF-8, here are some results:
object o = _Regexp_PCRE._pcre("b.d", _Regexp_PCRE.OPTION.UTF8);
o->exec(string_to_utf8("\34429b\1234d\123132"));
(13) Result: ({ /* 2 elements */ 3, 7 })
map(_Regexp_PCRE.split_subject(string_to_utf8("\34429b\1234d\123132"),o->exec(string_to_utf8("\34429b\1234d\123132"))),utf8_to_string);
(16) Result: ({ /* 1 element */ "b\1234d" })
So it seems exec() returns byte offsets into the UTF-8-encoded subject, not character offsets. Is there any convenience function for converting byte offsets into real character offsets in a UTF-8-encoded string?
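Failing that, one way to do the conversion by hand is to decode only the bytes before the offset and count the resulting characters. Here is a rough sketch; byte_to_char_offset is a hypothetical helper name, not something in _Regexp_PCRE, and it assumes the offset falls on a character boundary (otherwise utf8_to_string will throw):

```pike
// Hypothetical helper, not part of _Regexp_PCRE: convert a byte offset
// into a UTF-8-encoded string to the corresponding character offset.
// Assumes byte_offset lies on a character boundary.
int byte_to_char_offset(string utf8, int byte_offset)
{
  if (byte_offset <= 0) return 0;
  // Decode just the prefix before the offset and count its characters.
  return sizeof(utf8_to_string(utf8[..byte_offset - 1]));
}

int main()
{
  string subject = string_to_utf8("\34429b\1234d\123132");
  // The byte offsets ({ 3, 7 }) from the exec() result above
  // should map to the character offsets 2 and 5.
  write("%O\n", map(({ 3, 7 }),
                    lambda(int b) {
                      return byte_to_char_offset(subject, b);
                    }));
  return 0;
}
```

This re-decodes the prefix for every offset, so it costs time proportional to the offset on each call. That is fine for the occasional match but probably not for many matches in a long subject.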
/ Mirar
Previous text:
2003-09-20 16:39: Subject: Re: bug: casts to string for long double and long long int are incorrect (7.4.28 rel)
I was pondering that. I'm going to investigate...
I was considering having several PCRE Regexp classes, one fast and one that does study, and maybe another set to do automatic widestring <-> UTF-8 conversions.
/ Mirar