Hmm, it still doesn't work very well if the input data is huge... but I guess it won't be that much worse than the utf8 string.
The most important thing right now, though, is a quick function that converts start_index from a character index to a byte index.
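
A minimal sketch of such a character-to-byte conversion, written in Python purely for illustration (it is not an existing function, and the names utf8_char_len and char_to_byte_index are made up here). It sums the encoded width of every character before the requested index, so it costs time linear in the index but needs no extra table:

```python
def utf8_char_len(code_point: int) -> int:
    """Number of bytes UTF-8 uses to encode one code point."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


def char_to_byte_index(text: str, char_index: int) -> int:
    """Byte offset, in the UTF-8 encoding of text, of the character at char_index.

    char_index == len(text) is allowed and yields the total encoded length.
    """
    if not 0 <= char_index <= len(text):
        raise IndexError("char_index out of range")
    # Add up the encoded width of every character before char_index.
    return sum(utf8_char_len(ord(ch)) for ch in text[:char_index])
```

For a start index this is enough; mapping many offsets back from bytes to characters is where a precomputed index pays off.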
/ Mirar
Previous text:
2003-09-24 13:14: Subject: utf8_char_index
The best solution would probably be a method like 'array(string,string) string_to_utf8_with_index( string input );'
that returns the utf8 string and a string (or an array of integers, but that would use even more memory) holding the byte->character mapping.
[string utf8, string index] = string_to_utf8_with_index( data );
array(int) offsets = rows( index, regexp_function_utf8( utf8 ) );
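
The idea above could be sketched roughly as follows, in Python for illustration only. The function name is borrowed from the proposal; the implementation, the use of a compact integer array for the mapping, and the extra sentinel entry at the end are assumptions, not an existing API. Each byte of the encoded output gets one entry holding the index of the character it belongs to:

```python
import re
from array import array
from typing import Tuple


def string_to_utf8_with_index(text: str) -> Tuple[bytes, array]:
    """Return (utf8, index) where index[b] is the character index of byte b."""
    utf8 = bytearray()
    index = array("I")  # compact unsigned ints, one entry per output byte
    for char_pos, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        utf8 += encoded
        # Every byte of this character maps back to the same character index.
        index.extend([char_pos] * len(encoded))
    # Sentinel (an assumption): lets an end-of-string byte offset map as well.
    index.append(len(text))
    return bytes(utf8), index


# Usage: run a byte-level regexp on the UTF-8 data, then translate the
# byte offsets it reports into character offsets through the index.
data = "sm\u00f6rg\u00e5sbord \u20ac5"
utf8, index = string_to_utf8_with_index(data)
byte_offsets = [m.start() for m in re.finditer(rb"\d", utf8)]
char_offsets = [index[b] for b in byte_offsets]
```

The memory concern from the post shows up directly: the index has one entry per byte, so it is several times the size of the UTF-8 data itself.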
/ Per Hedbor ()