I haven't been keeping up with the *Buffer stuff. Would it be possible to make a line iterator on top of this, and would it make any difference to performance?
I have a sticky note on my desk that says I should check why Python is faster on some basic log parsing of huge ASCII logs I do.
Well, feel free to use the tokenization capabilities of Stdio.IOBuffer to see if that works better. Who knows?
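    class MyBuffer(Stdio.File huge_text_file)
    {
      inherit Stdio.IOBuffer;

      // Called when a read runs past the buffered data; refill from the file.
      int range_error(int bytes)
      {
        string s = huge_text_file->read(8192);
        if (s && strlen(s)) {
          add(s);
          return 1;  // data was added, so the read can be retried
        }
        return 0;    // end of file or read error: nothing more to add
      }
    }

    MyBuffer x = MyBuffer(whatever_fd);
    while (string line = x->sscanf("%[^\n]\n")) {
      // ... process the line
    }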
The current buffered I/O mode for Stdio.File does not work unless the file is in non-blocking mode, because of how reads into the buffer work.
It can be fixed in at least two ways (one of which is fairly obvious from the code above), but that has not been done yet.