I've discovered a little annoyance with the line iterator. If I have a file with the contents "a\nb\n" and run
cat test | pike -e "foreach(Stdio.stdin->line_iterator();; string x) write(x);"
I get the output
ab
but if I run the same code with the file content "a\nb" I get
a
as output. Is it a bug or just a misfeature?
Aren't lines that don't end with a '\n' considered proper lines? Why doesn't an EOF qualify as a line delimiter?
In UNIX text files all lines should be terminated with '\n'.
Examples:
$ echo foo | dd bs=1 count=3 >foobar.txt
3+0 records in
3+0 records out
$ cat foobar.txt
foo$ ex foobar.txt
"foobar.txt" [Incomplete last line] 1 line, 3 characters
:q
$ sed -e '1p' -ed <foobar.txt
$ echo foo | sed -e '1p' -ed
foo
/ Henrik Grubbström (Lysator)
Previous text:
Don't get me wrong; I did not mean I liked the line_iterator behaviour of today. What I do mean is that I would probably prefer a line iterator that made no difference between "a\nb\n" and "a\nb". And that I would be interested to hear arguments about how that would be bad, for that matter.
/ Johan Sundström (a hugging punishment!)
Previous text:
All reasonable programs processing text files let the newline on the final line be completely optional. Having the line iterator silently ignore a final line with no terminating newline seems clearly wrong: It makes the line iterator useless for reasonable programs.
Returning a final empty line for files that *do* have a terminating newline, on the other hand, seems more harmless but still somewhat stupid.
Look at the line-based cat program:
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", x);
If I can choose between a line_iterator behaviour that makes that program convert
"a\nb\n" --> "a\nb\n"
"a\nb" --> "a\nb\n"
and a different behaviour that results in
"a\nb\n" --> "a\nb\n\n" (*)
"a\nb" --> "a\nb\n"
I'd prefer the former (* is what you get if the iterator emits a final empty line). Even if it's different from division by "\n".
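The two candidate behaviours can be compared with a small sketch. It works on an in-memory string rather than on Stdio.stdin, and the helper names split_lines, gets_lines and line_cat are made up for illustration; they are not part of Pike.

```pike
// Division-style splitting: exactly what `/ "\n"` does.
array(string) split_lines(string data)
{
  return data / "\n";
}

// gets-style splitting (hypothetical helper): a final "\n" terminates
// the last line instead of starting a new, empty one.
array(string) gets_lines(string data)
{
  array(string) parts = data / "\n";
  if (sizeof(parts) && parts[-1] == "")
    parts = parts[..sizeof(parts) - 2];
  return parts;
}

// The line-based cat program: every line is written followed by "\n".
string line_cat(array(string) lines)
{
  return map(lines, lambda(string x) { return x + "\n"; }) * "";
}

int main()
{
  write("%O\n", line_cat(gets_lines("a\nb\n")));   // "a\nb\n"
  write("%O\n", line_cat(gets_lines("a\nb")));     // "a\nb\n"
  write("%O\n", line_cat(split_lines("a\nb\n")));  // "a\nb\n\n"  (*)
  write("%O\n", line_cat(split_lines("a\nb")));    // "a\nb\n"
  return 0;
}
```

With the gets-style helper the cat program is idempotent on properly terminated files; with plain division it grows the file by one newline on every pass.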
/ Niels Möller ()
Previous text:
The natural way, if that's what you want to do, would be an iterator that emits lines *including* the final newline (or without it, for lines that have no terminator).
Your problem is that the line iterator drops the newlines, so that you don't see directly if any newlines were present to begin with. And to get that information back, you want it to return an empty line at the end for the common case of files that are properly terminated.
I find that interface really strange and ugly. If a file contains ten lines, I'd expect a line_iterator to emit exactly ten lines, no more, no less.
/ Niels Möller ()
Previous text:
What's the iterator-correspondence to * then? With a /-behaviour of the line iterator
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", x);
adds an extra newline to the input. It's *not* behaving like / followed by *. That makes the correspondence between line_iterator and / pretty useless, in my opinion.
Anyway, there's no need to kill each other for this. I think the gets way is the most commonly useful variant, but we could have more than one iterator if you really think some different behaviour is also useful.
/ Niels Möller ()
Previous text:
If you want the same file back you do
array(string) out = ({});
foreach(Stdio.stdin->line_iterator();; string line)
  out += ({ process_line(line) });
Stdio.write_file("processed.cfg", out*"\n");
Yes, we could have lots of iterators. The one I propose should be the main line iterator though since it is the correct one.
/ Peter Bortas
Previous text:
That's not good enough. If I wanted to suck up a complete file into memory, I wouldn't bother with line_iterator in the first place.
What I need to do is something like
int first = 1;
foreach(Stdio.stdin->line_iterator();; string line) {
  if (first) first = 0;
  else write("\n");
  write(line);
}
which I find way too complex for such a simple task.
/ Niels Möller ()
Previous text:
The easiest way, if you really want to suck up the entire file into memory, is
write(Array.map(Stdio.stdin->read()/"\n", process_line) * "\n");
That's nice and simple, but wastes memory. To me, the entire point of the line_iterator is to make it *equally* easy to process the file without incurring that memory waste. That means that
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", process_line(x));
must do the right thing.
The iterator you're arguing for simply doesn't do that.
To be a little constructive, I'll suggest that line_iterator behaves like gets, and handles all flavours of line termination sequences, etc, and a new one, say newline_iterator, that does exactly what / "\n" would do.
/ Niels Möller ()
Previous text:
On Tue, Dec 31, 2002 at 03:50:04PM +0100, Niels Möller () @ Pike (-) developers forum wrote:
but which one will?
if there is no \n at the very end, you get two options: either the last line will be dropped, which is bad, or your code above will add an extra \n at the end, which is not the right thing either.
greetings, martin.
I'd prefer the behaviour that would add a newline to files that don't have any terminating newline on the final line. That's a marginal case, and all reasonable programs should interpret the files in the same way. The iterator is also free to discard information about whether \n or \r\n was used in the input.
And if that's not acceptable, I'd prefer an iterator that emits lines *including* any newline characters. I.e.
"a\nb" --> "a\n", "b"
"a\nb\n" --> "a\n", "b\n"
"a\r\nb\n" --> "a\r\n", "b\n"
That seems to be the only reasonable behaviour, assuming that you want to keep detailed information about line endings around.
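A minimal sketch of that ending-preserving split, operating on a string already in memory. The function name lines_with_endings is hypothetical, and only "\n" and "\r\n" are treated as terminators here.

```pike
// Hypothetical helper: split data into lines, keeping each line's own
// terminator ("\n" or "\r\n") attached. A final line with no
// terminator is returned unchanged.
array(string) lines_with_endings(string data)
{
  array(string) out = ({});
  int start = 0;
  while (start < sizeof(data)) {
    int nl = search(data, "\n", start);
    if (nl < 0) {
      // No more newlines: the rest is an unterminated last line.
      out += ({ data[start..] });
      break;
    }
    // Ranges are inclusive, so the "\n" at position nl is kept.
    out += ({ data[start..nl] });
    start = nl + 1;
  }
  return out;
}
```

Because nothing is discarded, joining the result with `* ""` gives back the input byte for byte, whatever the line endings were.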
I find both of these alternatives a lot better, clearer and less obscure than the behaviour Peter prefers.
/ Niels Möller ()
Previous text:
I don't find it correct or useful at all. I find a line iterator that does the same as gets correct.
If you know you need to recreate a file, why not simply use /"\n" and operate directly on that array? Then you know exactly what's going on. You could even insert new lines and remove old lines then.
In the more common case of reading or rewriting a small human readable database (config file, or lists of some sort), adding the missing "\n" at the end shouldn't be dangerous at all, rather the opposite.
/ Mirar
Previous text:
The problem is really that I once designed gets() to *not* return the newlines as a part of the returned string. If the newlines were returned, then it would be easy to see what is going on at all times:
"a\nb" would return "a\n" and then "b"
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
With iterators, I guess one could have a basic line iterator that keeps the lineendings, and filter it like
foreach(f->line_iterator();; string x) ... /* Process full lines */
foreach(some_filter(f->line_iterator());; string x) ... /* Process filtered lines */
where some_filter could for example
* Strip line endings
* Strip line endings as well as other trailing white space
* Strip leading white space
* Strip empty lines
* Strip #-style comments
* Process -style line continuations
etc. I'm not very familiar with the iterator stuff, but I hope it's reasonably easy to write such filters, by stacking one iterator on top of another.
And if anybody thinks that helps readability, perhaps one could arrange the line_iterator to take the filter as an argument (but on the other hand, it might make more sense to give line_iterator an optional argument that says what kind of end-of-line characters it should recognize).
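As a rough illustration of the stacking idea, the sketch below composes two such filters. It is an assumption-laden stand-in: it works on an array of raw lines (endings still attached) rather than on real stacked iterators, and strip_ending, is_content and clean_lines are hypothetical names, not existing Pike functions.

```pike
// All names below are hypothetical; none of them exist in Pike itself.

// Strip a trailing "\r\n" or "\n", if there is one.
string strip_ending(string line)
{
  if (has_suffix(line, "\r\n")) return line[..sizeof(line) - 3];
  if (has_suffix(line, "\n")) return line[..sizeof(line) - 2];
  return line;
}

// True for lines that are neither empty nor #-style comments.
int is_content(string line)
{
  return sizeof(line) && !has_prefix(line, "#");
}

// Stack the filters: strip endings first, then drop unwanted lines.
array(string) clean_lines(array(string) raw)
{
  return filter(map(raw, strip_ending), is_content);
}
```

The order matters: the empty-line test only works once the endings are gone, which is one argument for letting the basic iterator keep them and leaving the stripping to a filter.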
/ Niels Möller ()
Previous text:
So then "a\nb" would return a and then b as expected since that is what "/" does. Also "a\n" should return "a" and "".
I find it very confusing that the line iterator would skip the last line. At the very least, make it an option. I will never use it if this is the (only) mode of operation.
/ David Hedbor
Previous text:
I don't interpret it that way. line_iterator is something that iterates over lines, while /"\n" splits on the character '\n' which only in certain special cases is the same thing. The iterator correspondence to /"\n" is String.SplitIterator.
I second the opinion that line_iterator should behave like gets().
/ Martin Stjernholm, Roxen IS
Previous text:
pike-devel@lists.lysator.liu.se