I've discovered a little annoyance with the line iterator. If I have a file with the contents "a\nb\n" and run
cat test | pike -e "foreach(Stdio.stdin->line_iterator();; string x) write(x);"
I get the output
ab
but if I run the same code with the file content "a\nb" I get
a
as output. Is it a bug or just a misfeature?
I'd vote for bug. Quite a common one where regexps are used too.
/ Peter Lundqvist (disjunkt)
Previous text:
2002-12-30 01:14: Subject: stdin->line_iterator
I've discovered a little annoyance with the line iterator. If I have a file with the contents "a\nb\n" and run
cat test | pike -e "foreach(Stdio.stdin->line_iterator();; string x) write(x);"
I get the output
ab
but if I run the same code with the file content "a\nb" I get
a
as output. Is it a bug or just a misfeature?
/ Martin Nilsson (bygger parser
It's intentional behaviour...
/ Henrik Grubbström (Lysator)
Previous text:
2002-12-30 01:14: Subject: stdin->line_iterator
I've discovered a little annoyance with the line iterator. If I have a file with the contents "a\nb\n" and run
cat test | pike -e "foreach(Stdio.stdin->line_iterator();; string x) write(x);"
I get the output
ab
but if I run the same code with the file content "a\nb" I get
a
as output. Is it a bug or just a misfeature?
/ Martin Nilsson (bygger parser
Shouldn't the iterator and a gets() loop give the same result?
| % <test pike -e 'foreach(Stdio.stdin->line_iterator();; string x) write(x+"\n");'
| a
| % <test pike -e 'string s;while(s=Stdio.stdin->gets()) write(s+"\n");'
| a
| b
/ Mirar
Previous text:
2002-12-30 10:10: Subject: stdin->line_iterator
It's intentional behaviour...
/ Henrik Grubbström (Lysator)
I'm curious as to why. Aren't lines that don't end with a '\n' considered proper lines? Why doesn't an EOF qualify as a line delimiter?
/ Peter Lundqvist (disjunkt)
Previous text:
2002-12-30 10:10: Subject: stdin->line_iterator
It's intentional behaviour...
/ Henrik Grubbström (Lysator)
Aren't lines that don't end with a '\n' considered proper lines? Why doesn't an EOF qualify as a line delimiter?
In UNIX text files all lines should be terminated with '\n'.
Examples:
$ echo foo | dd bs=1 count=3 >foobar.txt
3+0 records in
3+0 records out
$ cat foobar.txt
foo$ ex foobar.txt
"foobar.txt" [Incomplete last line] 1 line, 3 characters
:q
$ sed -e '1p' -ed <foobar.txt
$ echo foo | sed -e '1p' -ed
foo
/ Henrik Grubbström (Lysator)
Previous text:
2002-12-30 10:36: Subject: stdin->line_iterator
I'm curious as to why. Aren't lines that don't end with a '\n' considered proper lines? Why doesn't an EOF qualify as a line delimiter?
/ Peter Lundqvist (disjunkt)
When is that behaviour useful?
/ Mirar
Previous text:
2002-12-30 10:44: Subject: stdin->line_iterator
Aren't lines that don't end with a '\n' considered proper lines? Why doesn't an EOF qualify as a line delimiter?
In UNIX text files all lines should be terminated with '\n'.
Examples:
$ echo foo | dd bs=1 count=3 >foobar.txt
3+0 records in
3+0 records out
$ cat foobar.txt
foo$ ex foobar.txt
"foobar.txt" [Incomplete last line] 1 line, 3 characters
:q
$ sed -e '1p' -ed <foobar.txt
$ echo foo | sed -e '1p' -ed
foo
/ Henrik Grubbström (Lysator)
So why does line_iterator do that? It could at least avoid doing that and do the exact same thing as gets and ngets unless you give it some sort of flag.
/ Mirar
Previous text:
2002-12-30 13:13: Subject: stdin->line_iterator
Never. Unless you count hours of debugging due to a missing \n in a config file a feature.
/ Peter Bortas
I have no good answer to that. To me the natural behaviour would be the same as for foreach(foo/"\n", string line).
/ Peter Bortas
Previous text:
2002-12-30 14:07: Subject: stdin->line_iterator
So why does line_iterator do that? It could at least avoid doing that and do the exact same thing as gets and ngets unless you give it some sort of flag.
/ Mirar
Getting that final iteration on "" is, on the other hand, seldom useful and often requires an extra step to remove the final element. (Assuming there was actually a trailing newline too, of course.)
/ Johan Sundström (a hugging punishment!)
Previous text:
2002-12-30 14:17: Subject: stdin->line_iterator
I have no good answer to that. To me the natural behaviour would be the same as for foreach(foo/"\n", string line).
/ Peter Bortas
The difference being that _you_ discard information.
/ Martin Nilsson (bygger parser
Previous text:
2002-12-30 15:16: Subject: stdin->line_iterator
Getting that final iteration on "" is, on the other hand, seldom useful and often requires an extra step to remove the final element. (Assuming there was actually a trailing newline too, of course.)
/ Johan Sundström (a hugging punishment!)
Don't get me wrong; I did not mean I liked the line_iterator behaviour of today. What I do mean is that I would probably prefer a line iterator that made no difference between "a\nb\n" and "a\nb". And that I would be interested to hear arguments about how that would be bad, for that matter.
/ Johan Sundström (a hugging punishment!)
Previous text:
2002-12-30 15:19: Subject: stdin->line_iterator
The difference being that _you_ discard information.
/ Martin Nilsson (bygger parser
I wouldn't like that. That might be handy sometimes but is not intuitive. It's a DWIM.
/ Peter Bortas
Previous text:
2002-12-30 17:38: Subject: stdin->line_iterator
Don't get me wrong; I did not mean I liked the line_iterator behaviour of today. What I do mean is that I would probably prefer a line iterator that made no difference between "a\nb\n" and "a\nb". And that I would be interested to hear arguments about how that would be bad, for that matter.
/ Johan Sundström (a hugging punishment!)
All reasonable programs processing text files let the newline on the final line be completely optional. Having the line iterator silently ignore a final line with no terminating newline seems clearly wrong: It makes the line iterator useless for reasonable programs.
Returning a final empty line for files that *do* have a terminating newline, on the other hand, seems more harmless but still somewhat stupid.
Look at the line-based cat program:
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", x);
If I can choose between a line_iterator behaviour that makes that program convert
"a\nb\n" --> "a\nb\n"
"a\nb"   --> "a\nb\n"
and a different behaviour that results in
"a\nb\n" --> "a\nb\n\n" (*)
"a\nb"   --> "a\nb\n"
I'd prefer the former (* is what you get if the iterator emits a final empty line). Even if it's different from division by "\n".
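To make the two alternatives concrete, here is a minimal sketch that works on in-memory strings rather than on Stdio.stdin. The helper names lines_like_gets, lines_like_split and cat_lines are hypothetical and not part of Pike's Stdio module; they only model the two iterator behaviours under discussion.

```pike
// Hypothetical helpers; neither is part of Pike's Stdio module.

// Split the way a gets() loop sees the data: a trailing newline
// terminates the last line instead of starting a new, empty one.
array(string) lines_like_gets(string data)
{
  array(string) parts = data / "\n";
  if (sizeof(parts) && parts[-1] == "")
    parts = parts[..sizeof(parts) - 2];
  return parts;
}

// Split exactly like division by "\n".
array(string) lines_like_split(string data)
{
  return data / "\n";
}

// The line-based cat: emit every line followed by a newline.
string cat_lines(array(string) lines)
{
  string out = "";
  foreach (lines, string x)
    out += sprintf("%s\n", x);
  return out;
}

int main()
{
  foreach (({ "a\nb\n", "a\nb" }), string input) {
    write("%O gets-like:  %O\n", input, cat_lines(lines_like_gets(input)));
    write("%O split-like: %O\n", input, cat_lines(lines_like_split(input)));
  }
  return 0;
}
```

With the gets-like split both inputs come out as "a\nb\n"; with the split-like variant the properly terminated input gains the extra newline marked (*) above.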
/ Niels Möller ()
Previous text:
2002-12-30 21:21: Subject: stdin->line_iterator
I wouldn't like that. That might be handy sometimes but is not intuitive. It's a DWIM.
/ Peter Bortas
It's still a DWIM that makes it impossible to iterate over a file and rebuild it (possibly with modifications) in the state it was to begin with.
/ Peter Bortas
Previous text:
2002-12-30 21:33: Subject: stdin->line_iterator
All reasonable programs processing text files let the newline on the final line be completely optional. Having the line iterator silently ignore a final line with no terminating newline seems clearly wrong: It makes the line iterator useless for reasonable programs.
Returning a final empty line for files that *do* have a terminating newline, on the other hand, seems more harmless but still somewhat stupid.
Look at the line-based cat program:
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", x);
If I can choose between a line_iterator behaviour that makes that program convert
"a\nb\n" --> "a\nb\n"
"a\nb"   --> "a\nb\n"
and a different behaviour that results in
"a\nb\n" --> "a\nb\n\n" (*)
"a\nb"   --> "a\nb\n"
I'd prefer the former (* is what you get if the iterator emits a final empty line). Even if it's different from division by "\n".
/ Niels Möller ()
The natural way, if that's what you want to do, would be an iterator that emits lines *including* the final newline (or without it, for lines that have no terminator).
Your problem is that the line iterator drops the newlines, so that you don't see directly if any newlines were present to begin with. And to get that information back, you want it to return an empty line at the end for the common case of files that are properly terminated.
I find that interface really strange and ugly. If a file contains ten lines, I'd expect a line_iterator to emit exactly ten lines, no more, no less.
/ Niels Möller ()
Previous text:
2002-12-31 00:04: Subject: stdin->line_iterator
It's still a DWIM that makes it impossible to iterate over a file and rebuild it (possibly with modifications) in the state it was to begin with.
/ Peter Bortas
No, there is nothing especially natural about including the ending newline. / and * are symmetric on strings, and the iterator should be a / equivalent. I find that symmetry natural and beautiful.
/ Peter Bortas
Previous text:
2002-12-31 00:26: Subject: stdin->line_iterator
The natural way, if that's what you want to do, would be an iterator that emits lines *including* the final newline (or without it, for lines that have no terminator).
Your problem is that the line iterator drops the newlines, so that you don't see directly if any newlines were present to begin with. And to get that information back, you want it to return an empty line at the end for the common case of files that are properly terminated.
I find that interface really strange and ugly. If a file contains ten lines, I'd expect a line_iterator to emit exactly ten lines, no more, no less.
/ Niels Möller ()
I agree (I believe). :-)
/ David Hedbor
Previous text:
2002-12-31 00:45: Subject: stdin->line_iterator
No, there is nothing especially natural about including the ending newline. / and * are symmetric on strings, and the iterator should be a / equivalent. I find that symmetry natural and beautiful.
/ Peter Bortas
What's the iterator-correspondence to * then? With a /-behaviour of the line iterator
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", x);
adds an extra newline to the input. It's *not* behaving like / followed by *. That makes the correspondence between line_iterator and / pretty useless, in my opinion.
Anyway, there's no need to kill each other for this. I think the gets way is the most commonly useful variant, but we could have more than one iterator if you really think some different behaviour is also useful.
/ Niels Möller ()
Previous text:
2002-12-31 00:45: Subject: stdin->line_iterator
No, there is nothing especially natural about including the ending newline. / and * are symmetric on strings, and the iterator should be a / equivalent. I find that symmetry natural and beautiful.
/ Peter Bortas
Yes, we could have lots of iterators. The one I propose should be the main line iterator though.
/ Peter Bortas
Previous text:
2002-12-31 13:51: Subject: stdin->line_iterator
What's the iterator-correspondence to * then? With a /-behaviour of the line iterator
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", x);
adds an extra newline to the input. It's *not* behaving like / followed by *. That makes the correspondence between line_iterator and / pretty useless, in my opinion.
Anyway, there's no need to kill each other for this. I think the gets way is the most commonly useful variant, but we could have more than one iterator if you really think some different behaviour is also useful.
/ Niels Möller ()
If you want the same file back you do
array(string) out = ({});
foreach(Stdio.stdin->line_iterator();; string line)
  out += ({ process_line(line) });
Stdio.write_file("processed.cfg", out*"\n");
Yes, we could have lots of iterators. The one I propose should be the main line iterator though since it is the correct one.
/ Peter Bortas
Previous text:
2002-12-31 13:51: Subject: stdin->line_iterator
What's the iterator-correspondence to * then? With a /-behaviour of the line iterator
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", x);
adds an extra newline to the input. It's *not* behaving like / followed by *. That makes the correspondence between line_iterator and / pretty useless, in my opinion.
Anyway, there's no need to kill each other for this. I think the gets way is the most commonly useful variant, but we could have more than one iterator if you really think some different behaviour is also useful.
/ Niels Möller ()
That's not good enough. If I wanted to suck up a complete file into memory, I wouldn't bother with line_iterator in the first place.
What I need to do is something like
int first = 1;
foreach(Stdio.stdin->line_iterator();; string line) {
  if (first) first = 0;
  else write("\n");
  write(line);
}
which I find way too complex for such a simple task.
/ Niels Möller ()
Previous text:
2002-12-31 14:26: Subject: stdin->line_iterator
If you want the same file back you do
array(string) out = ({});
foreach(Stdio.stdin->line_iterator();; string line)
  out += ({ process_line(line) });
Stdio.write_file("processed.cfg", out*"\n");
Yes, we could have lots of iterators. The one I propose should be the main line iterator though since it is the correct one.
/ Peter Bortas
I would, because it looks more readable than
foreach(Stdio.stdin->read()/"\n", string line)
But that is beside the point that it is the Right Thing to do. Dropping information is not.
/ Peter Bortas
Previous text:
2002-12-31 14:42: Subject: stdin->line_iterator
That's not good enough. If I wanted to suck up a complete file into memory, I wouldn't bother with line_iterator in the first place.
What I need to do is something like
int first = 1;
foreach(Stdio.stdin->line_iterator();; string line) {
  if (first) first = 0;
  else write("\n");
  write(line);
}
which I find way too complex for such a simple task.
/ Niels Möller ()
I would actually expect it to drop information about how a newline is encoded on the platform it runs on too.
/ Johan Sundström (a hugging punishment!)
Previous text:
2002-12-31 15:03: Subject: stdin->line_iterator
I would, because it looks more readable than
foreach(Stdio.stdin->read()/"\n", string line)
But that is beside the point that it is the Right Thing to do. Dropping information is not.
/ Peter Bortas
I would expect it to support all commonly used line endings (\n, \r and \r\n), regardless of which OS it is running on.
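A minimal sketch of such ending-agnostic splitting, again on an in-memory string. The function name lines_any_ending is hypothetical and not an existing Pike API. It assumes the whole input is available at once; a streaming version would also have to cope with a "\r" that falls at the end of one read buffer with its "\n" in the next.

```pike
// Hypothetical helper, not part of Pike's Stdio module.
array(string) lines_any_ending(string data)
{
  // Rewrite "\r\n" first so that it is not counted as two endings.
  data = replace(replace(data, "\r\n", "\n"), "\r", "\n");
  array(string) parts = data / "\n";
  // A trailing terminator ends the last line; it does not start a new one.
  if (sizeof(parts) && parts[-1] == "")
    parts = parts[..sizeof(parts) - 2];
  return parts;
}

int main()
{
  // Mixed endings in one input; the final line has no terminator.
  write("%O\n", lines_any_ending("a\nb\r\nc\rd"));
  return 0;
}
```

The example input yields the four lines "a", "b", "c" and "d", regardless of which ending followed each of them.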
/ Mattias Wingstedt (Firefruit)
Previous text:
2002-12-31 15:12: Subject: stdin->line_iterator
I would actually expect it to drop information about how a newline is encoded on the platform it runs on too.
/ Johan Sundström (a hugging punishment!)
The easiest way, if you really want to suck up the entire file into memory, is
write(Array.map(Stdio.stdin->read()/"\n", process_line) * "\n");
That's nice and simple, but wastes memory. To me, the entire point of the line_iterator is to make it *equally* easy to process the file without incurring that memory waste. That means that
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", process_line(x));
must do the right thing.
The iterator you're arguing for simply doesn't do that.
To be a little constructive, I'll suggest that line_iterator behaves like gets, and handles all flavours of line termination sequences, etc, and a new one, say newline_iterator, that does exactly what / "\n" would do.
/ Niels Möller ()
Previous text:
2002-12-31 15:03: Subject: stdin->line_iterator
I would, because it looks more readable than
foreach(Stdio.stdin->read()/"\n", string line)
But that is beside the point that it is the Right Thing to do. Dropping information is not.
/ Peter Bortas
On Tue, Dec 31, 2002 at 03:50:04PM +0100, Niels Möller () @ Pike (-) developers forum wrote:
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", process_line(x));
must do the right thing.
The iterator you're arguing for simply doesn't do that.
but which one will?
if there is no \n at the very end, you get two options: either the last line will be dropped, which is bad, or your code above will add an extra \n at the end, which is not the right thing either.
greetings, martin.
I'd prefer the behaviour that would add a newline to files that don't have any terminating newline on the final line. That's a marginal case, and all reasonable programs should interpret the files in the same way. The iterator is also free to discard information about whether \n or \r\n was used in the input.
And if that's not acceptable, I'd prefer an iterator that emits lines *including* any newline characters. I.e.
"a\nb"     --> "a\n", "b"
"a\nb\n"   --> "a\n", "b\n"
"a\r\nb\n" --> "a\r\n", "b\n"
That seems to be the only reasonable behaviour, assuming that you want to keep detailed information about line endings around.
I find both of these alternatives a lot better, clearer and less obscure than the behaviour Peter prefers.
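A minimal sketch of the ending-preserving variant, on an in-memory string. The function name lines_with_endings is hypothetical and not an existing Pike API. It assumes only "\n" terminates a line, so a "\r\n" pair is kept simply because the "\r" precedes the "\n"; a lone "\r" is not treated as a terminator here.

```pike
// Hypothetical helper, not part of Pike's Stdio module: split data into
// lines that keep their own terminator. A final line without a
// terminator is returned as it is.
array(string) lines_with_endings(string data)
{
  array(string) out = ({});
  int start = 0;
  while (start < sizeof(data)) {
    int nl = search(data, "\n", start);
    if (nl < 0) {
      // No more newlines: the rest is an unterminated final line.
      out += ({ data[start..] });
      break;
    }
    out += ({ data[start..nl] });
    start = nl + 1;
  }
  return out;
}

int main()
{
  foreach (({ "a\nb", "a\nb\n", "a\r\nb\n" }), string input) {
    array(string) lines = lines_with_endings(input);
    // Joining with the empty string gives back the input unchanged.
    write("%O --> %O (round trip ok: %d)\n",
          input, lines, lines * "" == input);
  }
  return 0;
}
```

Because nothing is dropped, joining the pieces with "" rebuilds the input exactly, which addresses the rebuild-the-file concern without emitting a final empty line.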
/ Niels Möller ()
Previous text:
2002-12-31 16:00: Subject: Re: stdin->line_iterator
On Tue, Dec 31, 2002 at 03:50:04PM +0100, Niels Möller () @ Pike (-) developers forum wrote:
foreach(Stdio.stdin->line_iterator();; string x) write("%s\n", process_line(x));
must do the right thing.
The iterator you're arguing for simply doesn't do that.
but which one will?
if there is no \n at the very end, you get two options: either the last line will be dropped, which is bad, or your code above will add an extra \n at the end, which is not the right thing either.
greetings, martin.
/ Brevbäraren
I don't find it correct or useful at all. I find a line iterator that does the same as gets correct.
If you know you need to recreate a file, why not simply use /"\n" and operate directly on that array? Then you know exactly what's going on. You could even insert new lines and remove old lines then.
In the more common case of reading or rewriting a small human-readable database (a config file, or lists of some sort), adding the missing "\n" at the end shouldn't be dangerous at all, rather the opposite.
/ Mirar
Previous text:
2002-12-31 14:26: Subject: stdin->line_iterator
If you want the same file back you do
array(string) out = ({});
foreach(Stdio.stdin->line_iterator();; string line)
  out += ({ process_line(line) });
Stdio.write_file("processed.cfg", out*"\n");
Yes, we could have lots of iterators. The one I propose should be the main line iterator though since it is the correct one.
/ Peter Bortas
The problem is really that I once designed gets() to *not* return the newlines as a part of the returned string. If the newlines were returned, then it would be easy to see what is going on at all times:
"a\nb" would return "a\n" and then "b"
/ Fredrik (Naranek) Hubinette (Real Build Master)
Previous text:
2002-12-31 00:04: Subject: stdin->line_iterator
It's still a DWIM that makes it impossible to iterate over a file and rebuild it (possibly with modifications) in the state it was to begin with.
/ Peter Bortas
With iterators, I guess one could have a basic line iterator that keeps the lineendings, and filter it like
foreach(f->line_iterator();; string x) ... /* Process full lines */
foreach(some_filter(f->line_iterator());; string x) ... /* Process filtered lines */
where some_filter could for example
* Strip line endings
* Strip line endings as well as other trailing white space
* Strip leading white space
* Strip empty lines
* Strip #-style comments
* Process \-style line continuations
etc. I'm not very familiar with the iterator stuff, but I hope it's reasonably easy to write such filters, by stacking one iterator on top of another.
And if anybody thinks that helps readability, perhaps one could arrange the line_iterator to take the filter as an argument (but on the other hand, it might make more sense to give line_iterator an optional argument that says what kind of end-of-line characters it should recognize).
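A minimal sketch of the stacking idea. To stay independent of the details of Pike's iterator protocol, it models a line source as a plain function that returns the next line, or 0 when the input is exhausted; that convention and all the names below (array_source, strip_endings, strip_comments, skip_empty) are assumptions made for illustration, not existing Pike APIs.

```pike
// All names here are hypothetical. A "source" is a function that
// returns the next line, or 0 when there is nothing left.

// Source that hands out the elements of an array one at a time.
function array_source(array(string) lines)
{
  int pos = 0;
  return lambda() {
    if (pos >= sizeof(lines)) return 0;
    return lines[pos++];
  };
}

// Filter: strip a trailing "\r\n" or "\n".
function strip_endings(function next)
{
  return lambda() {
    string line = next();
    if (!line) return 0;
    if (has_suffix(line, "\r\n")) return line[..sizeof(line) - 3];
    if (has_suffix(line, "\n")) return line[..sizeof(line) - 2];
    return line;
  };
}

// Filter: cut everything from the first "#" onwards.
function strip_comments(function next)
{
  return lambda() {
    string line = next();
    if (!line) return 0;
    int hash = search(line, "#");
    return hash < 0 ? line : line[..hash - 1];
  };
}

// Filter: skip lines that are empty once the inner filters have run.
function skip_empty(function next)
{
  return lambda() {
    string line;
    while ((line = next()) && line == "")
      ;
    return line;
  };
}

int main()
{
  function next = skip_empty(strip_comments(strip_endings(
    array_source(({ "a\n", "# note\r\n", "\n", "b # tail" })))));
  string line;
  while (line = next())
    write("%O\n", line);
  return 0;
}
```

Each filter only knows about the source beneath it, so the order of stacking decides the result: here the endings are removed before the comments, and empty lines are dropped last. The base source keeps the line endings, as suggested above, and everything lossy is an explicit layer on top.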
/ Niels Möller ()
Previous text:
2002-12-31 23:25: Subject: stdin->line_iterator
The problem is really that I once designed gets() to *not* return the newlines as a part of the returned string. If the newlines were returned, then it would be easy to see what is going on at all times:
"a\nb" would return "a\n" and then "b"
/ Fredrik (Naranek) Hubinette (Real Build Master)
I see visions of automap, but I like the idea. A number of easily accessible filters would help speed up a lot of common line iteration operations.
/ Peter Bortas
Previous text:
2003-01-01 18:13: Subject: stdin->line_iterator
With iterators, I guess one could have a basic line iterator that keeps the lineendings, and filter it like
foreach(f->line_iterator();; string x) ... /* Process full lines */
foreach(some_filter(f->line_iterator());; string x) ... /* Process filtered lines */
where some_filter could for example
- Strip line endings
- Strip line endings as well as other trailing white space
- Strip leading white space
- Strip empty lines
- Strip #-style comments
- Process \-style line continuations
etc. I'm not very familiar with the iterator stuff, but I hope it's reasonably easy to write such filters, by stacking one iterator on top of another.
And if anybody thinks that helps readability, perhaps one could arrange the line_iterator to take the filter as an argument (but on the other hand, it might make more sense to give line_iterator an optional argument that says what kind of end-of-line characters it should recognize).
/ Niels Möller ()
Does the optimizer do such optimizations (using that assumption), by the way? If so, that would make it a very valid point.
/ Johan Sundström (a hugging punishment!)
Previous text:
2002-12-31 00:00: Subject: stdin->line_iterator
gets() is not an iterator. The iterator is the equivalent of /"\n" and should behave as such.
/ Peter Bortas
So then "a\nb" would return a and then b as expected since that is what "/" does. Also "a\n" should return "a" and "".
I find it very confusing that the line iterator would skip the last line. At the very least make it an option. I will never use it if this is the (only) mode of operation.
/ David Hedbor
Previous text:
2002-12-31 00:00: Subject: stdin->line_iterator
gets() is not an iterator. The iterator is the equivalent of /"\n" and should behave as such.
/ Peter Bortas
I don't interpret it that way. line_iterator is something that iterates over lines, while /"\n" splits on the character '\n' which only in certain special cases is the same thing. The iterator correspondence to /"\n" is String.SplitIterator.
I second the opinion that line_iterator should behave like gets().
/ Martin Stjernholm, Roxen IS
Previous text:
2002-12-31 00:00: Subject: stdin->line_iterator
gets() is not an iterator. The iterator is the equivalent of /"\n" and should behave as such.
/ Peter Bortas
I wouldn't mind a remove_empty_lines-iterator, but until then I prefer the usual manual discarding.
/ Peter Bortas
Previous text:
2002-12-30 15:16: Subject: stdin->line_iterator
Getting that final iteration on "" is, on the other hand, seldom useful and often requires an extra step to remove the final element. (Assuming there was actually a trailing newline too, of course.)
/ Johan Sundström (a hugging punishment!)
gets() doesn't give the final "", but it gives the final line even if it lacks a "\n".
/ Mirar
Previous text:
2002-12-30 15:16: Subject: stdin->line_iterator
Getting that final iteration on "" is, on the other hand, seldom useful and often requires an extra step to remove the final element. (Assuming there was actually a trailing newline too, of course.)
/ Johan Sundström (a hugging punishment!)
pike-devel@lists.lysator.liu.se