Daniel Bump bump@match.Stanford.EDU writes:
I'm cc'ing both the GNU Go list and the GTP list because this is a protocol issue.
To summarize, Trevor Morris reports that the GNU Go regressions are broke on Visual C. The reason is that apparently printf on VC++ interprets \n as CRLF instead of just LF.
AFAIK VC is doing the correct thing : \n is defined to expand to whatever line ending is appropriate for the platform.
[ I would be inclined to blame the awk for not understanding CRLF when reading in text on a dos-like machine. Possibly a feature of cygwin tools, since someone at work here had similar problems which turned out to be to do with the tools not processing CR as I would have expected them to. ]
I think I did mention this feature of \n | \r, and pointed out that any portable implementations of GTP code should use \012 explicitly if they mean a LF character.
I think this is an issue which was discussed in the protocol list but I do not remember whether any conclusions were reached.
Trevor Morris wrote:
Do you mean that the gtp output is getting spurious CR's added by the DOS version of the program?
Yes, this is the problem. Spurious CR added by the DOS version of the program. An appropriate sed script to strip out the offending ^M characters should do it.
Perhaps the GTP needs to explicitly define what the end-of-line character(s) should be?
We can simulate this problem on Linux by replacing all occurrences of \n by \r\n in gtp.c. I just did that and found that regress.sh is broke but twogtp is not.
That wouldn't help since on windows that would (may) give CRCRLF
And on mac, \n is CR
\012 is probably safest if you really mean a LF character
One difference between the two is that regress.sh is awkward where twogtp is perlward. Perhaps we should reimplement regress.sh in perl.
FWIW, the regress script doesn't work properly with /usr/bin/awk on solaris. Need to use nawk instead. perl is probably slightly better defined, and therefore more portable.
dd
David Denholm wrote:
We can simulate this problem on Linux by replacing all occurrences of \n by \r\n in gtp.c. I just did that and found that regress.sh is broke but twogtp is not.
That wouldn't help since on windows that would (may) give CRCRLF
The purpose of that change to gtp.c wasn't to fix anything but to break it. I wanted to see what the problem was. So I hacked in carriage returns to gtp.c, in order to see where they should be stripped in regress.awk.
And on mac, \n is CR
\012 is probably safest if you really mean a LF character
I wonder if it is possible to come to a consensus about newline policy in the gtp before we release 3.0.0.
Dan
I wonder if it is possible to come to a consensus about newline policy in the gtp before we release 3.0.0.
I wonder that too.
My suggestion is: For writing data, use the line ending convention of the platform your software runs on. For reading data, make sure your software can read either convention.
This is not difficult to do on either platform. Other than DOS and UNIX, are there other line ending conventions in common use? What do Macs do?
Don
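A minimal sketch of the "read either convention" half of this suggestion, assuming the reader fetches lines with fgets() and then trims whatever terminator arrived. The name chomp_line is hypothetical and does not come from gtp.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from gtp.c): strip any trailing LF and CR
 * bytes from a line obtained with fgets(), so the caller sees the same
 * text whether the sender used LF or CRLF. Returns the new length. */
size_t chomp_line(char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
    return len;
}
```

Note that fgets() only splits on the C library's '\n', so this covers LF and CRLF senders but would not separate lines from a sender that uses CR alone.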
In my implementation of the GTP protocol, I execute the command once a carriage return or line feed is received. Empty lines are ignored. This makes GTP compatible with Unix, DOS/Windows and Macintosh platforms.
However, to be specific, I would recommend "\r" (or \012) for linefeed. I'll put this in the specification.
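The approach described here (act as soon as a CR or LF arrives, ignore empty lines) could be sketched as a byte-at-a-time accumulator. This is only an illustration under stated assumptions: struct cmd_buffer, cmd_feed and CMD_MAX are hypothetical names, not taken from any GTP implementation.

```c
#include <assert.h>
#include <stddef.h>

#define CMD_MAX 256  /* assumed upper bound on command length */

/* Hypothetical accumulator for one pending command. */
struct cmd_buffer {
    char text[CMD_MAX];
    size_t len;
};

/* Feed one input byte. Returns 1 when a complete, non-empty command is
 * available in buf->text (NUL-terminated), 0 otherwise. The caller must
 * use buf->text before feeding the next byte. */
int cmd_feed(struct cmd_buffer *buf, int ch)
{
    assert(buf != NULL);
    if (ch == '\r' || ch == '\n') {
        if (buf->len == 0)
            return 0;              /* empty line; also absorbs the LF of CRLF */
        buf->text[buf->len] = '\0';
        buf->len = 0;              /* the next byte starts a new command */
        return 1;
    }
    if (buf->len < CMD_MAX - 1)    /* bytes beyond the limit are dropped */
        buf->text[buf->len++] = (char)ch;
    return 0;
}
```

Because an empty line is ignored, the LF that follows a CR in a CRLF pair needs no special case: it simply looks like an empty line.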
----- Original Message ----- From: "Daniel Bump" bump@match.Stanford.EDU To: dave.denholm@insignia.com Cc: gnugo@gnu.org; gtp@lists.lysator.liu.se Sent: Sunday, August 19, 2001 10:01 AM Subject: Re: [gtp] Re: VC regression failures
David Denholm wrote:
We can simulate this problem on Linux by replacing all occurrences of \n by \r\n in gtp.c. I just did that and found that regress.sh is broke but twogtp is not.
That wouldn't help since on windows that would (may) give CRCRLF
The purpose of that change to gtp.c wasn't to fix anything but to break it. I wanted to see what the problem was. So I hacked in carriage returns to gtp.c, in order to see where they should be stripped in regress.awk.
And on mac, \n is CR
\012 is probably safest if you really mean a LF character
I wonder if it is possible to come to a consensus about newline policy in the gtp before we release 3.0.0.
Dan
Phil wrote:
However, to be specific, I would recommend "\r" (or \012) for linefeed. I'll put this in the specification.
Actually I think \n usually denotes \012 (linefeed) and \r denotes \015 (carriage return).
If indeed this becomes the standard we (GNU Go) should change the distributed gtp.c and play_gtp.c.
But perhaps the standard actually should allow any one of LF, CR or CRLF as newline and it should be up to the client to recognize these alternatives. In other words a line is terminated by any match of the regular expression [\n\r]+ .
If this policy were adopted then \n\n is not different from \n.
Dan
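Treating any run of CR and LF bytes as a single terminator happens to be what the standard strtok() function does with a delimiter set of "\r\n". A small sketch, with split_lines as a hypothetical name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split buf in place on runs of CR and/or LF.
 * Stores up to max line pointers in lines[] and returns how many were
 * stored. Consecutive terminators collapse, so no empty lines appear. */
size_t split_lines(char *buf, char **lines, size_t max)
{
    assert(buf != NULL && lines != NULL);
    size_t n = 0;
    for (char *tok = strtok(buf, "\r\n");
         tok != NULL && n < max;
         tok = strtok(NULL, "\r\n"))
        lines[n++] = tok;
    return n;
}
```

A consequence of this policy is that an empty line can no longer carry meaning. strtok() also keeps hidden state, so this sketch is not suitable where two buffers are being split at once.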
Dave wrote:
AFAIK VC is doing the correct thing : \n is defined to expand to whatever line ending is appropriate for the platform.
[...]
I think I did mention this feature of \n | \r, and pointed out that any portable implementations of GTP code should use \012 explicitly if they mean a LF character.
[...]
And on mac, \n is CR
\012 is probably safest if you really mean a LF character
I'm almost, but not completely, certain that you are somewhat wrong here. What follows is my understanding, which also could be wrong. It's correct that the C standard doesn't say what numerical value '\n' should have, although ascii LF is a common choice, at least on unix platforms. I'm pretty certain that '\n' does NOT expand to anything like CRLF on dos/windows platforms simply because that is not a single character. It's possible that '\n' has the value ascii CR on MacOS, but that is more than I know about.
The missing step here is that what actually is written to the output is decided by the C library. What in particular is important is whether the FILE being written to has been opened in text mode or in binary mode. On unix systems I don't think this ever makes a difference, but on other platforms it may. On dos/windows platforms, this is where \n (however it is represented) is converted to the CRLF sequence, but only in text mode. Please correct me if I'm wrong about this.
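If the translation really does happen in the C library's text mode, one way to get a bare LF on the wire would be to switch the output stream to binary mode. The sketch below assumes the Microsoft C runtime's _setmode() and _fileno() on Windows and does nothing elsewhere. use_lf_only_output is a hypothetical name, and nothing in this thread says gtp.c does this.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode */
#endif

/* Hypothetical helper: ask the C library not to translate '\n' on
 * this stream. Returns 0 on success, -1 on failure. */
int use_lf_only_output(FILE *stream)
{
    assert(stream != NULL);
#ifdef _WIN32
    fflush(stream);  /* flush pending text-mode data before switching */
    /* _setmode returns the previous mode, or -1 on error. */
    return _setmode(_fileno(stream), _O_BINARY) == -1 ? -1 : 0;
#else
    /* Assumption: no text/binary distinction on this platform. */
    return 0;
#endif
}
```

Calling use_lf_only_output(stdout) once at startup, before any output, would be the intended use.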
But these are C specific issues and not all that interesting for the definition of the protocol. What I'd like to know, preferably from people with actual experience of programming on various platforms and in various languages, is whether the newline convention
"A newline is indicated by a single LF. Possible occurrences of CR should be discarded on input."
would be difficult to handle. This is in any case the newline convention I would prefer for GTP.
(The details on how to portably implement this in C are of course highly interesting to GNU Go, but replies about this should probably be limited to the gnugo list.)
/Gunnar
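On the input side, the proposed convention amounts to deleting every CR before looking for LF. A minimal in-place filter, with discard_cr as a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: remove every CR byte from the NUL-terminated
 * string s, in place. Returns the resulting length. */
size_t discard_cr(char *s)
{
    assert(s != NULL);
    char *dst = s;
    for (const char *src = s; *src != '\0'; src++)
        if (*src != '\r')
            *dst++ = *src;     /* keep everything except CR */
    *dst = '\0';
    return (size_t)(dst - s);
}
```

After this filter, LF and CRLF input look identical. Input that uses CR alone as its terminator would lose its line breaks entirely.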
On Monday, August 20, 2001, at 08:47 AM, Gunnar Farnebäck wrote:
Dave wrote:
AFAIK VC is doing the correct thing : \n is defined to expand to whatever line ending is appropriate for the platform.
[...]
I think I did mention this feature of \n | \r, and pointed out that any portable implementations of GTP code should use \012 explicitly if they mean a LF character.
[...]
And on mac, \n is CR
\012 is probably safest if you really mean a LF character
I'm almost, but not completely, certain that you are somewhat wrong here. What follows is my understanding, which also could be wrong. It's correct that the C standard doesn't say what numerical value '\n' should have, although ascii LF is a common choice, at least on unix platforms. I'm pretty certain that '\n' does NOT expand to anything like CRLF on dos/windows platforms simply because that is not a single character. It's possible that '\n' has the value ascii CR on MacOS, but that is more than I know about.
Perl on Windows outputs CRLF for \n. Just read this last night in the Perl Cookbook. So if we want twogtp to work on Windows, the gtp protocol has to deal with CRLF at the minimum. Mac perl probably spits out a CR, so the same goes there, except on Mac OS X, which is really unix...
Someone could write a quick C program to find out what Windows does in C with \n for sure on that platform.
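Such a test could be as small as the function below: write "\n" through a text-mode stream, then read the file back in binary mode and look at the raw bytes. probe_newline is a hypothetical name, and the path is whatever scratch file the caller chooses.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical probe: write "\n" to `path` in text mode, then read
 * the file back in binary mode. Stores up to cap raw bytes in out and
 * returns how many were found, or -1 on an I/O error. */
int probe_newline(const char *path, unsigned char *out, int cap)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "w");          /* "w" opens in text mode */
    if (f == NULL)
        return -1;
    fputs("\n", f);
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "rb");               /* "rb": no translation */
    if (f == NULL)
        return -1;
    int n = 0, ch;
    while (n < cap && (ch = getc(f)) != EOF)
        out[n++] = (unsigned char)ch;
    fclose(f);
    return n;
}
```

Printing the returned bytes in hex on each platform would settle the question. The expectation from this thread is a single 0x0A on Unix and 0x0D 0x0A from a Windows text-mode stream, but the point of the probe is to check rather than assume.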
"A newline is indicated by a single LF. Possible occurrences of CR should be discarded on input."
would be difficult to handle. This is in any case the newline convention I would prefer for GTP.
Then it won't work on CR platforms...
There are three cases: CR, LF, CRLF.
If we want to handle all three, we need an or clause to deal with the CR vs LF case. It seems to me our choices are:
1. Handle LF.
2. Handle CRLF by ignoring CRs.
3. Handle all three by looking for CR | LF and ignoring any extra LF.
I think #2 is worse, because it will mysteriously not work on some platforms. YMMV.
#3 doesn't really seem much more difficult than #2 but it does mean we can't use things like fgets.
Pierce
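A sketch of the third choice, reading with getc() instead of fgets() and accepting CR, LF or CRLF as one terminator. read_any_line is a hypothetical name, not a function from GNU Go.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reader: read one line from `in` into buf (capacity cap,
 * including the NUL), accepting CR, LF or CRLF as the terminator.
 * Returns the line length, or -1 at end of input with nothing read. */
int read_any_line(FILE *in, char *buf, int cap)
{
    assert(in != NULL && buf != NULL && cap > 0);
    int len = 0, ch;
    while ((ch = getc(in)) != EOF) {
        if (ch == '\r') {
            int next = getc(in);         /* swallow the LF of a CRLF pair */
            if (next != '\n' && next != EOF)
                ungetc(next, in);        /* not LF: belongs to the next line */
            break;
        }
        if (ch == '\n')
            break;
        if (len < cap - 1)               /* overlong lines are truncated */
            buf[len++] = (char)ch;
    }
    buf[len] = '\0';
    return (ch == EOF && len == 0) ? -1 : len;
}
```

One design caveat: after a CR this version has to look at the next byte, so on an interactive pipe it would block until the peer sends something more. A reader that treats CR and LF each as a terminator and skips empty lines avoids that, at the cost of no longer seeing empty lines.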
Gunnar wrote:
A newline is indicated by a single LF. Possible occurrences of CR should be discarded on input.
and:
Yes, more or less. Protocol version 1 will be whatever is implemented in GNU Go 3.0.0 when this is released and should only be used by other programs which wish to experiment with the gnugoclient 2.0 or twogtp programs. The first fully specified protocol will be version 2. I expect this to be followed by a version 3 and possibly more when it becomes clear what is missing or done wrong in the protocol. This means I definitely don't expect version 2 to be final and complete but I do expect it to be finished in the not too distant future.
I am hoping that GNU Go 3.0.0 will be released by the end of this week and that there will be no substantial changes in the engine. We've achieved clean builds on Unix and GNU/Linux as well as Windows with VC++ or Cygwin. If all is well, 2.7.253 should also build cleanly on Mac OS X and pass the regressions though I have not been informed if that is the case.
So it is worth reviewing how GNU Go 2.7.253 handles the newlines in the GTP.
Newlines are written in gtp.c and play_gtp.c with the standard C function printf(" ... \n"). On Unix this makes a LF=\012. On DOS it makes CRLF. David stated that on Macintosh it makes just a Carriage return but we believe this is not true with OS X. If it were, things would be worse broke than they are now.
(Does anyone know?)
We are aiming at good compatibility with Mac OS X and may be close to achieving it, thanks to help from Pierce Wetter and others. I'm less optimistic that we will have full compatibility with Mac classic though I'll change the names of two files as Alan Crossman suggested.
I have to interpret Gunnar's statement as meaning that writing an LF is mandated and CRLF is acceptable.
Dan