wish: string with other quoting than \

Mirar ＠ Pike developers forum

21 Sep 2003

11:20 a.m.

If it didn't need " quoted either, it would be quite useful for Pike programs writing Pike programs or HTML or the like...

Alexander Demenshin

21 Sep

2:27 p.m.

On Sun, Sep 21, 2003 at 01:20:01PM +0200, Mirar @ Pike developers forum wrote:

...

If it didn't need " quoted either, it would be quite useful for Pike programs writing Pike programs or HTML or the like...

There is the (mostly) unused backquote character (`), which might serve this purpose (perhaps in combination with #):

string x = ` Some text with " characters " and multiple lines in it`

It might clash (somehow) with operator definitions, but that is unlikely.

Or, alternatively (à la Perl):

string x = #``__EOS
this is a multiline "unquoted" string
__EOS

This last one assumes that a string quoted by backquotes may not be empty, of course.

The first syntax is good enough for REs as well, IMHO.

Regards, /Al

Mirar ＠ Pike developers forum

2:45 p.m.

I think that would have problems, though, since the ` character is used in identifiers in Pike...

...

`+(1,2);

Result: 3

...

string s=`+(1,2);`

...but you're positive to my ideas in general?

/ Mirar

Alexander Demenshin

4:07 p.m.

On Sun, Sep 21, 2003 at 04:45:02PM +0200, Mirar @ Pike developers forum wrote:

...

...
`+(1,2);

Result: 3

...
string s=`+(1,2);`

Well, this might be a problem. Then we can use #` syntax:

string s1 = #`a "string"`;
string s2 = #``__DELIM; // Newline here is not counted, spaces are ignored
...
Newlines here are counted unless string is terminated with \ (as usual)
__DELIM; // This must be at beginning of the line

So we are limited to the case that an empty string may not be specified with ``.

...

...but you're positive to my ideas in general?

Oh yes... Very, very positive :) I really like this idea - it would be really convenient and useful. Especially in REs.

Regards, /Al

Mirar ＠ Pike developers forum

4:25 p.m.

#` is nice. But is there any better character that is even less common? Or, I think I'd like the final to be a tuple too, since that's easier to avoid.

How about #`...`#? But it kind of looks ugly... I guess [[...]] is out of the question?

For the multiline, I think I'd like a nice keyword, like #string. For instance #multiline,

string s2 = #multiline __DELIM
...
__DELIM;

(note the semicolon; it must be there to complete the sentence.)

How easy is that to implement in the parser?

/ Mirar

Alexander Demenshin

4:40 p.m.

On Sun, Sep 21, 2003 at 06:25:04PM +0200, Mirar @ Pike developers forum wrote:

...

#` is nice. But is there any better character that is even less common? Or, I think I'd like the final to be a tuple too, since that's easier to avoid.

Well... A tuple is also good enough, so why not #``...``? This way we can avoid quoting almost everywhere. Also, strings with this syntax might be made multiline by default, i.e. they may contain newlines etc. The only problem that I see with backquotes is that (for instance) on a German keyboard layout I have to press the key twice to get one (the same applies to Swiss and Austrian layouts, I guess) :)

...

How about #`...`#? But it kind of looks ugly...

There is also the option of #<...#>, for instance; it could be made nestable to simplify processing for Pike programs which generate Pike programs.

...

string s2 = #multiline __DELIM
...
__DELIM;

Looks good to me. Very good, I'd say :)

...

(note the semicolon; it must be there to complete the sentence.)

This is reasonable.

...

How easy is that to implement in the parser?

I don't see any problems with the parser concerning this syntax. A bit of work, but nothing significant. Any parser gurus here? :)

Regards, /Al

Martin Stjernholm, Roxen IS ＠ Pike developers forum

4:45 p.m.

Almost any syntax is only a small matter of programming to implement in the preprocessor. What concerns me more is all the other tools that have to understand it too, as I talked about in the last discussion about this. It's far from easy to get into Emacs and XEmacs, for example.

There's also the issue that Peter Lundqvist mentioned: The added complexity of another string syntax is a drawback in itself when it comes to the learning curve. Is it really sufficiently useful to outweigh that?

I'd say that at least 5% of all string literals would have to benefit from a new syntax to motivate its existence. I only missed something like it about ten or twenty times in all the years I've used Pike. So from my point of view the drawbacks outweigh the benefits by a factor of a thousand at least. But it all depends on how you typically use the language, of course. Someone who writes complicated regexps or pastes in multiline XML snippets all day long will certainly have a different perspective.

/ Martin Stjernholm, Roxen IS

Mirar ＠ Pike developers forum

4:55 p.m.

...

Almost any syntax is only a small matter of programming to implement in the preprocessor.

Even [[...]]?

...

What concerns me more is all the other tools that have to understand it too, as I talked about in the last discussion about this.

Yes, so it's important that the syntax is as simple as possible.

...

It's far from easy to get into Emacs and XEmacs, for example.

Another string syntax shouldn't be impossible, or can't it handle dual character string beginnings? The multiline is of course even trickier; how is that handled in for instance sh- or perl-mode?

...

complexity of another string syntax is a drawback in itself when it

...

I'd say that at least 5% of all string literals would have to benefit

I believe that the benefits outweigh the cost to the learning curve. It might even help it, if the programs you see use a lot of the characters in strings that would otherwise need to be quoted. And many newbies to Pike are used to other scripting languages, where you can do multiline strings with a custom delimiter. I think more than one user has actually searched for it in Pike.

I have a vague memory of a Pike program that did some complex scripting via a shell; the top number of backslashes in a row was 16 in those strings, that is "\\\\\\\\\\\\\\\\".

I might be biased, I write a lot of scripts that generate HTML code, or even Pike code. I'd say in bytecount 50% would be quoted using either of these methods. The actual *string* count would of course be much lower.

Given funnier regexps I will probably start using more regexps in my programs too, so the count will increase...

/ Mirar

Martin Stjernholm, Roxen IS ＠ Pike developers forum

5:30 p.m.

...

Even [[...]]?

Hmm, no. That could be tricky to tell from a type cast at the beginning of an indexing operand.

...

Another string syntax shouldn't be impossible, or can't it handle dual character string beginnings?

You can say that e.g. both " and ' start string literals. What you can't do is to have different quoting rules inside them; if you say that \ is a quoting character it will do that in both.

Any syntax that has the properties you want will get tricky to support. There has to be a scanner that applies the correct rules and marks up the strings with syntactic text properties to override the defaults. The real problem is to deal with invalidation of those text properties quickly and accurately when the buffer changes.

...

The multiline is of course even trickier; how is that handled in for instance sh- or perl-mode?

Some quick testing shows that they don't, i.e. they get seriously confused if there's an unbalanced " in such a string literal.

...

And many newbies to Pike are used to other scripting languages, where you can do multiline strings with a custom delimiter. I think more than one user has actually searched for it in Pike.

I didn't think that was such a common feature. Perhaps a small survey is in order. What are the exact rules for this in other scripting languages then? Say Perl, Python, Tcl, VB, and Ruby?

Afaik none of the syntactically close languages C, C++, Objective-C and Java has anything better. C# has an alternative syntax which is introduced with @ in front of the string: They are called verbatim strings and \ doesn't have any special meaning in them. Any character except " is allowed. They can span multiple lines. " is quoted with "". E.g:

string s = @"Write: ""Hello"" to the C:\ drive.";
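For the Python part of the survey, a minimal sketch of the two literal forms it offers: triple-quoted strings may span lines and contain unescaped " characters, and raw strings (prefix r) leave backslashes alone. The variable names are made up for the example.

```python
# Triple-quoted literal: spans several lines, " needs no quoting.
html = '''<a href="index.html">
  home
</a>'''

# Raw literal: backslashes are kept as written, handy for regexps.
pattern = r"\d+\.\d+"

# The same regexp written as an ordinary literal needs doubled backslashes.
pattern_escaped = "\\d+\\.\\d+"

print(pattern == pattern_escaped)  # True
print(html.count("\n"))            # 2
```

Backslash escapes are still processed inside a triple-quoted string unless it is also raw, so the two features are independent of each other.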

...

I have a vague memory of a Pike program that did some complex scripting via a shell; the top number of backslashes in a row was 16 in those strings, that is "\\\\\\\\\\\\\\\\".

Yes, there are a few insane examples. However, a different string syntax would only bring it down to 8 backslashes in a row, which is still fairly insane.
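To spell out the arithmetic: each layer of backslash-quoted parsing doubles the number of backslashes needed, and a literal syntax without escapes removes exactly one layer. The sketch below assumes four layers in total, which is what 16 backslashes for a single literal backslash would imply (the original program is not available, so this is an assumption); the helper names are hypothetical.

```python
def quote_layer(text: str) -> str:
    """Escape backslashes so the text survives one more quoted-string parser."""
    return text.replace("\\", "\\\\")


def backslashes_needed(layers: int) -> int:
    """Count the backslashes required to deliver one backslash through `layers` parsers."""
    text = "\\"  # a single literal backslash at the innermost level
    for _ in range(layers):
        text = quote_layer(text)
    return len(text)


print(backslashes_needed(4))  # 16
print(backslashes_needed(3))  # 8
```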

/ Martin Stjernholm, Roxen IS

David Hedbor ＠ Pike developers forum

22 Sep

7:15 p.m.

Any time you write XML or HTML verbatim you would want this syntax, especially the multiline one. Count the number of these in, say, Caudium (or Roxen) and you'll see that it would be very nice to have. Personally I really like the

string s = #multiline __blah
blah blah blah
__blah;

syntax. Also, it's not just multiline XML that would benefit. Single-line XML would benefit from not having to quote " all the time.

/ David Hedbor

Jonas Walldén ＠ Pike developers forum

9:35 p.m.

The last comment is one thing I often wonder about. Why do people write "<foo xyz="bla"/>" when "<foo xyz='bla'/>" works just as well? No need to quote at all.

I'd like to voice my opinion on the #multiline syntax, too. It's plain ugly! I'd rather have a syntax that Emacs can hilite properly than something which goes over the top just to avoid quoting. For me it's also not apparent whether leading and/or trailing whitespace is part of the string or not. For example, what would the interpretation of this be (note the space at the end of the first line)?

string s = #multiline __blah 
foo
 __blah;

Is it "foo", " \nfoo\n ", "foo\n " or perhaps illegal? The suggestion also doesn't handle non-printing characters (e.g. non-breaking space) or wide-string characters which normally are entered using quoting. It would be terribly frustrating to find a block of multiline text and not be able to add the characters I want without rewriting all of it!

/ Jonas Walldén

David Hedbor ＠ Pike developers forum

9:50 p.m.

...

The last comment is one thing I often wonder about. Why do people write "<foo xyz="bla"/>" when "<foo xyz='bla'/>" works just as well? No need to quote at all.

Change XML to HTML. Last I checked ' was not a valid HTML quoting character (but I might be incorrect or it might have changed).

...

For example, what would the interpretation of this be (note the space at the end of the first line)?

It would be a syntax error since there's no ending separator. It has to start at the beginning of the line. Also, I'd think that the first and last newlines are thrown away.
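The rules described so far can be sketched as a small scanner. This is only an illustration of the proposed #multiline behaviour, written in Python rather than as preprocessor code; the function name parse_multiline is hypothetical, and ignoring trailing spaces after the opening delimiter is an assumption borrowed from the earlier #``__DELIM proposal.

```python
import re


def parse_multiline(source: str) -> str:
    """Return the string value of one '#multiline DELIM ... DELIM' literal."""
    # Opening line: keyword, delimiter, optional trailing blanks, newline.
    # That first newline is not part of the string.
    opening = re.match(r"#multiline[ \t]+(\S+)[ \t]*\n", source)
    if opening is None:
        raise SyntaxError("expected '#multiline DELIM' followed by a newline")
    delim = opening.group(1)

    body_lines = []
    for line in source[opening.end():].split("\n"):
        # The closing delimiter only counts at the very beginning of a line.
        if line.startswith(delim):
            # Joining without a trailing "\n" drops the last newline.
            return "\n".join(body_lines)
        body_lines.append(line)
    raise SyntaxError("no closing delimiter at the beginning of a line")


print(repr(parse_multiline("#multiline __blah\nfoo\n__blah;")))  # 'foo'
```

Under these rules an input whose closing delimiter is preceded by a space is rejected, because no line begins with the delimiter.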

...

string s = #multiline __blah 
foo
 __blah;

Is it "foo", " \nfoo\n ", "foo\n " or perhaps illegal? The suggestion also doesn't handle non-printing characters (e.g. non-breaking space) or wide-string characters which normally are entered using quoting. It would be terribly frustrating to find a block of multiline text and not be able to add the characters I want without rewriting all of it!

Just insert them verbatim, no quoting needed. Or that would be the idea at least (of course, non-breakable space would be impossible to see, so it might be quite annoying).

/ David Hedbor

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

10:20 p.m.

...

Change XML to HTML. Last I checked ' was not a valid HTML quoting character (but I might be incorrect or it might have changed).

Last time you checked must have been a veeery long time ago. HTML 2.0 did allow ' as an attribute quote. I can't find the 1.0 spec right now...

/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)

Jonas Walldén ＠ Pike developers forum

10:25 p.m.

...

Last I checked ' was not a valid HTML quoting character (but I might be incorrect or it might have changed).

Well, since HTML builds on SGML it's not clearly stated in the HTML specification, but I've seen several examples on the W3C site where ' is used to quote attribute values in HTML code. Don't know if it's been true since the first version, though, but in reality I've never had a problem with it.

...

Just insert them verbatim, no quoting needed.

Take the Euro character as an example. "\x20AC" is what I'd use today, but that wouldn't work. True wide-string doesn't work well with tools such as mail, cvs, diff etc so that's not an option either. UTF-8 escaping is even less of an option due to more non-printing characters.

/ Jonas Walldén

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

10:25 p.m.

...

Well, since HTML builds on SGML it's not clearly stated in the HTML specification,

HTML 2.0 _does_ clearly state that ' is valid. RFC 1866, section 3.2.4.

...

Take the Euro character as an example. "\x20AC" is what I'd use today, but that wouldn't work. True wide-string doesn't work well with tools such as mail, cvs, diff etc so that's not an option either. UTF-8 escaping is even less of an option due to more non-printing characters.

Just encode the source code as iso-8859-15. Works fine with tools such as mail, cvs, diff etc.

/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)

Jonas Walldén ＠ Pike developers forum

10:40 p.m.

Tell that to the Windows computer I used last week that insisted Euro should be stored as 0x80... :-)

Anyway, this is off-topic but I'm curious to find out how diff would recognize which charset the file is using. If I have another string with the currency sign (0xA4 in ISO-8859-1) I'm quite certain it would report it as identical to Euro (0xA4 in ISO-8859-15). But for the sake of this argument I'll say "\x2122" (trademark) instead. (Gotcha! :-)

/ Jonas Walldén

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

10:50 p.m.

...

Tell that to the Windows computer I used last week that insisted Euro should be stored as 0x80... :-)

"windows-1252" is also a valid character encoding for Pike source code, so you can use that if it helps. ;-)

...

Anyway, this is off-topic but I'm curious to find out how diff would recognize which charset the file is using. If I have another string with the currency sign (0xA4 in ISO-8859-1) I'm quite certain it would report it as identical to Euro (0xA4 in ISO-8859-15). But for the sake of this argument I'll say "\x2122" (trademark) instead. (Gotcha! :-)

If you change the encoding of the file from ISO-8859-1 to ISO-8859-15, diff will report a change on the #charset line. Your phrasing "another string" puzzles me a bit though. Are you talking about the diff function in Pike? It works on wide strings.

I'm not sure what the "gotcha" was concerning trademark, could you expand on that perhaps?

/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)

Jonas Walldén ＠ Pike developers forum

11:05 p.m.

I'm talking about the Unix diff command. Of course it would report a diff on the #charset line, but that doesn't really help at all if there's a 0xA4 character on line 4711 in both versions, since diff doesn't know its interpretation varies with the charset.
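The ambiguity is easy to demonstrate: the same byte decodes to different characters depending on the declared charset, while a byte-oriented comparison sees no difference. A small Python sketch (the variable names are only for illustration):

```python
shared_byte = b"\xa4"

as_latin1 = shared_byte.decode("iso-8859-1")   # U+00A4, currency sign
as_latin9 = shared_byte.decode("iso-8859-15")  # U+20AC, euro sign
euro_cp1252 = b"\x80".decode("cp1252")         # U+20AC, euro sign

# Same byte on disk, different characters once the charset is applied.
print(hex(ord(as_latin1)), hex(ord(as_latin9)), hex(ord(euro_cp1252)))
# prints: 0xa4 0x20ac 0x20ac
```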

The gotcha was that you pointed to an ISO-8859-x encoding where Euro was included, but I now revised the example in 10728746 to use "\x2122" instead.

Regardless, I'd rather see a discussion on the #multiline syntax than this sub-topic. If you still don't understand the example I can live with that.

/ Jonas Walldén

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

11:20 p.m.

...

I'm talking about Unix diff command. Of course it would report a diff on the #charset line but that doesn't really help at all if there's a 0xA4 character on line 4711 in both versions since diff doesn't know its interpretation varies with the charset.

Well, if you do

#define EURO "\xa4"

"blabla" EURO "blabla"

and then change the #define of EURO to "\x2122", diff will only report a change on the #define line, since diff doesn't know that the interpretation of EURO varies with the #define. Is this also a problem?

...

The gotcha was that you pointed to a ISO-8859-x encoding where Euro was included, but I now revised the example in 10728746 to use "\x2122" instead.

Then I can point you to e.g. iso-8859-supp instead. Or windows-1252 (already mentioned by yourself), macintosh (should be familiar to you :) etc.

...

Regardless, I'd rather see a discussion on the #multiline syntax than this sub-topic. If you still don't understand the example I can live with that.

If that is the case it is probably just because your examples have no connection whatsoever with reality. :-)

My view on #multiline syntax is as before: #%blabla blabla% where % can be picked semi-arbitrarily and no escapes are recognized in blabla.

/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)

Jonas Walldén ＠ Pike developers forum

23 Sep

12:10 a.m.

From time to time I feel it's a healthy sign that my reality doesn't match with yours...

/ Jonas Walldén

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

12:40 p.m.

It was pretty clear that your examples weren't from your reality either, but simply made up as you went along.

/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)

David Hedbor ＠ Pike developers forum

12:05 a.m.

Ok, good. As I said, I haven't looked at this for a long time and even then didn't ever go very deep.

/ David Hedbor

Martin Baehr

22 Sep

9:58 p.m.

On Mon, Sep 22, 2003 at 11:35:02PM +0200, Jonas Walldén @ Pike developers forum wrote:

...

The last comment is one thing I often wonder about. Why do people write "<foo xyz="bla"/>" when "<foo xyz='bla'/>" works just as well?

that breaks as soon as you need to nest it: <xsl:if test="foo = 'bar'"/>
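The nesting problem is easy to reproduce in any language with C-style strings. A minimal Python illustration (not Pike; the variable names are invented for the example):

```python
# One level of quoting: switching the outer quote character is enough.
simple = '<foo xyz="bla"/>'

# Both quote characters appear in the text, so an ordinary literal
# needs backslash escapes whichever outer quote is chosen.
nested_escaped = "<xsl:if test=\"foo = 'bar'\"/>"

# A literal with a different delimiter (here Python's triple quotes)
# lets the text through untouched.
nested_verbatim = '''<xsl:if test="foo = 'bar'"/>'''

print(nested_escaped == nested_verbatim)
```

This is the situation the wish in this thread is about: once both " and ' occur in the text, only a third delimiter avoids escaping.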

greetings, martin.

Jonas Walldén ＠ Pike developers forum

10:05 p.m.

Of course, but in normal use it works just fine.

/ Jonas Walldén

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

21 Sep

8:45 p.m.

I think that _if_ we should introduce yet another string syntax, then rather than trying to figure out the quote character to end all quote characters, we should let the user pick an arbitrary quote character. Think \verb in TeX. In fact, #" could probably be extended in this fashion, as long as we restrict the arbitrariness of the quote character to not be alphanumeric, so as to avoid conflicts with other preprocessor(-like) directives. Of course, whitespace should not be allowed either.

That is:

#`foo`

would give you the backquote syntax,

#|weird, ain`t it|

would give you a different one where you can use backquotes, etc etc.

/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)

Previous text:

...

2003-09-21 18:23: Subject: Re: wish: string with other quoting then \

#` is nice. But is there any better character that is even less common? Or, I think I'd like the final to be a tuple too, since that's easier to avoid.

How about #`...`#? But it kind of looks ugly... I guess [[...]] is out of the question?

For the multiline, I think I'd like a nice keyword, like #string. For instance #multiline,

string s2 = #multiline __DELIM ... __DELIM;

(note the semicolon; it must be there to complete the sentence.)

How easy is that to implement in the parser?

/ Mirar

Mirar ＠ Pike developers forum

8:50 p.m.

Nice idea. But can it be extended so you can choose the escape character (\) too?

/ Mirar

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

8:55 p.m.

The idea with \verb is that if you can select the quote arbitrarily, you don't _need_ an escape character, since you can put everything in unescaped as long as you pick a quote character which is not in the text.

/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)

Mirar ＠ Pike developers forum

8:55 p.m.

I can accept that, but some people like typing in their character using escape codes like \1234... I don't know how many programs would break if we removed the escape possibility from #"..."?

/ Mirar

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

9 p.m.

People who like to use escapes can put them in normal "":s; the whole purpose of a new syntax would be to remove the need for escapes completely. But as I wrote in the footnote, backwards compatibility for the particular case #" would of course have to be maintained.

/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)

Mirar ＠ Pike developers forum

9 p.m.

Yes, that's an idea.

So which characters after # can we allow for this? The whole non-alphabetical range, including space?

But numerical and ! is used already too, isn't it?

/ Mirar

Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) ＠ Pike (-) developers forum

9:05 p.m.

I think that whitespace and alphanumeric should be disallowed. Not only would whitespace be confusing to use in this context, it would also mean that

# ifdef FOO

can't work, and we don't want that.

Otherwise, anything goes. In particular, there is an ample supply of Unicode characters to pick from. :-)

#! is special at the beginning of the first line, but having a string literal there would be a syntax error anyway, so I don't think overloading the syntax would lead to any particular problems.

/ Marcus Comstedt (ACROSS) (Hail Ilpalazzo!)

Alexander Demenshin

9:17 p.m.

On Sun, Sep 21, 2003 at 11:05:06PM +0200, Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:

...

#! is special at the beginning of the first line, but having a string

Not at all. Just checked. It works exactly like // anywhere (at least in 7.4). Maybe this isn't intentional, but... :)

Regards, /Al

Martin Nilsson (saturator) ＠ Pike (-) developers forum

9:20 p.m.

It appears very intentional (preprocessor.h).

/ Martin Nilsson (saturator)

Alexander Demenshin

9:16 p.m.

On Sun, Sep 21, 2003 at 11:00:02PM +0200, Mirar @ Pike developers forum wrote:

...

So which characters after # can we allow for this? The whole non-alphabetical range, including space?

I would omit space from this set. Anything printable is sufficient, IMHO :)

Perhaps another possibility is to allow tuples - i.e. if special character is doubled immediately after this quote spec it must be doubled to end it (this will help in case when somebody can't choose good enough quoting character, i.e. the string contains all special characters).

I hope nobody will really want to specify empty strings this way, and this will be easily detected by the compiler anyway :)

#/some \unescaped and "quoted" chars/
#//some text with / in it//
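The doubled-delimiter rule can be sketched as follows. This is Python used for illustration only, the name read_tuple_literal is hypothetical, and the sketch assumes the literal starts at the beginning of the given text.

```python
def read_tuple_literal(source: str) -> str:
    """Scan '#<q>text<q>' or '#<q><q>text<q><q>' from the start of source.

    If the character after '#' is immediately repeated, the closing
    delimiter must be doubled as well, so single occurrences of the
    delimiter may appear inside the text.
    """
    quote = source[1:2]
    if not source.startswith("#") or not quote:
        raise ValueError("expected '#' followed by a quote character")
    if quote.isalnum() or quote.isspace():
        raise ValueError("%r cannot be a quote character" % quote)
    # A repeated opener selects the doubled form.
    closer = quote * 2 if source[2:3] == quote else quote
    start = 1 + len(closer)
    end = source.find(closer, start)
    if end == -1:
        raise ValueError("unterminated literal")
    return source[start:end]


print(read_tuple_literal(r'#/some \unescaped and "quoted" chars/'))
print(read_tuple_literal("#//some text with / in it//"))
```

Note that under this rule #// is read as the start of a doubled-delimiter literal rather than as an empty string, so the empty string cannot be written this way, matching the caveat above.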

Kind of... :)

Regards, /Al

Martin Nilsson (saturator) ＠ Pike (-) developers forum

9:25 p.m.

...

#/some \unescaped and "quoted" chars/

Is that the same as "ome \unescaped and "quoted" char"?

/ Martin Nilsson (saturator)

Mirar ＠ Pike developers forum

9:30 p.m.

Yes, why not...

...

...
#/some \unescaped and "quoted" chars/

Is that the same as "ome \unescaped and "quoted" char"?

No, 's' is not a repetition of '/'...

/ Mirar

Martin Nilsson (saturator) ＠ Pike (-) developers forum

9:25 p.m.

My vote is on all non-alphanumerical, non-whitespace characters except exclamation mark.
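As a rough illustration of that rule, the sketch below classifies what follows a # character. It is Python, not Pike; the function name classify_hash and the three labels are invented for the example, and treating #! as a comment follows the 7.4 behaviour reported earlier in the thread rather than any documented guarantee.

```python
def classify_hash(text: str) -> str:
    """Classify text that begins with '#' under the proposed rule.

    Returns "directive", "comment" or "string" (labels are illustrative).
    """
    if not text.startswith("#"):
        raise ValueError("text must start with '#'")
    nxt = text[1:2]
    if nxt == "!":
        # Excluded from the delimiter set; reported to behave like //.
        return "comment"
    if not nxt or nxt.isspace() or nxt.isalnum():
        # Keeps '#ifdef FOO' and '# ifdef FOO' working as before.
        return "directive"
    # Any other character acts as a string delimiter.
    return "string"


for sample in ["#ifdef FOO", "# ifdef FOO", "#!/usr/bin/pike", "#|weird|", '#"old style"']:
    print(sample, "->", classify_hash(sample))
```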

/ Martin Nilsson (saturator)

Mirar ＠ Pike developers forum

9:30 p.m.

Yes, then '\1' aka ctrl-a for instance could be used if you need to have a string with all printable characters. :)

/ Mirar

Martin Nilsson (saturator) ＠ Pike (-) developers forum

9:35 p.m.

Well, currently it is an illegal character so I don't think there is much of a compatibility issue.

/ Martin Nilsson (saturator)

Peter Lundqvist (disjunkt) ＠ Pike (-) developers forum

4:35 p.m.

Please feel free to ignore my ramblings.

Why do this?

Learning proper regexp quoting/syntax seems to be, at least for most people, a real pain. However, quoting is the same everywhere, so once learned it's mostly the same in most languages. Why deviate from this?

IMHO the number one compelling reason for using pike is that it is a lot like most languages - only better. I fear that this is not a very good idea (apart from better multi line string support) as it sets it apart in a very touchy area - string manipulation.

I do not think that weird string quoting is what is going to get pike world domination. Too radical differences scare people away. For instance - the most common complaint about python I've ever heard from people new to the language is the string indexing (which quite frankly is just plain dumb IMNSHO).

/ Peter Lundqvist (disjunkt)

Previous text:

...

2003-09-21 18:08: Subject: Re: wish: string with other quoting then \

On Sun, Sep 21, 2003 at 04:45:02PM +0200, Mirar @ Pike developers forum wrote:

...
...
`+(1,2);

Result: 3

...
string s=`+(1,2);`

Well, this might be a problem. Then we can use #' syntax:

string s1 = #`a "string"`;
string s2 = #``__DELIM; // Newline here is not counted, spaces are ignored
...
Newlines here are counted unless string is terminated with \ (as usual)
__DELIM; // This must be at beginning of the line

So we are limited to the case that empty string may not be specified with ``.

...
...but you're positive to my ideas in general?

Oh yes... Very, very positive :) I really like this idea - it would be really convenient and useful. Especially in REs.

Regards, /Al

/ Brevbäraren

Alexander Demenshin

4:49 p.m.

On Sun, Sep 21, 2003 at 06:35:01PM +0200, Peter Lundqvist (disjunkt) @ Pike (-) developers forum wrote:

...

Learning proper regexp quoting/syntax seems to be, at least for most people, a real pain. However, quoting is the same everywhere, so once learned it's mostly the same in most languages. Why deviate from this?

Since new syntax (whatever it might be) is an _addition_ and is fully backward-compatible, it will be used only by those who want it. It won't interfere with existing, traditional syntax. So people are free to choose - "good old" syntax, or "better new" :)

In case of REs, strings with double quoting looks ugly - just compare

"\([a-z]\+\)" and (say) #`([a-z]+)` - the first is a bit... hmm... needs more time to count back-slashes to understand what is going on :)

There is alternative like ##/re../ where "/" might be anything (like in Perl) and used as delimiter. Tuple ## isn't used anywhere, or?

Regards, /Al

Martin Stjernholm, Roxen IS ＠ Pike developers forum

5 p.m.

Fortunately the regexp syntax isn't Emacs-style, so the regexp is "([a-z]+)" in either case. That's actually a very large factor in limiting the "leaning toothpick syndrome" to bearable levels.

/ Martin Stjernholm, Roxen IS

Alexander Demenshin

5:53 p.m.

On Sun, Sep 21, 2003 at 07:00:01PM +0200, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:

...

Fortunately the regexp syntax isn't Emacs-style, so the regexp is "([a-z]+)" in either case. That's actually a very large factor in

Exactly my point. In my example I meant literal "(", ")" and "+" :))

The quoting of literals in modern REs is sometimes a bit confusing in regular (C style) strings....
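The double layer of quoting is easy to see in any language with C-style strings and a Perl-like regexp syntax. A small Python illustration (not Pike; Python's raw strings stand in for the escape-free literal being wished for):

```python
import re

# Pattern meaning: a literal "(", one lowercase letter, a literal "+",
# and a literal ")".
# In an ordinary string every regexp backslash has to be doubled.
ordinary = "\\([a-z]\\+\\)"

# In an escape-free literal the pattern is written as the regexp engine
# sees it.
verbatim = r"\([a-z]\+\)"

print(ordinary == verbatim)
print(bool(re.fullmatch(verbatim, "(a+)")))

# Without the backslashes the same characters are regexp operators:
# a group containing one or more lowercase letters.
print(re.fullmatch(r"([a-z]+)", "abc").group(1))
```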

Regards, /Al

Martin Stjernholm, Roxen IS ＠ Pike developers forum

6:05 p.m.

A decent glue to a regexp engine ought to include a function to quote a string to a regexp that only matches that string, i.e. like str() in my earlier message. (A decent regexp engine ought to allow entering string literals directly without the conversion to and from regexp syntax.) A function like that is useful regardless of string and regexp quoting rules.

/ Martin Stjernholm, Roxen IS

Alexander Demenshin

6:18 p.m.

On Sun, Sep 21, 2003 at 08:05:01PM +0200, Martin Stjernholm, Roxen IS @ Pike developers forum wrote:

...

A decent glue to a regexp engine ought to include a function to quote a string to a regexp that only matches that string, i.e. like str() in

This still doesn't help too much when quoting is necessary. Say, I might have too complex RE (to match mail header, for instance), where I want to use special characters and their literals as well. Double quoting is never good idea, wherever it is, IMHO.

Regards, /Al

Per Hedbor () ＠ Pike (-) developers forum

6:30 p.m.

In what way won't it help?

/ Per Hedbor ()

Alexander Demenshin

6:37 p.m.

On Sun, Sep 21, 2003 at 08:30:02PM +0200, Per Hedbor () @ Pike (-) developers forum wrote:

...

In what way won't it help?

If I understood correctly, we are talking about function which will accept any string as RE where all characters are literals. The question is, how can I still use special characters, in case when I need _both_?
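One possible reading of the quoting-function suggestion is that only the literal fragments go through the function, and the results are concatenated with the parts that are meant as regexp syntax. The sketch below uses Python's re.escape as a stand-in for the str() helper mentioned in the thread; the header and subject text are made-up examples.

```python
import re

# Text that must match literally, including its "+", "(" and ")".
literal = "C++ (draft)"

# Regexp operators are written directly; the literal part is quoted by
# the helper and spliced in between them.
pattern = r"^Subject:\s*" + re.escape(literal) + r"\s*$"

print(bool(re.search(pattern, "Subject: C++ (draft)")))
print(bool(re.search(pattern, "Subject: C (draft)")))
```

This keeps special characters and their literal counterparts apart without hand-written backslashes, at the cost of splitting the pattern into pieces.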

Regards, /Al

Peter Lundqvist (disjunkt) ＠ Pike (-) developers forum

5:05 p.m.

Oh - seems like I got it all backwards. Sorry for that.

Hmm. What about special characters like \n, \t, \r, \f etc.? How would these be composed using this quoting scheme?

/ Peter Lundqvist (disjunkt)

pike-devel@lists.lysator.liu.se
