Sorry, I had an error in my "it will now return" mapping.
Actual example:
p->finish("<t a='b''c'"d"e>")->read();
([ /* 3 elements */ "a": "b", "c": "c", "d": "d" "e": "e" ])
Do I understand correctly that the old parser allows quoted attributes and multiple strings in attribute values, but does not allow multiple attributes without whitespace in between or multiple strings in attribute values _with_ whitespace in between?
So
<t a="foo" "bar">
would have two attributes "a" and "bar",
and
<t a="foo""bar">
would have one attribute "a"?
Neither of these is valid HTML, so there is no particular reason to change the interpretation of either.
<t a="foo"b="bar">
does not have two quoted strings after each other, so there is no need to consider it for concatenation, and it can be interpreted as two attributes "a" and "b".
Is there an example of something which is valid HTML _and_ which would (for good reason) be interpreted as something else by the old parser?
Do I understand correctly that the old parser allows quoted attributes and multiple strings in attribute values, but does not allow multiple attributes without whitespace in between or multiple strings in attribute values _with_ whitespace in between?
Yes. (It also allows quoted and multiple name parts in an attribute name if they do not contain spaces, e.g.:
'foo'bar=ga'zo'nk <-> foobar=gazonk)
So
<t a="foo" "bar">
would have two attributes "a" and "bar",
Yes
and
<t a="foo""bar">
would have one attribute "a"?
Indeed (fun, right?).
Neither of these is valid HTML, so there is no particular reason to change the interpretation of either.
<t a="foo"b="bar">
does not have two quoted strings after each other, so there is no need to consider it for concatenation, and it can be interpreted as two attributes "a" and "b".
The strings do not have to be quoted to be considered for aggregation. Any string that does not contain unquoted whitespace will do.
So:
a="foo"bar='gazonk' would define the attribute 'a' with the value "foobar=gazonk" (this was one of the actually tested cases).
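The aggregation rule described above can be sketched roughly as follows. This is only a Python model of the behaviour as described in this thread, not the Parser.HTML code itself; the function name `parse_attributes` is made up for illustration, and giving a valueless attribute its own name as its value is an assumption.

```python
def parse_attributes(tag_body):
    """Model of the old aggregation rule as described in this thread.

    Unquoted whitespace ends an attribute. Within an attribute, quoted and
    unquoted pieces are concatenated with the quotes removed. The first
    unquoted '=' separates the name from the value; any later '=' is literal.
    """
    attrs = {}
    name_parts = []     # pieces seen before the first unquoted '='
    value_parts = None  # stays None until an unquoted '=' has been seen

    def flush():
        nonlocal name_parts, value_parts
        if name_parts or value_parts is not None:
            name = "".join(name_parts)
            # Assumption: an attribute without '=' gets its own name as value.
            attrs[name] = name if value_parts is None else "".join(value_parts)
        name_parts, value_parts = [], None

    i, n = 0, len(tag_body)
    while i < n:
        ch = tag_body[i]
        if ch.isspace():
            flush()
            i += 1
        elif ch in "'\"":
            # Take everything up to the matching quote as one piece.
            end = tag_body.find(ch, i + 1)
            if end == -1:
                end = n
            piece = tag_body[i + 1:end]
            (name_parts if value_parts is None else value_parts).append(piece)
            i = end + 1
        elif ch == "=" and value_parts is None:
            value_parts = []
            i += 1
        else:
            (name_parts if value_parts is None else value_parts).append(ch)
            i += 1
    flush()
    return attrs


print(parse_attributes("""a="foo"bar='gazonk'"""))  # {'a': 'foobar=gazonk'}
print(parse_attributes("""'foo'bar=ga'zo'nk"""))    # {'foobar': 'gazonk'}
print(parse_attributes('a="foo" "bar"'))            # {'a': 'foo', 'bar': 'bar'}
print(parse_attributes('a="foo""bar"'))             # {'a': 'foobar'}
```

The input is the tag content after the tag name, so `<t a="foo" "bar">` is passed as `a="foo" "bar"`.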
Is there an example of something which is valid HTML _and_ which would (for good reason) be interpreted as something else by the old parser?
Well, the one above. And, "for good reason" is relative, I guess.
I think the intended use case was this:
foo="'"bar"'" to set the attribute "foo" to "'bar'"
The strings do not have to be quoted to be considered for aggregation. Any string that does not contain unquoted whitespace will do.
But is there any _point_ in aggregating a quoted string with a non-quoted one? And, again, was such behaviour ever documented?
a="foo"bar='gazonk' would define the attribute 'a' with the value "foobar=gazonk" (this was one of the actually tested cases).
But there is no incentive for _not_ writing this as
a="foobar=gazonk".
It just doesn't make any sense.
Is there an example of something which is valid HTML _and_ which would (for good reason) be interpreted as something else by the old parser?
Well, the one above. And, "for good reason" is relative, I guess.
If you present the reason why anyone would write that, then I might be able to judge whether it is "good" or not...
I think the intended use case was this:
foo="'"bar"'" to set the attribute "foo" to "'bar'"
^^^^^^^
But you just said how that is written shorter and less convoluted!
Is there an example of something which is valid HTML _and_ which would (for good reason) be interpreted as something else by the old parser?
Well, the one above. And, "for good reason" is relative, I guess.
If you present the reason why anyone would write that, then I might be able to judge whether it is "good" or not...
I very much do /not/ want to keep the current interpretation. It would just require a significantly more extensive rewrite of the parser to do it correctly.
foo="'"bar"'" to set the attribute "foo" to "'bar'"
^^^^^^^
But you just said how that is written shorter and less convoluted!
Indeed.
The actual example in the testsuite was more along the lines of
foo='"This is a quoted string. It'"'"'s using both types of quotes"'
-> "This is a quoted string. It's using both types of quotes"
I would prefer using entities, myself. And I think that has always been the preferred method.
foo="&quot;This is a quoted string. It's using both types of quotes&quot;"
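For comparison, the entity approach can be checked with any entity decoder. The snippet below uses Python's standard `html.unescape` purely as a stand-in (it is not what Parser.HTML does internally), and the variable names are only illustrative. The apostrophe is written as `&#39;` so the value could sit inside either kind of quotes.

```python
import html

# Attribute value written with entities instead of concatenated quoted pieces.
raw_value = "&quot;This is a quoted string. It&#39;s using both types of quotes&quot;"

# Decoding the entities yields the text with both kinds of quote characters.
decoded = html.unescape(raw_value)
print(decoded)  # "This is a quoted string. It's using both types of quotes"
```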
And I really have no idea what the Parser.HTML documentation says about it all, I do not think it actually mentions how things are parsed. But the code goes to great lengths to actually support the concatenation of strings, so it must have been intentional.
The actual example in the testsuite was more along the lines of
foo='"This is a quoted string. It'"'"'s using both types of quotes"'
-> "This is a quoted string. It's using both types of quotes"
Which is fine. That example does not aggregate quoted values with non-quoted ones.
For any potential aggregation of a quoted string "XXX" and a non-quoted string YYY, you can always write "XXXYYY" instead, until only quoted strings remain.
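A small sketch of that equivalence, again as a Python model of the concatenation rule as described in this thread rather than the actual parser; `join_pieces` is a hypothetical helper that handles a single attribute value containing no unquoted whitespace.

```python
def join_pieces(value_token):
    """Concatenate quoted and unquoted pieces of one value, dropping the quotes."""
    out = []
    i = 0
    while i < len(value_token):
        ch = value_token[i]
        if ch in "'\"":
            # Copy everything up to the matching quote character.
            end = value_token.find(ch, i + 1)
            if end == -1:
                end = len(value_token)
            out.append(value_token[i + 1:end])
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


mixed = join_pieces("""\"foo\"bar='gazonk'""")  # "foo"bar='gazonk'
single = join_pieces('"foobar=gazonk"')         # "foobar=gazonk"
print(mixed == single)  # True
```

Under this model both spellings produce the same value, so the mixed form adds nothing that the single quoted string cannot express.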
And I really have no idea what the Parser.HTML documentation says about it all, I do not think it actually mentions how things are parsed. But the code goes to great lengths to actually support the concatenation of strings, so it must have been intentional.
It does not follow that concatenation of quoted and non-quoted strings specifically was desired though.
And if the behaviour is not documented, it is undefined and could be changed at any time. ;-)
No. That is not how it works. If the current behaviour might be in use, compatibility is not something that should be dropped without consideration. _I_ think it's ugly as hell and should be dropped if someone wants to touch the code, but then I'm not a user of that feature.
I think it was just following the same quotation methods allowed in scripts. I'm not sure if any browser actually parsed the entities like that, but they used to be far too lenient about how things could be written...
pike-devel@lists.lysator.liu.se