Hi Niels--
i notice that https://www.lysator.liu.se/~nisse/nettle/nettle.html is served with the HTTP header:
Content-Type: text/html; charset=iso-8859-1
but it contains non-ASCII text -- your name "Niels Möller", but it is rendered as Niels MÃ¶ller due to the charset parameter.
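A minimal sketch of how this kind of garbling arises, assuming the file on disk is UTF-8 and the browser decodes it as ISO-8859-1 because of the header (the variable names are only illustrative):

```python
# The name as it is meant to appear.
name = "Niels Möller"

# What is stored in the file and sent over the wire: UTF-8 bytes.
utf8_bytes = name.encode("utf-8")

# What a browser shows if it trusts "charset=iso-8859-1":
# each byte of the two-byte UTF-8 sequence for "ö" becomes its own character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# The damage is reversible as long as no bytes were lost along the way.
recovered = misread.encode("iso-8859-1").decode("utf-8")
print(recovered)
```

The page content itself is intact; only the declared charset is wrong, which is why fixing the header (rather than the file) should be enough.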
You can update your nginx config by using the charset directive:
charset UTF-8;
see https://nginx.org/en/docs/http/ngx_http_charset_module.html#charset for more detail.
Regards,
--dkg
Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
i notice that https://www.lysator.liu.se/~nisse/nettle/nettle.html is served with the HTTP header:
Content-Type: text/html; charset=iso-8859-1
but it contains non-ASCII text -- your name "Niels Möller", but it is rendered as Niels MÃ¶ller due to the charset parameter.
It looks equally bad for me (in firefox).
You can update your nginx config by using the charset directive:
charset UTF-8;
I can ask the people maintaining this webserver. I think the nginx only acts as a reverse-proxy for an apache or possibly roxen server behind it.
Other html files under https://www.lysator.liu.se/~nisse carry both a
<?xml version="1.0" encoding="utf-8"?>
and a
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
inside <head>...</head>, which is supposed to override whatever the actual http headers say. And that seems to work.
But nettle.html is generated with makeinfo and looks slightly different. The nettle.texinfo file includes
@documentencoding UTF-8
and the generated nettle.html carries a
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
but it seems that isn't enough. Any clues appreciated.
Regards, /Niels
On Wed 2019-10-23 20:14:15 +0200, Niels Möller wrote:
But nettle.html is generated with makeinfo and looks slightly different. The nettle.texinfo file includes
@documentencoding UTF-8
and the generated nettle.html carries a
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
but it seems that isn't enough. Any clues appreciated.
Alas, i'm out of my depth here :(
fwiw, the one thing that seems to be wrong is the nginx config (or the backing webserver, if nginx is just proxying; in that case you can also use override_charset [0] if you can't fix the backing webserver).
Typically i'd try to fix the thing that's broken rather than trying to override it at some lower level, if possible.
I think asking the webserver operator is the right thing to do.
Sorry to not have any better suggestions,
--dkg
[0] https://nginx.org/en/docs/http/ngx_http_charset_module.html#override_charset
On Wed, Oct 23, 2019 at 08:14:15PM +0200, Niels Möller wrote:
Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
i notice that https://www.lysator.liu.se/~nisse/nettle/nettle.html is served with the HTTP header:
Content-Type: text/html; charset=iso-8859-1
but it contains non-ASCII text -- your name "Niels Möller", but it is rendered as Niels MÃ¶ller due to the charset parameter.
It looks equally bad for me (in firefox).
You can update your nginx config by using the charset directive:
charset UTF-8;
I can ask the people maintaining this webserver. I think the nginx only acts as a reverse-proxy for an apache or possibly roxen server behind it.
Other html files under https://www.lysator.liu.se/~nisse carry both a
<?xml version="1.0" encoding="utf-8"?>
and a
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
inside <head>...</head>, which is supposed to override whatever the actual http headers say. And that seems to work.
But nettle.html is generated with makeinfo and looks slightly different. The nettle.texinfo file includes
@documentencoding UTF-8
and the generated nettle.html carries a
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
but it seems that isn't enough. Any clues appreciated.
The HTTP headers have priority over the HTML http-equiv <meta> tag.
Based on the other pages, the encoding given in the <?xml ...?> declaration seems to have priority over both.
Regards, Daniel
On Wed, Oct 23, 2019 at 08:14:15PM +0200, Niels Möller wrote:
inside <head>...</head>, which is supposed to override whatever the actual http headers say. And that seems to work.
That was my understanding as well, but I ran into almost the same problem a week or so ago and found that (in modern browsers at least) the HTTP header takes precedence. Web documentation currently[1] says that the meta tag only overrides the charset if the header does not have a charset parameter at all.
[1] The HTML Living Standard is an utter mess of broken behavior, and I feel bad using it as a reference, but it seems unambiguous that the presence of a charset parameter in the HTTP header prevents re-scanning the document if a meta tag specifies a different charset: https://html.spec.whatwg.org/multipage/parsing.html#determining-the-characte...
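The precedence described above can be sketched roughly as follows. This is a simplified illustration, not the algorithm from the standard: it ignores byte-order marks and every other step of the encoding determination, and the function names (header_charset, meta_charset, effective_charset) are made up for this example.

```python
import re
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html_bytes):
    """Crude stand-in for a <meta> prescan: find charset=... inside a <meta> tag."""
    match = re.search(
        rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)',
        html_bytes[:1024],
        re.IGNORECASE,
    )
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, html_bytes):
    """A charset in the HTTP header wins; the <meta> tag is only a fallback."""
    return header_charset(content_type) or meta_charset(html_bytes)


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head></html>'

# Header carries a charset parameter: the <meta> tag is not consulted.
print(effective_charset("text/html; charset=iso-8859-1", page))

# Header has no charset parameter: the <meta> tag decides.
print(effective_charset("text/html", page))
```

Under these assumptions, nettle.html ends up decoded as iso-8859-1 even though its meta tag says utf-8, which matches what the browsers show.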
On 10/24/19 11:47 AM, Wim Lewis wrote:
On Wed, Oct 23, 2019 at 08:14:15PM +0200, Niels Möller wrote:
inside <head>...</head>, which is supposed to override whatever the actual http headers say. And that seems to work.
That was my understanding as well, but I ran into almost the same problem a week or so ago and found that (in modern browsers at least) the HTTP header takes precedence. Web documentation currently[1] says that the meta tag only overrides the charset if the header does not have a charset parameter at all.
[1] The HTML Living Standard is an utter mess of broken behavior, and I feel bad using it as a reference, but it seems unambiguous that the presence of a charset parameter in the HTTP header prevents re-scanning the document if a meta tag specifies a different charset: https://html.spec.whatwg.org/multipage/parsing.html#determining-the-characte...
Thanks, Wim.
Niels, considering this, you could perhaps save the HTML document as iso-8859-1 and change the meta tag to contain "charset=iso-8859-1".
That works as long as you don't use characters outside that charset.
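One way to check that condition before converting might look like the sketch below. The helper names are hypothetical, and it assumes the document text has already been read into a Python string.

```python
def fits_iso_8859_1(text):
    """Return True if every character in text can be encoded as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


def offending_characters(text):
    """List the distinct characters that ISO-8859-1 cannot represent."""
    return sorted({ch for ch in text if ord(ch) > 0xFF})


print(fits_iso_8859_1("Niels Möller"))       # "ö" is part of ISO-8859-1
print(fits_iso_8859_1("price: 10 €"))        # the euro sign is not
print(offending_characters("price: 10 €"))
```

If the check fails, the offending characters would have to be written as numeric character references (or the header fixed instead).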
Best solution would be to contact the server admin to remove that charset info from the Content-Type header. It's a (legacy) bug anyways.
Regards, Tim
nettle-bugs@lists.lysator.liu.se