Re: Problem with national characters in XHTML

From: Tomas Zerolo
Subject: Re: Problem with national characters in XHTML
Date: Wed, 28 Sep 2005 12:41:09 +0200

On Wed, Sep 28, 2005 at 10:29:21AM +0200, LENNART BORGMAN wrote:
> I have run into a problem with Swedish national characters in an XHTML
> document. The header of the document is like this:
>   <?xml version="1.0" encoding="utf-8"?>
>   <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
>             "http://www.w3.org/TR/REC-html40/loose.dtd">
>   <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

Hm. Note that the header says of itself that it's encoded in utf-8. I
don't know whether it's relevant.

> The Swedish character ä looks like \344 in CVS Emacs (2005-09-23).

If Emacs honors the header above, then this won't work: octal 344 (hex
E4) is an a-with-dieresis in ISO 8859-1, not in UTF-8, where the same
character is the two-byte sequence octal 303 244 (hex C3 A4).
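
To make the byte-level difference concrete, here is a small Python sketch
(my own illustration, not anything Emacs or nxml-mode runs). The helper
name as_octal is made up for this example; it just prints bytes in the
backslash-octal style quoted above.

```python
# The character in question: a-with-dieresis (U+00E4).
char = "\u00e4"

latin1_bytes = char.encode("iso-8859-1")  # one byte
utf8_bytes = char.encode("utf-8")         # two bytes


def as_octal(data: bytes) -> str:
    """Render each byte as a backslash followed by three octal digits."""
    return "".join(f"\\{b:03o}" for b in data)


print(as_octal(latin1_bytes))  # \344
print(as_octal(utf8_bytes))    # \303\244

# A lone 0xE4 byte is not valid UTF-8, so a strict decoder rejects it.
try:
    latin1_bytes.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False

print(lone_byte_is_valid_utf8)  # False
```

So a file that declares utf-8 but contains a bare \344 is, strictly
speaking, not valid UTF-8 at that byte.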

> It looks ok in Internet Explorer, but not in Firefox.

I'd say Firefox is right on this one ;-)

Seriously: you can force the browser to assume an encoding, so what the
browser shows depends on settings which may vary from time to time. On
Firefox, it's under View -> Character Encoding. No idea about IE (and
I'm glad not to know ;-).

>                                                       Looking at the
> file with Notepad also shows the Swedish characters as expected.

Notepad uses whatever encoding its font has; I guess an 8-bit fixed

> I would be glad for some hints and pointers! I am using nxml-mode if
> that matters here.

You may try two things: changing the utf-8 in the header to iso-8859-1
or (better) inserting your a-dieresis as a UTF-8-encoded character.
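
If the file really is ISO 8859-1 throughout, the second option can also
be done outside the editor. A minimal Python sketch, assuming every byte
in the input is meant as ISO 8859-1; the function name latin1_to_utf8
and the sample markup are made up for illustration.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret ISO 8859-1 bytes as text and re-encode them as UTF-8.

    Assumption: the whole input is ISO 8859-1. Bytes that are already
    UTF-8 would be double-encoded, so do not run this twice.
    """
    return data.decode("iso-8859-1").encode("utf-8")


# Hypothetical sample: a paragraph holding the single byte octal 344.
sample = b"<p>\xe4</p>"
converted = latin1_to_utf8(sample)

print(converted)  # b'<p>\xc3\xa4</p>'
```

For a real file you would read it with open(path, "rb"), pass the bytes
through the function, and write the result back in binary mode, leaving
the encoding="utf-8" declaration in the header as it is.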

-- tomás

