Re: 23.0.60; [nxml] BOM and utf-8


From: tomas
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Fri, 23 May 2008 11:05:11 +0200
User-agent: Mutt/1.5.15+20070412 (2007-04-11)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, May 23, 2008 at 02:34:46AM +0900, Stephen J. Turnbull wrote:
> address@hidden writes:
> 
>  > > > ,----[ http://www.w3.org/TR/2006/REC-xml-20060816/#charencoding ]
>  > > > | Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY
>  > > > | begin with the Byte Order Mark [...]
>  > > > |        [...]  XML processors MUST be able to use this character to
>  > > > | differentiate between UTF-8 and UTF-16 encoded documents.
>  > > > `----
>  > 
>  > ...and how are the XML processors supposed to achieve that? Is there a
>  > second variant of BOM, indicating UTF-8?
> 
> Well, note that the BOM is three octets in UTF-8 but only two in
> UTF-16.  Dead giveaway, there.

Duh. Thanks. That's what I was missing.
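
For the archives, here is roughly how I understand that check now, as a
minimal Python sketch. The function name sniff_bom is made up for
illustration; this is not what Emacs or nxml actually do, and it ignores
UTF-32 (whose little-endian BOM also starts with FF FE).

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Guess an encoding from a leading BOM; return None if there is none.

    Hypothetical helper for illustration only. UTF-32 is not handled.
    """
    # U+FEFF encoded in UTF-8 is the three octets EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # U+FEFF encoded in UTF-16 is two octets: FE FF (big-endian) ...
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # ... or FF FE (little-endian).
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None
```

So the same character, U+FEFF, serves in both cases; it is its encoded
byte sequence that differs and tells the processor which encoding it is
looking at.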

[...]

>  > Am I completely whacko, or are they?
> 
> Neither.  You live in a relatively sane world, they live in a world
> which contains the sovereign nations of Japan and Microsoft.

Thanks for your kind words :-)

As for whether Emacs or nxml has the burden of skipping the BOM -- that
would correspond to whether nxml "within" Emacs is "seeing" a piece of
XML or a whole XML document, right?
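
To make the distinction concrete, a small Python sketch (again only an
illustration with made-up function names, not a claim about how Emacs
or nxml split the work): a consumer that treats the bytes as a whole
document may drop a leading BOM, while one that treats them as a
fragment would leave every character alone.

```python
def decode_document(data: bytes) -> str:
    """Decode a whole UTF-8 document (hypothetical helper).

    The "utf-8-sig" codec removes one leading BOM if present and
    otherwise behaves like plain "utf-8".
    """
    return data.decode("utf-8-sig")


def decode_fragment(data: bytes) -> str:
    """Decode a piece of UTF-8 text (hypothetical helper).

    Plain "utf-8" keeps a leading U+FEFF as an ordinary character,
    so whoever receives the fragment still sees it.
    """
    return data.decode("utf-8")
```

With the first variant the layer doing the decoding takes on the burden
of skipping the BOM; with the second it is passed on to whatever looks
at the text next.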

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFINojHBcgs9XrR2kYRAtieAJsEoakhvgRrjisQ9XhIjAap5mISBACaAjrk
7IuDZQjZvvdFoadb90lSygE=
=022X
-----END PGP SIGNATURE-----



