emacs-devel

23.0.60; [nxml] BOM and utf-8


From: Stephen J. Turnbull
Subject: 23.0.60; [nxml] BOM and utf-8
Date: Sun, 18 May 2008 11:29:41 +0900

Patrick Drechsler writes:

 > is the attached xml file (simple.xml) really invalid (as indicated by
 > nxhtml) or is this a bug in nxhtml?

Neither.  Emacs is (arguably) reading it incorrectly.

 > describe-char on the first symbol gives (I replaced the BOM part with
 > XXX):

The signature is *not* part of the text according to the Unicode
standard.  If it is recognized as a signature, the I/O system (here,
Emacs) should remove it before passing the text to the XML processor.

 > |         file code: #xEF #xBB #xBF (encoded by coding system utf-8-unix)

There should be an Emacs coding system that removes the BOM.  The XML
standard requires that the XML declaration, if present, be the first
thing in the file.  XML does not recognize the BOM as part of the
prolog, optional or otherwise.  The BOM signals the encoding of the
document, but XML is defined over characters, not bytes; at that
level there is no encoding, and thus no place for a BOM.  (The standard recognizes
that encoding varies from context to context, and provides means for
specifying it, but that's a different issue.)
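
To illustrate what "a coding system that removes the BOM" would do, here
is a minimal sketch in Python rather than Emacs Lisp.  It relies on
Python's standard "utf-8-sig" codec; the sample bytes are made up for the
illustration and are not Patrick's simple.xml.  It only shows the
behavior I would expect from the I/O layer, not how Emacs implements it.

```python
import codecs

# Hypothetical sample: the UTF-8 signature (EF BB BF) followed by a
# minimal XML document.  Not the file from the original report.
raw = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?>\n<doc/>\n'

# Decoding as plain "utf-8" keeps the signature in the text as U+FEFF,
# so the XML declaration is no longer the first thing the parser sees.
kept = raw.decode("utf-8")

# Decoding as "utf-8-sig" drops a leading signature if one is present,
# so the text handed to the XML processor starts with the declaration.
stripped = raw.decode("utf-8-sig")

print(repr(kept[0]))                 # first character with plain utf-8
print(stripped.startswith("<?xml"))  # declaration comes first after stripping
```

The point is that the stripping happens at decode time, before any XML
processing, which is where the Unicode standard puts that job.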

See Mark Hershberger's reply for more detail on the syntax of an XML
file.



