Re: 23.0.60; [nxml] BOM and utf-8

From: Stephen J. Turnbull
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Sat, 24 May 2008 06:23:18 +0900

address@hidden writes:

 > As for whether Emacs or nxml has the burden of skipping the BOM -- that
 > would correspond to whether nxml "within" Emacs is "seeing" a piece of
 > XML or a whole XML document, right?

No, I don't think so.  First, as I tried to explain, I don't think
that Emacs can reliably "know" that the BOM needs to be skipped at
decoding time.  Second, if the "piece" is what XML calls a "parsed
external entity" (analogous to an include file), it must be subjected
to BOM processing according to section 4.3.3 of the XML standard.  On
the other hand, if the fragment is generated internally to Emacs, then
there should be no BOM, because the BOM is not part of the text of an
XML document: "This is an encoding signature, not part of either the
markup or the character data of the XML document."  Moreover, modern
Unicode processes (since Unicode 3.2) will not produce a U+FEFF with
character semantics (as ZWNBSP); that role has passed to U+2060 WORD
JOINER.

So I think nxml stripping a leading BOM will almost never do harm,
whereas Emacs, which would have to make that call at decoding time,
has to be much more careful.
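To make the distinction concrete, here is a minimal Python sketch (not nxml or Emacs code; the helper name strip_leading_bom is hypothetical). It contrasts parser-side stripping, where the bytes are decoded first and a single leading U+FEFF is dropped afterwards, with Python's "utf-8-sig" codec, which strips the BOM at decoding time. The latter is only correct if the decoder knows it is looking at the start of an entity.

```python
import codecs

# Hypothetical helper, not part of nxml or Emacs: strip one leading
# U+FEFF from text that has ALREADY been decoded.
def strip_leading_bom(text: str) -> str:
    return text[1:] if text.startswith("\ufeff") else text

# An external entity as it might arrive from disk: UTF-8 BOM, then markup.
raw = codecs.BOM_UTF8 + b"<?xml version='1.0'?><doc/>"

# Plain "utf-8" decoding keeps the BOM as the character U+FEFF...
decoded = raw.decode("utf-8")
assert decoded.startswith("\ufeff")

# ...so the parser-side strip removes it.
print(strip_leading_bom(decoded))   # <?xml version='1.0'?><doc/>

# Python's "utf-8-sig" codec strips it at decoding time instead.
print(raw.decode("utf-8-sig"))      # <?xml version='1.0'?><doc/>

# Internally generated text carries no BOM, so stripping is a no-op.
print(strip_leading_bom("<doc/>"))  # <doc/>
```

Because the helper only looks at the first character of text it already knows is a whole entity, it cannot eat a U+FEFF that sits elsewhere in the buffer.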
