Re: 23.0.60; [nxml] BOM and utf-8
From: Stephen J. Turnbull
Subject: Re: 23.0.60; [nxml] BOM and utf-8
Date: Sat, 24 May 2008 06:23:18 +0900
address@hidden writes:
> As for whether Emacs or nxml has the burden of skipping the BOM -- that
> would correspond to whether nxml "within" Emacs is "seeing" a piece of
> XML or a whole XML document, right?
No, I don't think so. First, as I tried to explain, I don't think
that Emacs can reliably "know" that the BOM needs to be skipped at
decoding time. Second, if the "piece" is what XML calls a "parsed
external entity" (analogous to an include file), it must be subjected
to BOM processing according to section 4.3.3 of the XML standard. On
the other hand, if the fragment is generated internally to Emacs, then
there should be no BOM, because the BOM is not part of the text of an
XML document: "This is an encoding signature, not part of either the
markup or the character data of the XML document." Nor will modern
(since Unicode 3.2) Unicode processes produce the BOM with character
semantics (as ZWNBSP).
So I think there is almost never going to be harm in nxml stripping
the BOM, whereas Emacs has to be much more careful.
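
To make the asymmetry concrete, here is a minimal sketch in Python (not
Emacs Lisp, and not how nxml actually does it) of what "stripping the
BOM" means on the XML side. The function names are hypothetical, and it
assumes the entity is UTF-8. Only a U+FEFF in the very first position is
treated as a signature; one appearing later is left alone.

```python
BOM = "\ufeff"  # U+FEFF: an encoding signature when it leads an XML entity


def strip_leading_bom(text: str) -> str:
    """Return already-decoded text without a single leading U+FEFF."""
    # Only the first character is examined; a U+FEFF elsewhere is kept.
    return text[1:] if text.startswith(BOM) else text


def decode_utf8_entity(raw: bytes) -> str:
    """Decode a UTF-8 entity, dropping the EF BB BF signature if present."""
    # Python's "utf-8-sig" codec skips one leading BOM when decoding.
    return raw.decode("utf-8-sig")
```

An XML-aware consumer can do this safely because the standard says the
signature is not part of the document; a general-purpose decoder has no
such guarantee about arbitrary text.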