From: Stephen J. Turnbull
Subject: Re: 23.0.60; Defaut encoding for XML files should be undefined (instead of utf-8)
Date: Wed, 20 Feb 2008 07:02:21 +0900

Stefan Monnier writes:

 > My understanding of the OP's situation is that his files are not XML
 > files, but plaintext files that happen to contain XML fragments.

Interpreting the XML 1.0 standard, if those XML fragments are intended
to be parsed by the XML processor as part of the document, they are
(conceptually) "external entities".  How that affects XML processing
will depend on exactly what you mean by "text-concatenation".

ISTM there are two possibilities.  First, use the XML facilities (ie,
an entity reference).  That looks like this (there's also a "PUBLIC"
entity version):

<!ENTITY open-hatch
         SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">

Blah blah blah
&open-hatch;
foo bar baz.

An entity reference has the advantage of using XML catalogs and the
like to find the entity (similar to the way C's #include allows cpp to
use an include path).  The XML specification requires each external
entity to declare its own encoding in a text declaration, unless that
encoding is UTF-8 or is UTF-16 as signalled by the Byte Order Mark.
IMO this is the obvious way to do things if your XML processor
supports external entity references.
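
To make the per-entity rule concrete, here is a minimal sketch in
Python of how a processor might pick the encoding of one external
entity.  The function name is made up for illustration, and the logic
is deliberately simplified: it checks only the UTF-8 and UTF-16 Byte
Order Marks and an ASCII-compatible text declaration, and ignores the
other cases (UTF-32, EBCDIC, UTF-16 without a BOM) that the
specification's autodetection appendix discusses.

```python
import re

# Text declaration at the very start of an entity, e.g.
#   <?xml version="1.0" encoding="iso-8859-1"?>
# The version part is optional in a text declaration.
_TEXT_DECL = re.compile(
    rb"""^<\?xml(?:\s+version\s*=\s*(["'])[^"']*\1)?"""
    rb"""\s+encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\2"""
)


def sniff_entity_encoding(data: bytes) -> str:
    """Guess the encoding of ONE external entity (hypothetical helper).

    Simplified assumption: only BOMs and an ASCII-compatible text
    declaration are examined; anything else is treated as UTF-8.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    match = _TEXT_DECL.match(data)
    if match:
        return match.group(3).decode("ascii")
    # No BOM and no declaration: the entity is taken to be UTF-8.
    return "utf-8"
```

The point is that the decision is made once per entity, from the first
bytes of that entity, which is exactly what concatenation destroys.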

Second, use some kind of preprocessor for concatenation, such as cat
or cpp.  In this case, a text declaration can't be used: it must
appear as the first thing in an entity, but the XML processor will
see only a single entity, the whole document.  The XML specification
then says nothing about the fragments.
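
A small sketch of why the declaration can't survive concatenation,
using Python's standard xml.etree.ElementTree (an expat-based,
non-validating parser, chosen here only for illustration).  The
fragment contents and the helper name are invented.  The fragment
parses on its own, but once it is pasted into the middle of a wrapper
document its declaration is no longer at the start of the entity.

```python
import xml.etree.ElementTree as ET

# A fragment that is fine as a standalone entity: its declaration
# comes first and names the encoding of the bytes that follow.
fragment = b'<?xml version="1.0" encoding="iso-8859-1"?><note>caf\xe9</note>'

# What "cat header fragment footer" would hand to the XML processor.
document = b"<doc>\n" + fragment + b"\n</doc>"


def try_parse(data: bytes):
    """Return (True, "") if data is well-formed, else (False, message)."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    print(try_parse(fragment))
    print(try_parse(document))
```

Stripping the declaration before concatenating avoids that error, but
then nothing in the result records what encoding the fragment used.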

However, because the XML specification mandates a fatal error[1] when
a processor detects any encoding inconsistency or ambiguity, to users
the risks of guessing about fragment encodings are potentially high
(at least in annoyance).  So I advocate using a multientity framework
(for this purpose among others) where some sort of master document is
available to check consistency, rather than Mule guesswork on a
file-by-file basis.
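
As a rough illustration of the risk, here is a sketch (again with
Python's xml.etree.ElementTree, and with invented sample text) of what
happens when fragments saved in different encodings are concatenated
without any declaration.  The processor assumes UTF-8 for the whole
document and stops at the first byte sequence that is not valid UTF-8.

```python
import xml.etree.ElementTree as ET

# Two fragments saved by different tools in different encodings.
utf8_part = "<p>na\u00efve</p>".encode("utf-8")
latin1_part = "<p>caf\u00e9</p>".encode("iso-8859-1")

consistent = b"<doc>" + utf8_part + b"</doc>"
mixed = b"<doc>" + utf8_part + latin1_part + b"</doc>"


def is_well_formed(data: bytes) -> bool:
    """True if the parser accepts data; False on a fatal parse error."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(consistent))
    print(is_well_formed(mixed))
```

A master document that declares each fragment as an external entity
gives the processor a place to find, and check, each encoding.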

 > I don't know much about XML:

The XML specification is rather short (especially compared to the
SGML specification), yet self-contained.

[1]  Not necessarily termination of the process, but normal processing
must terminate, and the XML processor permanently enters an error mode.
Very annoying at best.
