Re: Drop toplevel XML-comments in libxml-parse-(xml|html)-region?

From: Lars Magne Ingebrigtsen
Subject: Re: Drop toplevel XML-comments in libxml-parse-(xml|html)-region?
Date: Tue, 11 Nov 2014 17:28:27 +0100
User-agent: Gnus/5.130012 (Ma Gnus v0.12) Emacs/25.0.50 (gnu/linux)

Ulf Jasper <address@hidden> writes:

> parse_region from xml.c, which is called by `libxml-parse-xml-region'
> and `libxml-parse-html-region', makes some effort to retain top-level
> comments in xml documents.  If necessary it adds an artificial node at
> the top of the parse tree.  As a consequence one has to check whether
> the result contains the "top" node or not (see below for an example).
> This behaviour is different from that of `xml-parse-region' (from
> xml.el), which just discards the toplevel comments.
> Can we make `libxml-parse-(xml|html)-region' consistent with
> `xml-parse-region', i.e. can we drop the toplevel xml comments (and
> simply call xmlDocGetRootElement)?
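
For readers following the thread, the check described above might look
roughly like the sketch below.  It is not code from xml.c or from the
original message.  The helper name `my-libxml-root-element' is
hypothetical, and the tree shapes (an artificial `(top nil CHILD...)'
wrapper, comments as `(comment nil "text")') are assumptions about what
the libxml functions return.

```elisp
;; Sketch only: normalize a parse result to the document's root element.
;; Assumed shapes (not confirmed by the thread):
;;   artificial wrapper: (top nil CHILD...)
;;   comment node:       (comment nil "text")
(defun my-libxml-root-element (tree)
  "Return the root element of TREE, skipping an artificial `top' node.
If TREE is not wrapped in a `top' node, return it unchanged."
  (if (eq (car-safe tree) 'top)
      ;; Children follow the tag symbol and the attribute list.
      (let ((children (cddr tree))
            root)
        ;; Take the first child that is an element and not a comment.
        (while (and children (not root))
          (let ((child (car children)))
            (when (and (consp child)
                       (not (eq (car child) 'comment)))
              (setq root child)))
          (setq children (cdr children)))
        root)
    tree))

;; Usage with hand-written trees, so no libxml support is needed:
(my-libxml-root-element
 '(top nil (comment nil " generated ") (feed nil "x")))
;; => (feed nil "x")
```

Note that a document whose real root element is named `top' cannot be
told apart from the artificial wrapper by this check, which is part of
why callers find the extra node awkward.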

I have no opinion on this, but this was added to the libxml code to make
it possible to re-generate XML documents as is, which is not possible
with the way `xml-parse-region' discards top-level comments.

So I don't know what the right fix here is.  On the one hand, it is
(perhaps) surprising that comments are preserved (at all, anywhere) in
the structure returned by the parser.  However, stashing data that is to
be further parsed by the HTML engine is a common feature that must be

If we preserve comments further down in the DOM, then not preserving
them at the top level seems inconsistent.

But perhaps that inconsistency is fine?

(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
