bug#27178: 26.0.50; libxml-parse-*-region functions ignore discard-comme

From: npostavs
Subject: bug#27178: 26.0.50; libxml-parse-*-region functions ignore discard-comments argument
Date: Sat, 10 Jun 2017 11:50:06 -0400

retitle 27178 libxml-parse-*-region functions discard-comments argument only applies to top level comments
found 27178 25.2

Sean McAfee <address@hidden> writes:

> The libxml-parse-html-region and libxml-parse-xml-region functions both
> appear to ignore their discard-comments parameters.
> When I enter the following text in a buffer and mark it:
>   <p>This <!-- and --> that</p>
> Then the result of evaluating the expression
>   (libxml-parse-html-region (mark) (point) nil t)
> is
>   (html nil (body nil (p nil "This " (comment nil " and ") " that")))
> and the result of evaluating the expression
>   (libxml-parse-xml-region (mark) (point) nil t)
> is
>   (p nil "This " (comment nil " and ") " that")
> In both cases, I would expect that passing t as the fourth argument
> would cause the comments to be dropped, but they are not.

It doesn't quite ignore that argument; it just applies only to
top-level comments.  I think that's an implementation detail leaking
through.  See parse_region in xml.c:

    static Lisp_Object
    parse_region (Lisp_Object start, Lisp_Object end, Lisp_Object base_url,
                  Lisp_Object discard_comments, bool htmlp)
    [...]
        /* The document doesn't have toplevel comments or we discarded
           them.  Get the tree the proper way. */
        xmlNode *node = xmlDocGetRootElement (doc);

Apparently the "proper" way (xmlDocGetRootElement) already skips
top-level comments, so the DISCARD-COMMENTS parameter was added to
control whether those are kept.  Nested comments are never touched.
Maybe we should just update the docs to match the code, though; I'm
not sure.
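
In the meantime, a caller who wants nested comments gone can filter
the parsed tree afterwards.  The following is only a sketch:
`my-xml-strip-comments' is a hypothetical helper, not something in
Emacs, and it assumes every node is either a string or a list of the
form (TAG ATTRIBUTES CHILDREN...), as in the results quoted above.
Note that it cannot tell a comment node from an element that is
literally named "comment", and it does not merge the text nodes left
adjacent after a comment is removed.

```elisp
;; Hypothetical helper (not part of Emacs): recursively drop
;; (comment ...) children from a tree shaped like the output of
;; libxml-parse-html-region / libxml-parse-xml-region.
(defun my-xml-strip-comments (node)
  "Return a copy of NODE with all nested comment nodes removed.
NODE is a string or a list of the form (TAG ATTRIBUTES CHILDREN...)."
  (if (not (consp node))
      node                              ; text node: keep unchanged
    (let ((children nil))
      (dolist (child (cddr node))
        ;; Skip children whose tag is `comment'; recurse into the rest.
        (unless (eq (car-safe child) 'comment)
          (push (my-xml-strip-comments child) children)))
      ;; Rebuild the node as (TAG ATTRIBUTES filtered-children...).
      (append (list (car node) (cadr node)) (nreverse children)))))

(my-xml-strip-comments
 '(p nil "This " (comment nil " and ") " that"))
;; => (p nil "This " " that")
```

With the example from the report, that would be called as
(my-xml-strip-comments (libxml-parse-xml-region (mark) (point))).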

> Incidentally, I notice that the documentation for
> libxml-parse-xml-region includes the following sentence:
>   If DISCARD-COMMENTS is non-nil, all HTML comments are discarded.
> I imagine this ought to refer to "XML comments" rather than "HTML
> comments."

Yeah, looks like copy-pasta from libxml-parse-html-region.

