bug-gnu-emacs

bug#48211: 28.0.50; eww strips whitespace between <mark> elements


From: Stefan Kangas
Subject: bug#48211: 28.0.50; eww strips whitespace between <mark> elements
Date: Mon, 3 May 2021 19:35:35 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

"Basil L. Contovounesios" <contovob@tcd.ie> writes:

> I think this is because libxml-parse-html-region specifies
> HTML_PARSE_NOBLANKS:
>
> Return CDATA sections (like <style>foo</style>) as text nodes.
> 3c2317e891 2010-12-06 17:59:52 +0100
> https://git.sv.gnu.org/cgit/emacs.git/commit/?id=3c2317e89100833812a7194c0d9d39ae0f52cb33

Hmm, okay.  For now, I'm seeing this issue with basically any tag that
libxml2 does not already know about, e.g. "<summary>" or "<bdi>".

This is what I came up with before reading Basil's reply:

(with-temp-buffer
  (insert "<p><tt>foo</tt> <tt>bar</tt></p>")
  (libxml-parse-html-region (point-min) (point-max)))

=> (html nil (body nil (p nil (tt nil "foo") " " (tt nil "bar"))))

(with-temp-buffer
  (insert "<p><mark>foo</mark> <mark>bar</mark></p>")
  (libxml-parse-html-region (point-min) (point-max)))

=> (html nil (body nil (p nil (mark nil "foo") (mark nil "bar"))))
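To repeat the comparison for several tags at once, something like the following might help.  It is only a sketch: `my/whitespace-kept-p' is a hypothetical helper name (not part of Emacs), and it assumes an Emacs built with libxml2 support.  Going by the two results above, I would expect "tt" to keep the space and "mark" to lose it, but the outcome depends on the libxml2 version in use.

```elisp
(require 'dom)

;; Hypothetical helper for illustration only; not part of Emacs.
(defun my/whitespace-kept-p (tag)
  "Return non-nil if the space between two TAG elements survives parsing.
TAG is a string such as \"tt\" or \"mark\"."
  (with-temp-buffer
    (insert (format "<p><%s>foo</%s> <%s>bar</%s></p>" tag tag tag tag))
    (let* ((dom (libxml-parse-html-region (point-min) (point-max)))
           ;; First <p> element in the parsed tree, or nil if there is none.
           (p (car (dom-by-tag dom 'p))))
      ;; The whitespace is kept if " " is a direct child of <p>.
      (and (member " " (dom-children p)) t))))

;; Build an alist of (TAG . KEPT-P) for a few tags of interest.
(mapcar (lambda (tag) (cons tag (my/whitespace-kept-p tag)))
        '("tt" "mark" "summary" "bdi"))
```

The check only looks at direct children of the first <p>, so it says nothing about tags the parser moves out of the paragraph.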

I guess this is a bug in libxml2, so I reported it here:

    https://gitlab.gnome.org/GNOME/libxml2/-/issues/247

FWIW, the diff below works around this bug for me.  Note that it
unconditionally inserts a space after every <mark> element.

diff --git a/lisp/net/shr.el b/lisp/net/shr.el
index cbdeb65ba8..3eb3a5bc49 100644
--- a/lisp/net/shr.el
+++ b/lisp/net/shr.el
@@ -1485,6 +1485,12 @@ shr-tag-tt
   ;; The `tt' tag is deprecated in favor of `code'.
   (shr-tag-code dom))

+(defun shr-tag-mark (dom)
+  (shr-generic dom)
+  ;; Hack to work around bug in libxml2 (Bug#48211):
+  ;; https://gitlab.gnome.org/GNOME/libxml2/-/issues/247
+  (insert " "))
+
 (defun shr-tag-ins (cont)
   (let* ((start (point))
          (color "green")
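
To try the same workaround without patching shr.el, the renderer could probably be registered from an init file via `shr-external-rendering-functions' instead.  This is a minimal sketch, not a tested fix: `my/shr-tag-mark' is a hypothetical name, and the function has the same limitation as the diff above (it adds a space after every <mark>, whether or not the source had one).

```elisp
(require 'shr)

;; Hypothetical name; mirrors the `shr-tag-mark' definition in the diff above.
(defun my/shr-tag-mark (dom)
  "Render the <mark> element DOM, then insert a trailing space.
Workaround sketch for Bug#48211."
  (shr-generic dom)
  (insert " "))

;; Ask shr to use the function above whenever it renders a <mark> tag.
(add-to-list 'shr-external-rendering-functions
             '(mark . my/shr-tag-mark))
```

Removing the entry from `shr-external-rendering-functions' again should undo the workaround once libxml2 handles these tags properly.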




