[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xml-escape-region

From: Daniel Colascione
Subject: Re: [PATCH] xml-escape-region
Date: Thu, 8 Oct 2009 02:01:05 -0400

Hash: SHA1

On Oct 8, 2009, at 1:29 AM, Stefan Monnier wrote:

+(defun xml-escape-region (beg end)
+  (interactive "*r")
+  (let ((escaped (xml-escape-string (buffer-substring beg end))))
+    (delete-region beg end)
+    (insert escaped)))

I'd rather not autoload such a function.

Do you mean that it should be loaded all the time, or that the user should
have to explicitly load xml.el before using the function?


If the latter, then that would make binding it to a key
less convenient.

Hmm... didn't notice you defined it as a command. How often/when do you
need to use/bind such a command other than in an sgml/xml-related file
(where the major mode might decide to preload such a command)?

Pretty often, actually. XML (or XML-like syntax) crops up in a lot of places, including literal strings in many programming languages. Some basic XML-editing functionality being available everywhere would be useful.

 (let ((search-re (mapconcat #'regexp-quote
                             (mapcar #'cdr xml-entity-alist)

Rather than a big \| of single chars, why not make a [...] regexp?
If you use regexp-opt, it should happen automatically.

I figured the constant-factor overhead of regexp-opt (and its autoloading) wasn't worth it for such a simple regexp.

Actually, now that I look at it, xml-entity-alist is poorly defined.
Instead of being a list of pairs of string and string (where the second
string is always of size 1), it should be a list of pairs of string
and char.

I think the idea was to be able to replace multi-character strings with XML entities defined for the current document.

Also this code is also applicable to sgml and there's related
code in sgml-mode.el.  If someone wants to consolidate, that would
be welcome.

Does anyone actually use the unquotep parameter? It seems like quoting and unquoting should be separate functions. Nevertheless, the patch below should preserve existing behavior. I've also renamed the XML functions to better match existing code, e.g., base64.

     (goto-char beg)
     (while (re-search-forward search-re end t)
       (replace-match (concat "&"
                              (car (rassoc (match-string 0)

If you use a backward-search, you don't need to turn `end' (nor `start')
into a marker.

Good idea.

Index: xml.el
RCS file: /sources/emacs/emacs/lisp/xml.el,v
retrieving revision 1.64
diff -u -r1.64 xml.el
- --- xml.el    5 Jan 2009 03:19:57 -0000       1.64
+++ xml.el      8 Oct 2009 05:58:20 -0000
@@ -840,6 +840,40 @@

 (defalias 'xml-print 'xml-debug-print)

+(defun xml-encode-region (start end)
+ "XML-escape text between START and END according to `xml-entity- alist`."
+  (interactive "*r")
+  (let ((search-re (mapconcat #'regexp-quote
+                              (mapcar #'cdr xml-entity-alist)
+                              "\\|")))
+    (save-excursion
+      (goto-char end)
+      (while (re-search-backward search-re start t)
+        (replace-match (concat "&"
+                               (car (rassoc (match-string 0)
+                                            xml-entity-alist))
+                               ";"))
+        (goto-char (match-beginning 0))))))
+(defun xml-decode-region (start end)
+ "Decode XML entities between START and END according to `xml-entity- alist`."
+  (interactive "*r")
+  (let ((search-re (concat "&\\("
+                           (mapconcat #'regexp-quote
+                                      (mapcar #'car xml-entity-alist)
+                                      "\\|")
+                           "\\);")))
+    (save-excursion
+      (goto-char end)
+      (while (re-search-backward search-re start t)
+        (replace-match (cdr (assoc (match-string 1) xml-entity-alist)))
+        (goto-char (match-beginning 0))))))
 (defun xml-escape-string (string)
   "Return the string with entity substitutions made from
Index: textmodes/sgml-mode.el
RCS file: /sources/emacs/emacs/lisp/textmodes/sgml-mode.el,v
retrieving revision 1.141
diff -u -r1.141 sgml-mode.el
- --- textmodes/sgml-mode.el    24 Sep 2009 23:22:20 -0000      1.141
+++ textmodes/sgml-mode.el      8 Oct 2009 05:58:21 -0000
@@ -1097,21 +1097,10 @@
 Only &, < and > are quoted, the rest is left untouched.
 With prefix argument UNQUOTEP, unquote the region."
   (interactive "r\nP")
- -  (save-restriction
- -    (narrow-to-region start end)
- -    (goto-char (point-min))
- -    (if unquotep
- -     ;; FIXME: We should unquote other named character references as well.
- -     (while (re-search-forward
- -             "\\(&\\(amp\\|\\(l\\|\\(g\\)\\)t\\)\\)[][<>&;\n\t \"%!'(),/=?]"
- -             nil t)
- -       (replace-match (if (match-end 4) ">" (if (match-end 3) "<" "&")) t t
- -                      nil (if (eq (char-before (match-end 0)) ?\;) 0 1)))
- -      (while (re-search-forward "[&<>]" nil t)
- -     (replace-match (cdr (assq (char-before) '((?& . "&amp;")
- -                                               (?< . "&lt;")
- -                                               (?> . "&gt;"))))
- -                    t t)))))
+  (if unquote
+      (xml-decode-region start end)
+    (xml-encode-region start end)))

 (defun sgml-pretty-print (beg end)
   "Simple-minded pretty printer for SGML.

Version: GnuPG v1.4.9 (Darwin)


reply via email to

[Prev in Thread] Current Thread [Next in Thread]