bug-gnu-emacs

nested comments in sgml-mode are not properly quoted.


From: Martin Schwamberger
Subject: nested comments in sgml-mode are not properly quoted.
Date: Wed, 29 Jan 2003 23:19:38 +0100

Hi,

I frequently use comment-region, and I was really unhappy
when I found that I couldn't use it safely in SGML/XML mode
because of an already known quoting problem.

Since I couldn't find any way to avoid the problem without changing the code,
I decided to fix the bug in newcomment.el as shipped with Emacs 21.2.1.

The original quoting algorithm inserts one or more backslashes
between the first and second characters of a comment marker.
This produces <\!-- .....  -\-> for SGML/XML comments.
Unfortunately, the quoted start marker <\!-- still contains the sequence --,
which is not allowed within SGML comments
(see http://www.w3.org/TR/REC-xml#sec-comments).
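To make the problem concrete, here is a small Python sketch (not Emacs Lisp and not the newcomment.el code itself) of what the original scheme does to a marker that has not been quoted yet: one backslash goes between the first and second characters. The name `old_quote` is hypothetical.

```python
def old_quote(marker: str) -> str:
    """Sketch of the pre-patch scheme: insert one backslash between the
    first and second characters of a comment marker."""
    return marker[0] + "\\" + marker[1:]


start = old_quote("<!--")
end = old_quote("-->")
print(start, end)        # <\!-- -\->
# The quoted start marker still contains "--", which is illegal inside a comment.
print("--" in start)     # True
```

The end marker comes out harmless, but the start marker keeps its trailing double hyphen.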

My algorithm inserts a backslash after every character of the marker
except the last one; a single-character marker still gets its backslash
after that one character.
This produces <\!\-\- .....  -\-\>, which is allowed within comments.
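A minimal Python sketch of the proposed scheme applied to an unquoted marker follows. `new_quote` is a hypothetical name, and this only models the first level of quoting on an isolated string; the actual patch works on buffer text through the regexps built by comment-quote-re.

```python
def new_quote(marker: str) -> str:
    """Sketch of the proposed scheme: a backslash after every character
    except the last; a one-character marker gets one after its only char."""
    if len(marker) == 1:
        return marker + "\\"
    return "".join(ch + "\\" for ch in marker[:-1]) + marker[-1]


start = new_quote("<!--")
end = new_quote("-->")
print(start, end)                      # <\!\-\- -\-\>
print("--" in start or "--" in end)    # False
```

Because every pair of adjacent hyphens is now separated by a backslash, no "--" can survive in either quoted marker.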

I've tested it for SGML and C-style comments.
I've also played with Pascal comments in order to see
what happens with single-character comment-end markers.
Everything seems to work well.

Since unquoting only requires backslashes after the first character,
it can also unquote comment markers quoted by earlier versions.
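The regexp construction and the backward-compatible unquoting can be approximated in Python as below. This is an illustrative sketch under assumptions: `quote_re` and `requote` are hypothetical names, Python's `re` groups stand in for Emacs's `\(...\)` groups, and each marker is handled in its own pass rather than through the single `\|` alternation that comment-quote-nested builds.

```python
import re


def quote_re(marker: str, unquote: bool) -> str:
    """Python analogue (a sketch) of the patched comment-quote-re.

    Each backslash run after a character is its own group, except after
    the last character of a multi-character marker.  When unquoting, only
    the run after the first character must be non-empty."""
    parts = [re.escape(marker[0]), r"(\\+)" if unquote else r"(\\*)"]
    for i in range(1, len(marker)):
        parts.append(re.escape(marker[i]))
        if i < len(marker) - 1:          # no group after the last character
            parts.append(r"(\\*)")
    return "".join(parts)


def requote(text: str, marker: str, unquote: bool) -> str:
    """Add one backslash to every group of each MARKER occurrence, or
    remove one from every non-empty group when UNQUOTE is true."""
    def fix(m):
        groups = m.groups()
        out = []
        for j, ch in enumerate(marker):
            out.append(ch)
            if j < len(groups):
                g = groups[j]
                out.append(g[1:] if unquote else g + "\\")
        return "".join(out)

    return re.sub(quote_re(marker, unquote), fix, text)


old_style = "<\\!-- x -\\->"         # as quoted by the original code
new_style = "<\\!\\-\\- x -\\-\\>"   # as quoted by the patched code
for text in (old_style, new_style):
    text = requote(text, "-->", unquote=True)
    text = requote(text, "<!--", unquote=True)
    print(text)                      # <!-- x --> for both inputs
```

Making only the first backslash run mandatory (`\\+`) is what lets the old single-backslash form still match when unquoting.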

Here are my new versions of comment-quote-re and comment-quote-nested.
I left the original lines as comments.
Immediately after these comments, my code starts with
;; MS:
and ends with
;; --------------------------------------------------------------------


(defun comment-quote-re (str unp)
;; --------------------------------------------------------------------
;;   (concat (regexp-quote (substring str 0 1))
;;        "\\\\" (if unp "+" "*")
;;        (regexp-quote (substring str 1))))
;; --------------------------------------------------------------------
;; MS:
  (let ((i 1)
        (len (length str))
        ;; Each backslash sequence is defined as a subexpression
        ;; in order to add or remove backslashes easily (see comment-quote-nested).
        (qre (concat (regexp-quote (substring str 0 1))
                     "\\(\\\\" (if unp "+" "*") "\\)")))
    (while (< i len)
      (setq qre
            (concat qre
                    (regexp-quote (substring str i (1+ i)))
                    ;; No trailing backslash for strings longer than one char.
                    ;; Even when UNP is non-nil, this backslash is optional,
                    ;; to remain compatible with markers quoted by the old code.
                    (if (< (1+ i) len) "\\(\\\\*\\)")))
      (setq i (1+ i)))
    qre))
;; --------------------------------------------------------------------
;; --------------------------------------------------------------------

(defun comment-quote-nested (cs ce unp)
  "Quote or unquote nested comments.
If UNP is non-nil, unquote nested comment markers."
  (setq cs (comment-string-strip cs t t))
  (setq ce (comment-string-strip ce t t))
  (when (and comment-quote-nested (> (length ce) 0))
    (let ((re (concat (comment-quote-re ce unp)
                "\\|" (comment-quote-re cs unp))))
      (goto-char (point-min))
      (while (re-search-forward re nil t)
;; --------------------------------------------------------------------
;;      (goto-char (match-beginning 0))
;;      (forward-char 1)
;;      (if unp (delete-char 1) (insert "\\"))
;; --------------------------------------------------------------------
;; MS:
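        (let ((i (regexp-opt-depth re)))
          ;; For each subexpression (sequence of backslashes), from the
          ;; last to the first, so that inserting or deleting a backslash
          ;; does not shift the positions of the groups still to be
          ;; processed.  Groups of the alternative that did not match
          ;; have no match-beginning and are skipped.
          (while (> i 0)
            (when (match-beginning i)
              (goto-char (match-beginning i))
              (if unp
                  ;; Only remove a backslash if this group contains one.
                  (if (> (match-end i) (match-beginning i))
                      (delete-char 1))
                (insert "\\")))
            (setq i (1- i))))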
;; --------------------------------------------------------------------
        (when (= (length ce) 1)
          ;; If the comment-end is a single char, adding a \ after that
          ;; "first" char won't deactivate it, so we turn such a CE
          ;; into !CS.  I.e. for pascal, we turn } into !{
          (if (not unp)
              (when (string= (match-string 0) ce)
                (replace-match (concat "!" cs) t t))
            (when (and (< (point-min) (match-beginning 0))
                       (string= (buffer-substring (1- (match-beginning 0))
                                                  (1- (match-end 0)))
                                (concat "!" cs)))
;; --------------------------------------------------------------------
;;            (backward-char 2)
;; --------------------------------------------------------------------
;; MS:
              (goto-char (1- (match-beginning 0)))
;; --------------------------------------------------------------------
              (delete-char (- (match-end 0) (match-beginning 0)))
              (insert ce))))))))
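The single-character comment-end special case (turning Pascal's } into !{ when quoting, and back when unquoting) can be sketched in Python as follows. `quote_single_char` is a hypothetical name; it does one `re.sub` pass over a string instead of searching a buffer, so treat it as an approximation of the behaviour above, not a port of it.

```python
import re


def quote_single_char(text: str, cs: str, ce: str, unquote: bool) -> str:
    """Sketch of the single-character CE case (e.g. Pascal { }).

    A backslash after a one-character CE would not deactivate it, so a
    bare CE is quoted as "!" + CS + backslash; unquoting turns "!" + CS
    back into CE once its last backslash has been removed."""
    run = r"(\\+)" if unquote else r"(\\*)"
    pattern = re.compile(re.escape(ce) + run + "|(!?)" + re.escape(cs) + run)

    def fix(m):
        if m.group(1) is not None:          # matched CE plus its backslashes
            slashes = m.group(1)
            if unquote:
                return ce + slashes[1:]
            if slashes == "":
                return "!" + cs + "\\"      # bare CE becomes "!" + quoted CS
            return ce + slashes + "\\"
        bang, slashes = m.group(2), m.group(3)
        if unquote:
            slashes = slashes[1:]
            if bang and slashes == "":
                return ce                   # "!" + CS restored to CE
            return bang + cs + slashes
        return bang + cs + slashes + "\\"

    return pattern.sub(fix, text)


quoted = quote_single_char("{ a } b }", "{", "}", unquote=False)
print(quoted)                                              # {\ a !{\ b !{\
print(quote_single_char(quoted, "{", "}", unquote=True))  # { a } b }
```

Nested levels work the same way: quoting twice yields "!{\\", and only the final unquote, which leaves "!{" with no backslashes, restores the "}".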


I hope this gives you at least a few useful ideas,

Martin



