bug-gnu-emacs

fixes to rx.el


From: Matthew Swift
Subject: fixes to rx.el
Date: Sun, 20 Oct 2002 14:33:17 -0400

This bug report will be sent to the Free Software Foundation,
not to your local site managers!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs@gnu.org mailing list,
and to the gnu.emacs.bug news group.

In GNU Emacs 21.2.1 (i386-debian-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2002-03-22 on raven, modified by Debian
configured using `configure  i386-debian-linux-gnu --prefix=/usr 
--sharedstatedir=/var/lib --libexecdir=/usr/lib --localstatedir=/var/lib 
--infodir=/usr/share/info --mandir=/usr/share/man --with-pop=yes --with-x=yes 
--with-x-toolkit=athena --without-gif'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US
  locale-coding-system: iso-latin-1
  default-enable-multibyte-characters: t

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

Following are several fixes to the code and documentation of rx.el,
first in the form of a corrective `eval-after-load', including brief
comments on what was changed, then in the form of a diff (without
those comments).

The problem with the original code is that the regexps it generates
contain more shy groups than are necessary.  The additional shy groups
also do not match the examples in the code comments, so it seems to me
that my revisions restore behavior that the author first intended, but
which was unintentionally altered by a subsequent revision.  The
examples also had a couple of mistakes that looked like typos, which I
have corrected.

I have verified that my revisions correctly process the few examples
in the documentation, and I am confident on theoretical grounds that
they are sound revisions, but I have not done extensive testing.
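
To illustrate what "more shy groups than are necessary" means: the
original `rx-kleene' in the diff below always wraps its argument in
\(?:...\), even when the argument is a single character.  The sketch
below uses only `string-match' to show when the group changes what a
postfix operator applies to and when it does not.  The helper name
`my-full-match-p' is hypothetical and is not part of rx.el.

```elisp
(defun my-full-match-p (regexp string)
  "Return non-nil if REGEXP matches all of STRING.
Hypothetical helper for this illustration, not part of rx.el."
  (let ((case-fold-search nil))
    ;; Anchor at both ends; the wrapper group keeps any top-level
    ;; alternation in REGEXP inside the anchors.
    (and (string-match (concat "\\`\\(?:" regexp "\\)\\'") string) t)))

;; Around a single character the shy group is redundant:
;; both forms accept the same strings.
(my-full-match-p "\\(?:a\\)*" "aaa")   ; non-nil
(my-full-match-p "a*" "aaa")           ; non-nil

;; Around "ab" the group is required: without it, `*' binds only to "b".
(my-full-match-p "\\(?:ab\\)*" "abab") ; non-nil
(my-full-match-p "ab*" "abab")         ; nil
```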

(eval-after-load "rx"
  '(progn
    ;; change these two functions to call `rx-to-string' with non-nil second
    ;; argument
    (defun rx-submatch (form)
      "Parse and produce code from FORM, which is `(submatch ...)'."
      (concat "\\("
              (mapconcat (function (lambda (x) (rx-to-string x 'no-group)))
                         (cdr form) nil)
              "\\)"))
    ;; also change this one to enclose the result in a shy group, which
    ;; documentation implies should happen
    (defun rx-and (form)
      "Parse and produce code from FORM.
FORM is of the form `(and FORM1 ...)'."
      (rx-check form)
      (concat "\\(?:"
              (mapconcat
               (function (lambda (x) (rx-to-string x 'no-group)))
               (cdr form) nil)
              "\\)"))

    ;; change this one to check with new function `rx-atomic-p'
    (defun rx-kleene (form)
      "Parse and produce code from FORM.
FORM is `(OP FORM1)', where OP is one of the `zero-or-one',
`zero-or-more' etc.  operators.
If OP is one of `*', `+', `?', produce a greedy regexp.
If OP is one of `*?', `+?', `??', produce a non-greedy regexp.
If OP is anything else, produce a greedy regexp if `rx-greedy-flag'
is non-nil."
      (rx-check form)
      (let ((suffix (cond ((memq (car form) '(* + ? )) "")
                          ((memq (car form) '(*? +? ??)) "?")
                          (rx-greedy-flag "")
                          (t "?")))
            (op (cond ((memq (car form) '(* *? 0+ zero-or-more)) "*")
                      ((memq (car form) '(+ +? 1+ one-or-more))  "+")
                      (t "?")))
            (result (rx-to-string (cadr form) 'no-group)))
        (if (not (rx-atomic-p result))
            (setq result (concat "\\(?:" result "\\)")))
        (concat result op suffix)))

    ;; new function
    (defun rx-atomic-p (r)
      "Return non-nil if regexp string R is atomic.
An atomic regexp R is one such that a suffix operator
appended to R will apply to all of R.  For example, \"a\",
\"[abc]\" and \"\\(ab\\|ab*c\\)\" are atomic and \"ab\",
\"[ab]c\", and \"ab\\|ab*c\" are not atomic.

This function may return false negatives, but it will not
return false positives.  It is nevertheless useful in
situations where an efficiency shortcut can be taken iff a
regexp is atomic.  The function can be improved to detect
more cases of atomic regexps.  Presently, this function
detects the following categories of atomic regexp:

  a group or shy group:  \\(...\\)
  a character class:     [...]
  a single character:    a

On the other hand, false negatives will be returned for
regexps that are atomic but end in operators, such as
\"a+\".  I think these are rare.  Probably such cases could
be detected without much effort.  A guarantee of no false
negatives would require a theoretic specification of the set
of all atomic regexps."
      (let ((l (length r)))
        (or (equal l 1)
            (and (>= l 6)
                 (equal (substring r 0 2) "\\(")
                 (equal (substring r -2) "\\)"))
            (and (>= l 2)
                 (equal (substring r 0 1) "[")
                 (equal (substring r -1) "]")))))))
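
One case worth checking against the start-and-end test in
`rx-atomic-p': a string such as "\\(a\\)b\\(c\\)" begins with "\\("
and ends with "\\)", yet those delimiters belong to two different
groups, so a suffix operator would apply only to the last one.  What
follows is a minimal sketch of a stricter group test, under a
hypothetical name `my-rx-single-group-p' that is not part of rx.el or
of this patch.  It does not parse character classes, so a literal
"\\(" or "\\)" inside brackets can make it return nil for a regexp
that is in fact a single group.

```elisp
(defun my-rx-single-group-p (r)
  "Return non-nil if regexp string R is exactly one group or shy group.
Hypothetical helper, not part of rx.el.  Character classes are not
parsed, so group delimiters inside brackets may cause a nil result
for a string that is in fact a single group."
  (let ((len (length r))
        (i 0)
        (depth 0)
        ;; R must at least open a group at its very start.
        (ok (and (>= (length r) 4)
                 (string= (substring r 0 2) "\\("))))
    (while (and ok (< i len))
      (if (eq (aref r i) ?\\)
          ;; A backslash escapes the next character; look at the pair.
          (let ((next (and (< (1+ i) len) (aref r (1+ i)))))
            (cond ((eq next ?\() (setq depth (1+ depth)))
                  ((eq next ?\))
                   (setq depth (1- depth))
                   ;; The outermost group closed: that is acceptable
                   ;; only if nothing follows it.
                   (when (and (= depth 0) (< (+ i 2) len))
                     (setq ok nil))))
            (setq i (+ i 2)))
        (setq i (1+ i))))
    (and ok (= depth 0))))

(my-rx-single-group-p "\\(ab\\|ab*c\\)") ; non-nil, one group
(my-rx-single-group-p "\\(a\\)b\\(c\\)") ; nil, two groups around "b"
```

Such a test could stand in for the group clause of `rx-atomic-p' if
inputs like the second one turn out to matter in practice.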



cd /tmp/
diff -c /tmp/rx.el.orig /tmp/rx.el
*** /tmp/rx.el.orig     2002-10-20 14:04:09.000000000 -0400
--- /tmp/rx.el  2002-10-20 14:26:48.000000000 -0400
***************
*** 61,69 ****
  ;; "^content-transfer-encoding:\\(\n?[\t ]\\)*quoted-printable\\(\n?[\t ]\\)*"
  ;; (rx (and line-start
  ;;          "content-transfer-encoding:"
! ;;          (+ (? ?\n) blank)
  ;;        "quoted-printable"
! ;;        (+ (? ?\n) blank))
  ;;
  ;; (concat "^\\(?:" something-else "\\)")
  ;; (rx (and line-start (eval something-else))), statically or
--- 61,69 ----
  ;; "^content-transfer-encoding:\\(\n?[\t ]\\)*quoted-printable\\(\n?[\t ]\\)*"
  ;; (rx (and line-start
  ;;          "content-transfer-encoding:"
! ;;          (+ (? ?\n)) blank
  ;;        "quoted-printable"
! ;;        (+ (? ?\n)) blank))
  ;;
  ;; (concat "^\\(?:" something-else "\\)")
  ;; (rx (and line-start (eval something-else))), statically or
***************
*** 78,88 ****
  ;;         (and line-start ?\n)))
  ;;
  ;; "\\$[I]d: [^ ]+ \\([^ ]+\\) "
! ;; (rx (and "$Id": " 
  ;;          (1+ (not (in " "))) 
  ;;          " "
  ;;          (submatch (1+ (not (in " "))))
! ;;          " ")))
  ;;
  ;; "\\\\\\\\\\[\\w+"
  ;; (rx (and ?\\ ?\\ ?\[ (1+ word)))
--- 78,88 ----
  ;;         (and line-start ?\n)))
  ;;
  ;; "\\$[I]d: [^ ]+ \\([^ ]+\\) "
! ;; (rx (and "$Id: " 
  ;;          (1+ (not (in " "))) 
  ;;          " "
  ;;          (submatch (1+ (not (in " "))))
! ;;          " "))
  ;;
  ;; "\\\\\\\\\\[\\w+"
  ;; (rx (and ?\\ ?\\ ?\[ (1+ word)))
***************
*** 269,278 ****
  
  
  (defun rx-and (form)
!   "Parse and produce code from FORM.
! FORM is of the form `(and FORM1 ...)'."
!   (rx-check form)
!   (mapconcat #'rx-to-string (cdr form) nil))
  
  
  (defun rx-or (form)
--- 269,282 ----
  
  
  (defun rx-and (form)
!       "Parse and produce code from FORM.
! FORM is of the form `(and FORM1 ...)'."
!       (rx-check form)
!       (concat "\\(?:"
!             (mapconcat
!              (function (lambda (x) (rx-to-string x 'no-group)))
!              (cdr form) nil)
!             "\\)"))
  
  
  (defun rx-or (form)
***************
*** 383,391 ****
  
  
  (defun rx-submatch (form)
!   "Parse and produce code from FORM, which is `(submatch ...)'."
!   (concat "\\(" (mapconcat #'rx-to-string (cdr form) nil) "\\)"))
! 
  
  (defun rx-kleene (form)
    "Parse and produce code from FORM.
--- 387,397 ----
  
  
  (defun rx-submatch (form)
!       "Parse and produce code from FORM, which is `(submatch ...)'."
!       (concat "\\("
!               (mapconcat (function (lambda (x) (rx-to-string x 'no-group)))
!                          (cdr form) nil)
!               "\\)"))
  
  (defun rx-kleene (form)
    "Parse and produce code from FORM.
***************
*** 395,410 ****
  If OP is one of `*?', `+?', `??', produce a non-greedy regexp.
  If OP is anything else, produce a greedy regexp if `rx-greedy-flag'
  is non-nil."
!   (rx-check form)
!   (let ((suffix (cond ((memq (car form) '(* + ? )) "")
!                     ((memq (car form) '(*? +? ??)) "?")
!                     (rx-greedy-flag "")
                      (t "?")))
!       (op (cond ((memq (car form) '(* *? 0+ zero-or-more)) "*")
!                 ((memq (car form) '(+ +? 1+ one-or-more))  "+")
!                 (t "?"))))
!     (format "\\(?:%s\\)%s%s" (rx-to-string (cadr form) 'no-group) 
!           op suffix)))
  
  
  (defun rx-syntax (form)
--- 401,451 ----
  If OP is one of `*?', `+?', `??', produce a non-greedy regexp.
  If OP is anything else, produce a greedy regexp if `rx-greedy-flag'
  is non-nil."
!       (rx-check form)
!       (let ((suffix (cond ((memq (car form) '(* + ? )) "")
!                         ((memq (car form) '(*? +? ??)) "?")
!                         (rx-greedy-flag "")
!                         (t "?")))
!           (op (cond ((memq (car form) '(* *? 0+ zero-or-more)) "*")
!                     ((memq (car form) '(+ +? 1+ one-or-more))  "+")
                      (t "?")))
!           (result (rx-to-string (cadr form) 'no-group)))
!       (if (not (rx-atomic-p result))
!           (setq result (concat "\\(?:" result "\\)")))
!       (concat result op suffix)))
! 
! (defun rx-atomic-p (r)
!       "Return non-nil if regexp string R is atomic.
! An atomic regexp R is one such that a suffix operator
! appended to R will apply to all of R.  For example, \"a\",
! \"[abc]\" and \"\\(ab\\|ab*c\\)\" are atomic and \"ab\",
! \"[ab]c\", and \"ab\\|ab*c\" are not atomic.
! 
! This function may return false negatives, but it will not
! return false positives.  It is nevertheless useful in
! situations where an efficiency shortcut can be taken iff a
! regexp is atomic.  The function can be improved to detect
! more cases of atomic regexps.  Presently, this function
! detects the following categories of atomic regexp:
! 
!   a group or shy group:  \\(...\\)
!   a character class:     [...]
!   a single character:    a
! 
! On the other hand, false negatives will be returned for
! regexps that are atomic but end in operators, such as
! \"a+\".  I think these are rare.  Probably such cases could
! be detected without much effort.  A guarantee of no false
! negatives would require a theoretic specification of the set
! of all atomic regexps."
!       (let ((l (length r)))
!       (or (equal l 1)
!           (and (>= l 6)
!                (equal (substring r 0 2) "\\(")
!                (equal (substring r -2) "\\)"))
!           (and (>= l 2)
!                (equal (substring r 0 1) "[")
!                (equal (substring r -1) "]")))))
  
  
  (defun rx-syntax (form)

Diff finished at Sun Oct 20 14:27:42



