help-gnu-emacs

Re: regexp and strings you don't want


From: Oliver Scholz
Subject: Re: regexp and strings you don't want
Date: Fri, 29 Aug 2003 20:50:18 +0200
User-agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3.50 (windows-nt)

[Yet another follow-up to myself ...]
[Superseded because of a flaky patch]

Oliver Scholz <address@hidden> writes:

> address@hidden (Kai Großjohann) writes:
>
>> address@hidden (Chaz) writes:
>>
>>> For example, how can I search for a paragraph beginning with "The"
>>> that does NOT include the word "top"?
>>
>> It is possible to build a regexp that does this (disregarding the
>> paragraph problem at the moment), but it is not pretty.
>>
>> Some regexp implementations have the feature you're looking for to
>> make it convenient, but the Emacs implementation doesn't.
>>
>> Let me rephrase this in terms of lines instead of paragraphs.
>>
>> The idea is this: search for a line that begins with The and then
>> does not have top after it, as follows: after The, we allow any
>> characters that aren't t.  We also allow a t followed by something
>> that's not o, and also a to that's followed by something that's not
>> p.  And so on:
>>
>> "^The\\([^t]*\\($\\|t$\\|t[^o]\\|to$\\|to[^p]\\)\\)*$"
>
> Hmm. This is not really human readable. Would it be hard and/or bad
> to extend `rx' so that it allows for (not STRING)? À la:
>
> (looking-at (rx (and line-start
>                    "The "
>                    (not "top"))))
>
> Whereas `(not "top")' would compile to a normal regexp in the way you
> described it. WDYT?
[...]
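
Kai's line-based regexp can also be tried outside Emacs. The following is a minimal sketch in Python, assuming the pattern translates directly (`\\(` becomes `(`, `\\|` becomes `|`) and behaves the same for these inputs. The names `KAI_LINE` and `accepts` and the sample lines are made up for illustration.

```python
import re

# Python spelling of Kai's Emacs regexp (assumed equivalent for these inputs).
KAI_LINE = re.compile(r"^The([^t]*($|t$|t[^o]|to$|to[^p]))*$")

def accepts(line):
    """Return True if the whole line matches Kai's pattern."""
    return KAI_LINE.match(line) is not None

print(accepts("The cat sat"))    # True
print(accepts("The top shelf"))  # False
print(accepts("The stop sign"))  # False
print(accepts("The ttop"))       # True, although the line contains "top"
```

The last case suggests that a `t` directly in front of `top` slips through, because `t[^o]` consumes both `t`s and the rest of the line then looks harmless. A hand-built pattern would probably have to treat runs of `t` separately.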

I've played a bit with this (patch below). But I think I am a bit
puzzled. With my patch, `(rx (not "top"))' translates to:

"\\(?:[^t]*\\|t[^o]*\\|to[^p]*\\)"

Is this actually correct?

What does the concept of a regexp that matches a sequence of
characters that does _not_ contain a certain sequence of characters
actually mean?

Should it match any sequence of characters not identical to the
unwanted one (including the empty string) or should it match only
sequences of the same length? Or any non-empty sequence of characters
not identical with the unwanted one?
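
For comparison, regexp implementations with negative lookahead (which Emacs regexps do not have) usually settle on the first reading: any sequence of characters, including the empty one, that does not contain the unwanted string. A minimal sketch in Python, where the name `NO_TOP_LINE`, the helper `accepts` and the sample lines are assumptions for illustration only:

```python
import re

# "(?!top)." consumes one character only if "top" does not start there,
# so the repeated group can never run across an occurrence of "top".
NO_TOP_LINE = re.compile(r"^The (?:(?!top).)*$")

def accepts(line):
    """Return True if the line starts with "The " and has no "top" after it."""
    return NO_TOP_LINE.match(line) is not None

print(accepts("The quick fox"))  # True
print(accepts("The "))           # True (the empty sequence is allowed)
print(accepts("The to"))         # True
print(accepts("The top shelf"))  # False
print(accepts("The stop sign"))  # False
```

Under this reading the length of the matched sequence does not matter; only the absence of the substring does.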

With my patch:

(string-match (rx (and line-start
                       "The "
                       (not "top")
                       " lirum larum"))
              "The top lirum larum")
 ==> nil

(string-match (rx (and line-start
                       "The "
                       (not "top")
                       " lirum larum"))
              "The to lirum larum")
 ==> 0

(string-match (rx (and line-start
                       "The "
                       (not "top")
                       " lirum larum"))
              "The lirum larum")
 ==> nil

Is this good or bad?
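
The three results can be reproduced outside Emacs. This is a minimal sketch in Python, assuming that the whole `rx' form compiles to "^The \\(?:[^t]*\\|t[^o]*\\|to[^p]*\\) lirum larum" and that Python's `re` agrees with Emacs on these simple constructs. The names `PATCHED` and `match_start` and the last two inputs are made up for illustration.

```python
import re

# Python spelling of the regexp the patched `rx' form is assumed to produce.
PATCHED = re.compile(r"^The (?:[^t]*|t[^o]*|to[^p]*) lirum larum")

def match_start(s):
    """Mimic `string-match': start index of the match, or None."""
    m = PATCHED.search(s)
    return m.start() if m else None

print(match_start("The top lirum larum"))  # None
print(match_start("The to lirum larum"))   # 0
print(match_start("The lirum larum"))      # None
print(match_start("The  lirum larum"))     # 0 (two spaces after "The")
print(match_start("The cat lirum larum"))  # None
```

The third case fails only because "The " has already consumed the single space, so " lirum larum" has no space left to match; with two spaces the empty sequence is accepted. The last case hints at a real limitation: "cat" does not contain "top", yet no alternative accepts it, since each one forbids a single character for the entire rest of the sequence.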

    Oliver (puzzled)


Index: lisp/emacs-lisp/rx.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/emacs-lisp/rx.el,v
retrieving revision 1.3
diff -u -r1.3 rx.el
--- lisp/emacs-lisp/rx.el       23 Dec 2002 17:43:24 -0000      1.3
+++ lisp/emacs-lisp/rx.el       29 Aug 2003 18:46:18 -0000
@@ -334,6 +334,7 @@
                    '(digit control hex-digit blank graphic printing
                            alphanumeric letter ascii nonascii lower
                            punctuation space upper word))
+             (stringp form)
              (and (consp form)
                   (memq (car form) '(not any in syntax category:))))
     (error "Rx `not' syntax error: %s" form))
@@ -343,27 +344,41 @@
 (defun rx-not (form)
   "Parse and produce code from FORM.  FORM is `(not ...)'."
   (rx-check form)
-  (let ((result (rx-to-string (cadr form) 'no-group)))
-    (cond ((string-match "\\`\\[^" result)
-          (if (= (length result) 4)
-              (substring result 2 3)
-            (concat "[" (substring result 2))))
-         ((string-match "\\`\\[" result)
-          (concat "[^" (substring result 1)))
-         ((string-match "\\`\\\\s." result)
-          (concat "\\S" (substring result 2)))
-         ((string-match "\\`\\\\S." result)
-          (concat "\\s" (substring result 2)))
-         ((string-match "\\`\\\\c." result)
-          (concat "\\C" (substring result 2)))
-         ((string-match "\\`\\\\C." result)
-          (concat "\\c" (substring result 2)))
-         ((string-match "\\`\\\\B" result)
-          (concat "\\b" (substring result 2)))
-         ((string-match "\\`\\\\b" result)
-          (concat "\\B" (substring result 2)))
-         (t
-          (concat "[^" result "]")))))
+  (if (stringp (cadr form))
+      (rx-reverse-string (cadr form))
+    (let ((result (rx-to-string (cadr form) 'no-group)))
+      (cond ((string-match "\\`\\[^" result)
+            (if (= (length result) 4)
+                (substring result 2 3)
+              (concat "[" (substring result 2))))
+           ((string-match "\\`\\[" result)
+            (concat "[^" (substring result 1)))
+           ((string-match "\\`\\\\s." result)
+            (concat "\\S" (substring result 2)))
+           ((string-match "\\`\\\\S." result)
+            (concat "\\s" (substring result 2)))
+           ((string-match "\\`\\\\c." result)
+            (concat "\\C" (substring result 2)))
+           ((string-match "\\`\\\\C." result)
+            (concat "\\c" (substring result 2)))
+           ((string-match "\\`\\\\B" result)
+            (concat "\\b" (substring result 2)))
+           ((string-match "\\`\\\\b" result)
+            (concat "\\B" (substring result 2)))
+           (t
+            (concat "[^" result "]"))))))
+
+(defun rx-reverse-string (string)
+  (let ((list nil))
+    (dotimes (i (length string))
+      (push (rx-reverse-string-1 i string) list))
+    (concat "\\(?:"
+           (mapconcat 'identity (nreverse list) "\\|")
+           "\\)")))
+
+(defun rx-reverse-string-1 (n string)
+  (concat (substring string 0 n)
+         "[^" (string (aref string n)) "]*"))
 
 
 (defun rx-repeat (form)
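
To see what the two new helpers generate without applying the patch, here is a rough Python port. It is a sketch that mirrors `rx-reverse-string' and `rx-reverse-string-1' line by line; the Python function names are my own, and the output is an Emacs-syntax regexp string, not a Python one.

```python
def rx_reverse_string_1(n, string):
    """Prefix of length n, then any run of characters other than string[n]."""
    return string[:n] + "[^" + string[n] + "]*"

def rx_reverse_string(string):
    """Join one alternative per character position inside a shy group."""
    parts = [rx_reverse_string_1(i, string) for i in range(len(string))]
    return "\\(?:" + "\\|".join(parts) + "\\)"

print(rx_reverse_string("top"))  # \(?:[^t]*\|t[^o]*\|to[^p]*\)
print(rx_reverse_string("a.b"))  # \(?:[^a]*\|a[^.]*\|a.[^b]*\)
```

The second call shows that the prefix is inserted as is, so a "." in the unwanted string ends up as a wildcard in the third alternative. Passing the prefix through `regexp-quote' would presumably be needed for strings that contain special characters.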

-- 
12 Fructidor an 211 de la Révolution
Liberté, Egalité, Fraternité!

