help-gnu-emacs

Re: How to grok a complicated regex?


From: Alan Mackenzie
Subject: Re: How to grok a complicated regex?
Date: Wed, 18 Mar 2015 16:40:35 +0000 (UTC)
User-agent: tin/2.2.0-20131224 ("Lochindaal") (UNIX) (FreeBSD/10.1-RELEASE (amd64))

Hi, Marcin.

Sorry if I'm a bit late to this discussion.

Marcin Borkowski <address@hidden> wrote:
> Hi all,

> so I have this monstrosity [note: I know, there are much worse ones,
> too!]:

> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"

> (it's in the org-latex--script-size function in ox-latex.el, if you're
> curious).

> I'm not asking "what does this match?"; I can read it myself.  But it
> comes with a considerable effort.  Are you aware of any tools that might
> help to understand such regexen?

> I know about re-builder, but it's well suited for constructing a regex
> matching a given string, not the other way round.

> For instance, show-paren-mode does not really help here, since it seems
> to pair "\\(" with unescaped ")".

> Any ideas?

I wrote myself the following tool.  It's not production quality, but you
might find it useful nonetheless.  To use it, type

     M-: (pp-regexp re-horror).

It displays the regexp at the end of the *scratch* buffer, dropping the
contents of any \(..\) construct by one line.  I find it useful.  So might
you.  Feel free to adapt it, or pass it on to other people.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun pp-regexp (regexp)
  "Pretty print a regexp.  This means, contents of \\\\\(s are lowered a line."
  (or (stringp regexp) (error "parameter is not a string."))
  (let ((depth 0)
        (re (replace-regexp-in-string
             "[\t\n\r\f]"
             (lambda (s)
               (or (cdr (assoc s '(("\t" . "??")
                                   ("\n" . "??")
                                   ("\r" . "??"))))
                   "??"))
             regexp))
        (start 0)     ; earliest position still without an acm-depth property.
        (pos 0)       ; current analysis position.
        (max-depth 0) ; How many lines do we need to print?
        (min-depth 0) ; Pick up "negative depth" errors.
        pr-line       ; output line being constructed
        line-no       ; line number of pr-line; varies between min-depth
                      ; and max-depth.
        ch
        )
    ;(translate-rnt re)
    ;; apply acm-depth properties to the whole string.
    (while (< start (length re))
      (setq pos (string-match ;; "\\\\\\((\\(\\?:\\)?\\||\\|)\\)"
                 "\\\\\\(\\\\\\|(\\(\\?:\\)?\\||\\|)\\)"
                                  re start))
      (put-text-property start (or pos (length re)) 'acm-depth depth re)
      (when pos
        (setq ch (aref (match-string 1 re) 0))
        (cond
         ((eq ch ?\\)
          (put-text-property pos (match-end 1) 'acm-depth depth re))
         ((eq ch ?\()
          (put-text-property pos (match-end 1) 'acm-depth depth re)
          (setq depth (1+ depth))
          (if (> depth max-depth) (setq max-depth depth)))

         ((eq ch ?\|)
          (put-text-property pos (match-end 1) 'acm-depth (1- depth) re)
          (if (< (1- depth) min-depth) (setq min-depth (1- depth))))

         (t                             ; (eq ch ?\))
          (setq depth (1- depth))
          (if (< depth min-depth) (setq min-depth depth))
          (put-text-property pos (match-end 1) 'acm-depth depth re))))
      (setq start (if pos (match-end 1) (length re))))

    ;; print out the strings
    (setq line-no min-depth)
    (while (<= line-no max-depth)
      (with-current-buffer "*scratch*"
        (goto-char (point-max)) (insert ?\n)
        (setq pr-line "")
        (setq start 0)
        (while (< start (length re))
          (setq pos (next-single-property-change start 'acm-depth re
                                                 (length re)))
          (setq depth (get-text-property start 'acm-depth re))
          (setq pr-line
                (concat pr-line
                        (if (= depth line-no)
                            (substring re start pos)
                          (make-string (- pos start) ?\ ))))
          (setq start pos))
        (insert pr-line)
        (setq line-no (1+ line-no))))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
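
The depth bookkeeping in the function above can also be tried on its own.
Here is a minimal sketch that scans for the same backslash constructs and
reports how deeply the \(..\) groups nest.  The name my-regexp-max-depth is
hypothetical and not part of pp-regexp, and the sketch makes no attempt to
recognise bracket expressions.

```elisp
;; Hypothetical helper, for illustration only; not part of pp-regexp.
(defun my-regexp-max-depth (regexp)
  "Return the deepest nesting level of groups in REGEXP."
  (let ((depth 0) (max-depth 0) (start 0))
    ;; Find a backslash followed by another backslash, "(" or ")".
    (while (string-match "\\\\\\(\\\\\\|(\\|)\\)" regexp start)
      (let ((ch (aref regexp (match-beginning 1))))
        (cond
         ;; "\\(" opens a group: one level deeper.
         ((eq ch ?\()
          (setq depth (1+ depth)
                max-depth (max max-depth depth)))
         ;; "\\)" closes a group: one level back up.
         ((eq ch ?\))
          (setq depth (1- depth)))))
      ;; An escaped backslash falls through the `cond' and is just skipped.
      (setq start (match-end 0)))
    max-depth))

(my-regexp-max-depth "\\(a\\(b\\)\\)")   ; => 2
(my-regexp-max-depth
 "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'")   ; => 1
```

Matching the escaped backslash explicitly is what keeps a literal backslash
followed by a plain paren from being mistaken for a group delimiter.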

> (Note: if there are no such tools, I might be tempted to craft one.  Two
> things that come to my mind are proper highlighting of matching parens
> of various kinds and eldoc-like hints for all the regex constructs ?
> I never seem to remember what does ?\\`? do, for instance.  Also,
> displaying the string with single backslashes and not in the way it is
> actually typed in in Elisp, with all the backslash escaping, might be
> helpful.  Would there be a demand for such a tool larger than one
> person?)
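
On the single-backslash point: the string object already holds the regexp
with single backslashes; only its printed read syntax doubles them.  A
minimal sketch of the difference, where my-regexp-views is a hypothetical
name chosen for this illustration:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-regexp-views (regexp)
  "Return a list (AS-TYPED AS-SEEN) of two renderings of REGEXP.
AS-TYPED is the Lisp read syntax, with quotes and doubled backslashes.
AS-SEEN is the text the regexp engine actually receives."
  (list (prin1-to-string regexp)   ; formats like `prin1'
        (format "%s" regexp)))     ; formats like `princ'

;; The second element is the 8-character text \(a\|b\).
(my-regexp-views "\\(a\\|b\\)")
```

Inserting the second element into a buffer gives the single-backslash view.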

> Best,

> -- 
> Marcin Borkowski
> http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
> Faculty of Mathematics and Computer Science
> Adam Mickiewicz University

-- 
Alan Mackenzie (Nuremberg, Germany).


