emacs-devel

Re: Unquoted special characters in regexps


From: martin rudalics
Subject: Re: Unquoted special characters in regexps
Date: Sun, 05 Mar 2006 12:54:14 +0100
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

Luc Teirlinck wrote:
> Also, forward and backward views of a regexp are not
> algorithmically equivalent.  If you read a regexp forward, you know
> immediately when you encounter a character whether it has to be taken
> literally or not (or at worst after a _very_ limited number of
> characters, as the second `[' in "[[:...").  If you read the regexp
> backward, you may have to read all the way back to the beginning
> before you can be sure that a `]' is to be taken literally.

How do you read the following regexp from `cc-langs.el'?

(concat
 "\\("
 "[\)\[\(]"
 (if (c-lang-const c-type-modifier-kwds)
     (concat
      "\\|"
      ;; "throw" in `c-type-modifier-kwds' is followed
      ;; by a parenthesis list, but no extra measures
      ;; are necessary to handle that.
      (regexp-opt (c-lang-const c-type-modifier-kwds) t)
      "\\>")
   "")
 "\\)")

Do you really evaluate the (c-lang-const ...)s _before_ looking at the
closing `\\)'?  What would you do if the value of `c-type-modifier-kwds'
were available at run-time only?
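
To make the question concrete, here is a minimal sketch, not the actual
`cc-langs.el' code: the function name `my-build-regexp' and its argument
are hypothetical, and the keyword list stands in for a value known only
at run-time.  The outer `\\(' ... `\\)' and the leading character
alternative can be read without knowing what KEYWORDS will hold.

```elisp
(defun my-build-regexp (keywords)
  "Return a regexp matching `)', `[', `(' or one of KEYWORDS.
KEYWORDS is a hypothetical list of strings supplied at run-time;
it may be nil, in which case only the bracket characters match."
  (concat "\\("                        ; open the outer group
          "[)[(]"                      ; character alternative: ) [ (
          (if keywords
              ;; Alternative branch: any keyword, ending at a word boundary.
              (concat "\\|" (regexp-opt keywords t) "\\>")
            "")
          "\\)"))                      ; close the outer group
```

For example, (string-match (my-build-regexp nil) "a(b") should find the
parenthesis, and passing a keyword list only adds the second branch
without changing how the outer group is read.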

When trying to understand such regexps I break them up into parts first.
Such parts are, in my understanding, groups like `\\(...\\)',
subexpressions delimited by `\\|', and character alternatives.  Next I
try to understand the parts that interest me without paying attention to
parts that do not relate to my specific problem.  And I would have
trouble isolating a character alternative when the author matches a
literal right bracket with an unquoted `]'.
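
A small illustration of that difficulty, as a sketch only: the function
name `my-bracket-positions' and the three regexps are made up for this
example and are not taken from Emacs sources.  In the first regexp the
final `]' is a literal character, in the third it closes a character
alternative, and only the second makes the intent visible when looking
at the end of the string.

```elisp
(defun my-bracket-positions (text)
  "Return the match positions in TEXT for three hypothetical regexps.
Each regexp is meant to match text such as \"a[0\" in TEXT."
  (list (string-match "a\\[0]" text)       ; literal `]', left unquoted
        (string-match "a\\[0\\]" text)     ; literal `]', quoted
        (string-match "a\\[[]0]" text)))   ; final `]' closes the alternative []0]
```

Called on a string like "x = a[0];" all three should report the same
position, although the trailing `]' plays a different role in each.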

People can make reading a regexp truly awkward by writing kludgy
expressions like

(let ((keywords (concat "\\([;(){}`|&]\\|^\\)[ \t]*\\(\\("
                        (regexp-opt (sh-feature sh-leading-keywords) t)
                        "[ \t]+\\)?"
                        (regexp-opt (append (sh-feature sh-leading-keywords)
                                            (sh-feature sh-other-keywords))
                                    t))))

in `sh-font-lock-keywords-1' which I understand correctly iff I read the
definition of the entire function first.  Such expressions are, however,
rare in present Emacs code.

> Hence, reading a regexp forward _is_ algorithmically _very_ superior
> over reading it backward if your purpose is to understand the regexp.

If my purpose is to understand how a regexp engine interprets a regexp,
reading a regexp forward is superior.  If, however, my purpose is to
understand a complex regexp I want to guess the author's intentions
first.  In that case I do want to break up the expression into its
constituents.  In general, languages hiding implementation details are
easier to use than languages that require users to know how specific
features are implemented.

> I must admit, however, that if what you want is to uncover the subliminal
> satanic messages in the regexp, then you _have_ to read it backward.

It's better to avoid "subliminal satanic messages" when _writing_ a
regexp.  It's bad if you have to uncover them when reading a regexp.





