Re: Unquoted special characters in regexps

From: Luc Teirlinck
Subject: Re: Unquoted special characters in regexps
Date: Mon, 27 Feb 2006 18:30:13 -0600 (CST)

None of the messages I sent on this (or on anything else) in the last
few days made it to emacs-devel, although everybody else's responses
did, albeit after some delay.  I just got messages saying that local
delivery failed.  So I will have to repeat some things that I already
said before.

Richard Stallman wrote:

   However, that doesn't necessarily mean the manual is wrong.
   There is more than one way to understand the word "special".
   At the most literal level, ] is not special; if you write it
   without \\, the regexp compiler won't misunderstand it.

`]', like `-', is special only in the context of a character
alternative, that is, only if you are already inside a character
alternative when you type it.  By contrast, `[' and all other special
characters (except `^') are special only outside that context.

No character that is special outside character alternatives remains
special if you precede it with a backslash.  This is true even for
`^'.  This is why it is good to precede such characters with a
backslash even in places where they are not special.  That way, the
reader can see that they are not special, without studying the regexp.

On the other hand, a backslash _never_ eliminates the special meaning
of a `]' or `-' that has a special meaning.

There are two questions here: whether a `]' outside a character
alternative should be quoted, and whether any changes to the Elisp
manual are required.  In this posting, I will only discuss the first.
First of all, there are (surprisingly) many occurrences of "\\]" in
the Emacs source, where the `]' _is_ special and closes a character
alternative that contains a backslash.  Reportedly, quoting a `]' with
a backslash _inside_ a character alternative works in some other
regexp implementations such as AWK.  So if I see "\\]" I have to worry
about three possibilities: it might deliberately close a character
alternative which includes a backslash, it might do so by accident
because the author tried to quote a `]' inside a character alternative
(and hence the regexp is buggy), or it might be a deliberately quoted
`]' outside a character alternative.

If I see `]' without preceding "\\", I only have to worry about
whether or not it closes a character alternative, and not about the
third possibility of a bug.
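
The distinction above can be made mechanical.  What follows is only a
rough sketch in C, not code from Emacs: the enum and the function
`classify_close_bracket' are hypothetical names invented for this
illustration.  It assumes the rules described above (outside a
character alternative a backslash quotes the next character; inside
one a backslash is an ordinary member; a `]' in first position, after
an optional `^', is a member) and it deliberately ignores character
classes such as [:alpha:].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Role played by a `]' in a regexp (hypothetical names).  */
enum bracket_role
{
  NOT_A_BRACKET,       /* the position does not hold a `]' */
  CLOSES_ALTERNATIVE,  /* special: ends a [...] character alternative */
  LITERAL_INSIDE,      /* first member of a character alternative */
  LITERAL_OUTSIDE      /* ordinary character outside any alternative */
};

/* Classify the character at index POS of the regexp RE.
   Simplified: [:class:] syntax inside alternatives is not handled.  */
enum bracket_role
classify_close_bracket (const char *re, size_t pos)
{
  size_t len, i = 0;

  assert (re != NULL);
  len = strlen (re);
  if (pos >= len || re[pos] != ']')
    return NOT_A_BRACKET;

  while (i < len)
    {
      if (re[i] == '\\')
        {
          /* Outside an alternative, a backslash quotes the next char.  */
          if (i + 1 == pos)
            return LITERAL_OUTSIDE;
          i += 2;
        }
      else if (re[i] == '[')
        {
          i++;
          if (i < len && re[i] == '^')
            i++;
          /* A `]' in first position is a member, not the terminator.  */
          if (i < len && re[i] == ']')
            {
              if (i == pos)
                return LITERAL_INSIDE;
              i++;
            }
          /* A backslash is an ordinary member here: scan for `]'.  */
          while (i < len && re[i] != ']')
            i++;
          if (i == pos)
            return CLOSES_ALTERNATIVE;
          i++;                  /* step over the closing `]' */
        }
      else
        {
          if (i == pos)
            return LITERAL_OUTSIDE;
          i++;
        }
    }
  return NOT_A_BRACKET;         /* not reached for a `]' at POS */
}
```

Under these assumptions, the `]' in the regexp [a\] is classified as
CLOSES_ALTERNATIVE, while the `]' in a\] and in a] both come out as
LITERAL_OUTSIDE.  The function cannot tell a deliberate [a\] from an
accidental one, which is exactly the worry described above.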

In summary I believe that quoting a `]' outside a character
alternative only adds clutter and a third possibility to worry about.

There are places in the Emacs code that quote a `]' outside a
character alternative.  Even if we decide that this is undesirable, I
do not fancy finding and changing them all.  But we could change the
behavior of `regexp-quote' and `regexp-opt' which currently quote
such `]'.  That could be done with the following trivial patch, which
I could install if that is what we decide to do:

===File ~/search.c-diff=====================================
*** search.c    06 Feb 2006 16:02:24 -0600      1.206
--- search.c    27 Feb 2006 00:16:42 -0600      
*** 3066,3072 ****
    for (; in != end; in++)
!       if (*in == '[' || *in == ']'
          || *in == '*' || *in == '.' || *in == '\\'
          || *in == '?' || *in == '+'
          || *in == '^' || *in == '$')
--- 3066,3072 ----
    for (; in != end; in++)
!       if (*in == '['
          || *in == '*' || *in == '.' || *in == '\\'
          || *in == '?' || *in == '+'
          || *in == '^' || *in == '$')
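
To show the effect of that change in isolation, here is a
self-contained sketch of such a quoting loop.  It is not the code in
search.c: the function name `quote_regexp', its signature, and the
`quote_close_bracket' flag are assumptions made up for this
illustration, so that the current and the proposed behavior can be
compared side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Copy IN to OUT, inserting a backslash before every character that
   is special outside a character alternative.  If QUOTE_CLOSE_BRACKET
   is nonzero, `]' is quoted as well (the behavior before the patch);
   if zero, it is copied unchanged (the behavior after the patch).
   OUT must have room for 2 * strlen (IN) + 1 bytes.
   Return the number of bytes written, not counting the final NUL.  */
size_t
quote_regexp (const char *in, char *out, int quote_close_bracket)
{
  char *start = out;

  assert (in != NULL && out != NULL);
  for (; *in != '\0'; in++)
    {
      if (*in == '[' || (quote_close_bracket && *in == ']')
          || *in == '*' || *in == '.' || *in == '\\'
          || *in == '?' || *in == '+'
          || *in == '^' || *in == '$')
        *out++ = '\\';
      *out++ = *in;
    }
  *out = '\0';
  return (size_t) (out - start);
}
```

With the flag set, the input a[1] becomes a\[1\]; with the flag clear
it becomes a\[1], which matches the same text, since a `]' outside a
character alternative is not special.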
