emacs-devel


From: martin rudalics
Subject: Re: Unquoted special characters in regexps
Date: Sat, 04 Mar 2006 10:58:37 +0100
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

> I believe that you make understanding regexps hard on yourself by
> making all kind of assumptions that often are not satisfied.
>
> There is no reason why a literal `]' should be matched by a literal
> `[' to the right or vice versa.

What I meant was that

(i) when I see a literal `[' I expect it to be matched by a literal `]'
in the text that follows, and

(ii) when I see a literal `]' I expect it to be matched by a literal `['
in the preceding text.

In mathematics, half-open intervals like `]3,5]' are an obvious exception
to these rules, but in general I've been quite happy with them.  In this
particular case, I've been talking about a regexp in the Emacs sources,

         "\\(\[[0-9]+\] \\)*\\([a-zA-Z0-9.$_]+\\)\\.[a-zA-Z0-9$_<>(),]+ \
\\(([a-zA-Z0-9.$_]+:\\|line=\\)\\([0-9.,]+\\)"

which I consider wrong.  Apparently that branch of the code is never
taken, so nobody has complained about mismatches so far.  However,
similar expressions for matching line numbers occur frequently.  I use
the rules above to reason about them, and I am confident that in this
particular case you use one of these rules as well.
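A rough illustration of the bracket prefix, transliterated into Python's `re` module (an assumption: Python writes groups as `(?:...)` where Emacs uses `\\(...\\)`, and it needs `\[` inside a set to avoid a "nested set" warning). In an Emacs Lisp string a backslash before `[` has no escape meaning and is dropped, so the regexp engine sees `[[0-9]+]`: a character alternative of `[` and digits, followed by a literal `]'.  The opening `[' thus becomes optional, while the closing `]' is required.  The names `UNQUOTED` and `QUOTED` are hypothetical.

```python
import re

# Prefix as the Emacs regexp engine sees it: a set of `[' and digits,
# repeated, then a literal `]' and a space.  Python spells Emacs's
# `[[0-9]' as `[\[0-9]' to avoid a FutureWarning; the set is the same.
UNQUOTED = re.compile(r"(?:[\[0-9]+] )*")

# Hypothetical variant with both brackets quoted, roughly what
# "\\[[0-9]+\\] " would give in an Emacs Lisp string.
QUOTED = re.compile(r"(?:\[[0-9]+\] )*")

for text in ["[12] ", "12] ", "[[[] "]:
    print(repr(text),
          bool(UNQUOTED.fullmatch(text)),
          bool(QUOTED.fullmatch(text)))
```

Both patterns accept `[12] `, but only the unquoted one also accepts `12] ` and `[[[] `, i.e. a literal `]' with no matching `[' before it.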

If I followed your reasoning to its logical end, I couldn't possibly rule
out malformed regexps like `[a-z'.  After all, the `[' states that a
character alternative starts here; why should a user bother to close it?
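Regexp engines already draw this line.  A small check with Python's `re` module, used here as a stand-in for Emacs (which likewise signals an `invalid-regexp` error for `[a-z'), shows that an unquoted `]' outside a character alternative is taken as a literal, while an unclosed `[' is refused.  The helper `compiles` is hypothetical.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's `re` accepts PATTERN, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A stray `]' is just an ordinary character...
print(compiles("a]"), bool(re.fullmatch("a]", "a]")))
# ...but a `[' that is never closed makes the pattern invalid.
print(compiles("[a-z"))
```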

> Even _if_ the `[' and the `]' balance in the text you are parsing
> through _considered in its entirety_ (which is not at all
> guaranteed), you might be inside, say, a nested Lisp vector and your
> regexp may be searching for its end.  No balance of literal `[' and
> `]' at all.  This is _not_ an exceptional situation.  It occurs all
> over the place in the Emacs source code.

I fully agree.  However, in such cases there is practically always some
pdl (push-down list) or counter variable recording the current number of
"unclosed" literal `['s.  In practice, I complain about unmatched
brackets when either the pdl is empty (the counter is zero) and I find a
literal `]', or the pdl is non-empty (the counter is non-zero) when I
reach the end of the text.  Hence the pdl (or counter) compensates for
the missing symmetry in the part of the text I want to parse.
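A minimal sketch of the counter approach described above, assuming plain text with no strings or comments that would need skipping.  The function name `unmatched_brackets` and its optional `depth` argument, which seeds the counter when the text starts inside an already open vector, are hypothetical.

```python
def unmatched_brackets(text: str, depth: int = 0) -> list[str]:
    """Report unmatched literal brackets in TEXT using a depth counter.

    DEPTH is the number of `['s already open before TEXT starts, e.g.
    when scanning from inside a nested Lisp vector.
    """
    problems = []
    for pos, ch in enumerate(text):
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                # A `]' with no pending `[': complain immediately.
                problems.append(f"unmatched ] at {pos}")
            else:
                depth -= 1
    if depth:
        # The text ended while some `['s were still open.
        problems.append(f"{depth} unclosed [ at end of text")
    return problems
```

For example, `unmatched_brackets("[1 [2 3]]")` reports nothing, `unmatched_brackets("[a-z")` reports one unclosed `[', and `unmatched_brackets("2 3]]", depth=2)` reports nothing because the counter already accounts for the two `['s opened earlier.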