Re: Unquoted special characters in regexps
From: martin rudalics
Subject: Re: Unquoted special characters in regexps
Date: Sat, 04 Mar 2006 10:58:37 +0100
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)
> I believe that you make understanding regexps hard on yourself by
> making all kinds of assumptions that often are not satisfied.
> There is no reason why a literal `]' should be matched by a literal
> `[' to the right or vice versa.
What I meant was that
(i) when I see a literal `[' I expect it to be matched by a literal `]'
in the text that follows and,
(ii) when I see a literal `]' I expect it to be matched by a literal `['
in the preceding text.
In mathematics, half-open intervals like `]3,5]' are an obvious exception
to these rules, but in general I've been quite happy with them. In this
particular case, I was talking about a regexp in the Emacs sources,
"\\(\[[0-9]+\] \\)*\\([a-zA-Z0-9.$_]+\\)\\.[a-zA-Z0-9$_<>(),]+ \
\\(([a-zA-Z0-9.$_]+:\\|line=\\)\\([0-9.,]+\\)"
which I consider wrong. Apparently that part of the code is never
executed, so no one has complained about mismatches so far. However,
similar expressions to match line numbers occur frequently. I use the
rules above to reason about them, and I am confident that in this
particular case you would use one of these rules as well.
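
To make the objection concrete, here is a rough sketch. It assumes the problem is that the Lisp reader turns `\[' inside a string into a plain `[', so the regexp engine sees `[[0-9]+] ' where `\[[0-9]+\] ' was presumably meant. Python's re module stands in for the Emacs regexp engine, which is only an approximation, and all variable names are made up for illustration.

```python
import re
import warnings

# What the regexp engine presumably sees after the Lisp reader has dropped
# the single backslashes (an assumption about the string quoted above).
as_written = r"[[0-9]+] "
# What was presumably intended: digits enclosed in literal brackets.
intended = r"\[[0-9]+\] "

with warnings.catch_warnings():
    # Python warns about "[[" looking like a nested set; silence that here.
    warnings.simplefilter("ignore", FutureWarning)
    loose = re.compile(as_written)
strict = re.compile(intended)

samples = ["[12] ", "12] ", "[[3] "]
# The first pattern is a character class of "[" and digits, repeated,
# followed by a literal "]", so it accepts unbalanced input as well.
loose_hits = [s for s in samples if loose.fullmatch(s)]
strict_hits = [s for s in samples if strict.fullmatch(s)]
```

Under these assumptions all three samples end up in loose_hits, while strict_hits contains only "[12] ".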
If I followed your reasoning to its logical end, I couldn't possibly rule
out malformed regexps like `[a-z'. After all, the `[' states that a
character alternative starts here, so why should a user bother to close it?
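
As a small illustration of the point, a regexp compiler typically refuses an unclosed character alternative. The sketch below uses Python's re module as a stand-in for the Emacs engine; is_well_formed is a hypothetical helper, not anything from Emacs.

```python
import re

def is_well_formed(pattern):
    """Return True if the regexp compiles, False if the compiler rejects it."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

unclosed_ok = is_well_formed("[a-z")   # character alternative never closed
closed_ok = is_well_formed("[a-z]")    # properly closed
```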
> Even _if_ the `[' and the `]' balance
> in the text you are parsing through _considered in its entirety_
> (which is not at all guaranteed), you might be inside, say, a nested
> Lisp vector and your regexp may be searching for its end. No balance
> of literal `[' and `]' at all. This is _not_ an exceptional
> situation. It occurs all over the place in the Emacs source code.
I fully agree. However, in such cases there is practically always some
pdl (push-down list) or counter variable that records the current number
of "unclosed" literal `['s. In practice, I will complain about mismatched
brackets when either the pdl is empty (the variable is zero) and I find a
literal `]', or the pdl is non-empty (the variable is non-zero) when I
reach the end of the text. Hence, the pdl (variable) compensates for the
missing symmetry in the part of the text I want to parse.
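
The bookkeeping described above can be sketched roughly as follows. This is only an illustration: check_brackets and its depth parameter are hypothetical names, a plain counter stands in for the pdl, and brackets inside strings or comments are not treated specially.

```python
def check_brackets(text, depth=0):
    """Scan TEXT for literal brackets, starting with DEPTH unclosed "[".

    Return None if everything balances, otherwise a complaint string.
    """
    for pos, ch in enumerate(text):
        if ch == "[":
            depth += 1            # push: one more unclosed "["
        elif ch == "]":
            if depth == 0:        # empty pdl and a literal "]"
                return f"unmatched ']' at position {pos}"
            depth -= 1            # pop
    if depth != 0:                # non-empty pdl at the end of the text
        return f"{depth} unclosed '[' at end of text"
    return None

balanced = check_brackets("[a [b] c]")
stray_close = check_brackets("a]")
left_open = check_brackets("[[a]")
# Starting inside two levels of nesting, the fragment on its own is
# asymmetric, yet the initial depth compensates for that.
inside_vector = check_brackets("b] c]", depth=2)
```

The nonzero starting depth in the last call models the "inside a nested Lisp vector" situation: the text being scanned is unbalanced by itself, and the recorded state supplies the missing opening brackets.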