help-gnu-emacs

RE: line-spanning regexp


From: Bingham, Jay
Subject: RE: line-spanning regexp
Date: Wed, 15 Jan 2003 12:00:36 -0600

On Tuesday, January 14, 2003 7:47 PM Greg Hill wrote:
>
>
>At 3:59 PM -0800 1/14/03, Tennis Smith wrote:
>>Hi,
>>
>>How do I construct a regexp that looks for two strings that *might*
>>span two consecutive lines?
>>
>>For example, I need a regexp that will find string1 and string2 and
>>everything in between for the following scenarios:
>>
>>
>>blah blah blah blah string1 blah blah string2 blah blah blah
>>
>>-OR-
>>
>>blah blah string1 blah
>>string2 blah blah
>>
>>TIA,
>>-Tennis
>
>"string1[^\n]*[\n]?[^\n]*string2"
>

The above pattern may NOT work in all circumstances.  Specifically, it
may not work correctly in interactive regular expression searches
(isearch-forward-regexp, C-M-s; isearch-backward-regexp, C-M-r;
search-forward-regexp and search-backward-regexp).  The reason is that
the escape sequences \n and \t, when typed into an interactive regexp,
DO NOT match newline and tab.  The Search -> Regexp Search info node
does not mention this restriction, and the information at the
Search -> Regexps info node might be read as indicating that they do.

However, in the example given by the OP it works, but not for the reason
that one might think.  It works because, typed interactively, the
expression [^\n] matches any character that is not a "\" or an "n".  A
newline is neither, so either instance of [^\n]* can match across the
line break, while [\n]? matches at most one backslash or "n".  The
pattern therefore depends on the letters in the text rather than on the
line structure: it fails when more than one backslash or "n" lies
between the end of string1 and the start of string2 (for example
"string1 and then string2" on a single line), and otherwise it can match
across more than two lines.
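
The behaviour described above can be reproduced outside Emacs.  What
follows is only a minimal sketch in Python, used as a stand-in for the
Emacs regexp engine.  Python's re module treats \n inside a character
class as a newline, so the class is spelled [^\\n] here to emulate the
two literal characters "\" and "n" that Emacs sees when they are typed
interactively.  The names TYPED and found are illustrative and do not
come from the thread.

```python
import re

# Emulates the interactively typed Emacs regexp: inside the brackets the
# set holds the two literal characters backslash and "n", not a newline.
# (Assumption: Python's re only approximates the Emacs regexp engine.)
TYPED = re.compile(r"string1[^\\n]*[\\n]?[^\\n]*string2")


def found(text):
    """Return True if the emulated pattern matches anywhere in text."""
    return TYPED.search(text) is not None


one_line = "blah blah blah blah string1 blah blah string2 blah blah blah"
two_lines = "blah blah string1 blah\nstring2 blah blah"
two_ns = "blah string1 and then string2 blah"

# A newline is neither a backslash nor an "n", so the class accepts it.
print(re.fullmatch(r"[^\\n]", "\n") is not None)  # True
print(found(one_line))   # True
print(found(two_lines))  # True
print(found(two_ns))     # False: two "n" characters lie between the strings
```

The last case is on a single line, yet the emulated pattern misses it,
which shows that the match is governed by the letters in the text and
not by the line break.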

The correct regexp to use in interactive searches, which does not depend
on which letters occur in the text, is (as typed to enter it, where
C-q C-j inserts a literal newline):

"string1[^C-qC-j]*[C-qC-j]?[^C-qC-j]*string2"

This will produce a string that looks like this when displayed:

"string1[^^J]*[^J]?[^^J]*string2"
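
For comparison, here is a sketch of the same pattern with a real newline
in the character classes, again in Python as a stand-in for Emacs.  In
Python's re module \n inside a class already denotes the newline
character, which corresponds to what C-q C-j inserts in Emacs.  The names
NEWLINE_AWARE and span_of are illustrative only.

```python
import re

# [^\n] here really excludes the newline character, which is what the
# C-q C-j form of the regexp does in an interactive Emacs search.
NEWLINE_AWARE = re.compile(r"string1[^\n]*[\n]?[^\n]*string2")


def span_of(text):
    """Return the matched text, or None when the pattern does not match."""
    match = NEWLINE_AWARE.search(text)
    return match.group(0) if match else None


print(repr(span_of("blah blah string1 blah\nstring2 blah blah")))
print(repr(span_of("blah string1 and then string2 blah")))
print(repr(span_of("string1 blah\nmiddle line\nstring2")))
```

With a real newline in the classes, the letters between the two strings
no longer matter, and a match can cross at most one line break, so the
three-line case yields None.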

The pattern suggested by Greg may also produce undesired results when
the buffer contains text like the following:

blah blah string1 blah string2 blah blah blah
blah blah string1 blah blah string2 blah blah blah

In this case it will match from the start of string1 on the first line
to the end of string2 on the second line.  If this is not the desired
result, the regexp can be modified to match the shortest rather than the
longest string.  In Emacs 21.1 and later versions the regexp to do this
is:

"string1[^\n]*?[\n]?[^\n]*?string2"
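
The difference between the longest and the shortest match can be checked
with a small sketch.  It is written in Python as a stand-in, on the
assumption that the *? operator of Python's re module behaves like the
non-greedy operator of Emacs 21.1 and later for this pattern.  The names
GREEDY, LAZY and buffer_text are illustrative only.

```python
import re

# The two-line buffer contents from the example above.
buffer_text = (
    "blah blah string1 blah string2 blah blah blah\n"
    "blah blah string1 blah blah string2 blah blah blah"
)

GREEDY = re.compile(r"string1[^\n]*[\n]?[^\n]*string2")
LAZY = re.compile(r"string1[^\n]*?[\n]?[^\n]*?string2")

greedy_match = GREEDY.search(buffer_text).group(0)
lazy_match = LAZY.search(buffer_text).group(0)

# The greedy form runs on to the string2 of the second line.
print(greedy_match.count("\n"))  # 1
# The non-greedy form stops at the first string2 on the first line.
print(lazy_match)                # string1 blah string2
```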

Earlier versions of Emacs require a different construct; the regexp to
use in those versions is:

"string1\\(\\|[^\n]\\)*[\n]?\\(\\|[^\n]\\)*string2"

See http://www.emacswiki.org/cgi-bin/wiki.pl?NonGreedyRegexp for more
information.

Happy emacsing
-_
J_)
C_)ingham
.    HP - NonStop Austin Software & Services - Software Quality Assurance
.    Austin, TX
. "Language is the apparel in which your thoughts parade in public.
.  Never clothe them in vulgar and shoddy attire."  -Dr. George W. Crane-




