From: Mabry Tyson
Subject: grep backreference seems to invalidate --ignore-case
Date: Mon, 19 Dec 2005 02:07:25 -0800
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.12) Gecko/20050915
To make sure this hasn't been fixed recently, I downloaded ftp://ftp.gnu.org/gnu/grep/grep-2.5.1a.tar.gz and built grep from that.
manresa 181: uname -a
SunOS manresa 5.8 Generic_108528-24 sun4u sparc SUNW,Sun-Blade-100
manresa 182: src/grep --version
grep (GNU grep) 2.5.1
Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
manresa 183: cat /tmp/test
A abcd abcd
manresa 184: src/grep --ignore-case 'a \(abcd\) \1' /tmp/test
manresa 185: src/grep --ignore-case 'A \(abcd\) \1' /tmp/test
A abcd abcd
manresa 186: src/grep --ignore-case 'A \(ABCD\) \1' /tmp/test
manresa 187: src/grep --ignore-case 'A \(ABCD\) ABCD' /tmp/test
A abcd abcd
manresa 188: src/grep --ignore-case 'a \(ABCD\) ABCD' /tmp/test
A abcd abcd
It is my belief that all of these calls to grep should have returned the line from the file.
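For comparison, the five patterns above can be replayed against the same input line using Python's `re` module as a stand-in reference engine. This is only a sketch under stated assumptions: Python's engine is not grep, and the BRE groups `\(...\)` have been hand-translated to Python's `(...)`. With `re.IGNORECASE`, Python also folds case when comparing backreferences, so all five patterns are expected to match the line:

```python
import re

# The single line stored in /tmp/test in the session above.
LINE = "A abcd abcd"

# The grep BRE patterns from the session, hand-translated to Python syntax:
# BRE "\(" and "\)" become "(" and ")"; the backreference "\1" is unchanged.
PATTERNS = [
    r"a (abcd) \1",
    r"A (abcd) \1",
    r"A (ABCD) \1",
    r"A (ABCD) ABCD",
    r"a (ABCD) ABCD",
]


def replay(line, patterns):
    """Return (pattern, matched) pairs, searching case-insensitively
    in the spirit of grep --ignore-case (Python stand-in, not grep)."""
    return [(p, re.search(p, line, re.IGNORECASE) is not None) for p in patterns]


if __name__ == "__main__":
    for pattern, matched in replay(LINE, PATTERNS):
        print(("match   " if matched else "no match"), pattern)
```

Under this engine the backreference always refers to the text actually captured from the input ("abcd"), so the case of the pattern's literal letters does not affect whether `\1` can match.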
The grep distributed with Solaris 8 behaves as I expect:
manresa 192: /usr/bin/grep -i 'a \(abcd\) \1' /tmp/test
A abcd abcd
Another test case:
manresa 51: cat /tmp/test2
a abcd aBcD
manresa 52: src/grep --ignore-case 'a \(abcd\) \1' /tmp/test2
manresa 53: /usr/bin/grep -i 'a \(abcd\) \1' /tmp/test2
a abcd aBcD
In this case, however, the documentation is somewhat ambiguous. The --ignore-case option is documented as "Ignore case distinctions in both the PATTERN and the input files." A backreference is documented as "matches the substring previously matched by the Nth parenthesized subexpression of the regular expression." It isn't clear whether a backreference must match that substring exactly or may match it while ignoring case. It appears that at least the Solaris grep matches the substring ignoring case when --ignore-case is also given. I would argue that this is the correct behavior, since --ignore-case says to ignore case distinctions in the input files. However this is resolved, the documentation should state which behavior grep implements.
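The two readings can be made concrete with the same stand-in engine (Python's `re`, not grep; the helper name `backref_search` is hypothetical). The "folded" reading applies `re.IGNORECASE` to the whole pattern, backreference included. The "strict" reading is approximated with a scoped inline flag group `(?i:...)` (Python 3.6+), so only the literal part ignores case and `\1` must repeat the captured input text exactly. Because a backreference refers to text captured from the input, the /tmp/test line matches under either reading; only the /tmp/test2 line distinguishes them:

```python
import re


def backref_search(line, fold_backrefs):
    r"""Search line for the report's pattern 'a \(abcd\) \1', ignoring case.

    fold_backrefs=True:  \1 is also compared ignoring case
                         (the behaviour reported for Solaris grep and Emacs).
    fold_backrefs=False: the literal part ignores case, but \1 must repeat
                         the captured input text exactly.
    """
    if fold_backrefs:
        pattern, flags = r"a (abcd) \1", re.IGNORECASE
    else:
        # (?i:...) limits case-insensitivity to the group's contents, so the
        # trailing \1 is compiled as an ordinary case-sensitive backreference.
        pattern, flags = r"(?i:a (abcd) )\1", 0
    return re.search(pattern, line, flags) is not None


if __name__ == "__main__":
    # The lines of /tmp/test and /tmp/test2 from the report.
    for line in ("A abcd abcd", "a abcd aBcD"):
        strict = backref_search(line, fold_backrefs=False)
        folded = backref_search(line, fold_backrefs=True)
        print(f"{line!r}: strict={strict}, folded={folded}")
```

Both readings accept "A abcd abcd"; only the folded reading accepts "a abcd aBcD".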
It appears that GNU Emacs 21.12.1 (on Mac OS X) also does regular expression matching as I expect. When case-fold-search is t, the expression
(search-forward-regexp "a \\(abcd\\) \\1")
will match each of the lines
a abcd abcd
A abcd abcd
a abcd aBcD