bug-gnu-utils

Re: grep -e '\(a\)\1' -e '\(b\)\1'


From: Tom Lord
Subject: Re: grep -e '\(a\)\1' -e '\(b\)\1'
Date: Sun, 18 Feb 2001 23:52:13 -0800 (PST)

        For example, should the following shell command
        output nothing, or output a line containing "b"?

                echo 'b' | grep '\(\(a\)\)*\2'

1003.2-1992 and the latest draft appear to be identical on
this point.

Section 2.8.3.3(3) of 1003.2-1992 makes it unambiguously clear that
grep should output nothing: "b" does not match the pattern.  Nor
does the pattern match the empty string.

        The back-reference [...] shall match the same (possibly
        empty) string of characters as was matched [by the corresponding
                                    ^^^^^^^^^^^^^^
        subexpression].

In B.5.2, the question of "what is matched by a parenthesized subexpression"
is addressed.  In that section, a parenthesized expression enclosed in
a "*" expression which matches zero times is said to "not participate
in the match" -- i.e., the parenthesized expression does not match
any string, even an empty string.  Thus, the back-reference cannot match
any string at all (even an empty string).

Another way to see this is to consider the `pmatch' output of regexec.
When matching \(\(a\)\)* against the string "b", you must get:

        pmatch[2].rm_so == pmatch[2].rm_eo == -1

There is no substring of characters, even an empty string, in the
string "b" beginning at position -1.  Therefore, after \(\(a\)\)*
matches the empty string, \2 cannot match anything at all.
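
The pmatch claim above can be checked directly against a given regexec
implementation.  The following is only a minimal sketch, assuming a
POSIX <regex.h>; the helper name group2_offsets is hypothetical and
does not come from any library or from the standard.  It compiles a
BRE, runs regexec with room for three pmatch entries, and reports the
offsets recorded for subexpression 2.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): compile the BRE
 * `pattern`, match it against `subject`, and copy the offsets recorded
 * for subexpression 2 into *so and *eo.
 * Returns 0 on a match, REG_NOMATCH when nothing matches, or another
 * nonzero regcomp() error code if the pattern does not compile. */
int group2_offsets(const char *pattern, const char *subject,
                   long *so, long *eo)
{
    regex_t re;
    regmatch_t pmatch[3];  /* [0] whole match, [1] outer \( \), [2] inner \( \) */
    int rc;

    *so = *eo = -1;

    rc = regcomp(&re, pattern, 0);  /* cflags == 0 selects BRE syntax */
    if (rc != 0)
        return rc;

    rc = regexec(&re, subject, 3, pmatch, 0);
    if (rc == 0) {
        /* For a subexpression that did not participate in the match,
         * both offsets are expected to be -1. */
        *so = (long)pmatch[2].rm_so;
        *eo = (long)pmatch[2].rm_eo;
    }

    regfree(&re);
    return rc;
}
```

Called with the C string "\\(\\(a\\)\\)*" and the subject "b", a
conforming regexec should return 0 (the pattern matches the empty
string at the start of "b") and leave both offsets at -1.  With the
subject "a", subexpression 2 participates and the offsets should be
0 and 1.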

        Paul Eggert writes:
        In the discussion of BRE back-references, the latest POSIX draft says:
        [....same material quoted above...]
        In my opinion, this does not define the behavior of \(\(a\)\)*\2 when
        the \(a\) never matched a string.

The text you quoted unambiguously does define the correct behavior.
Other sections, for example B.5.2 of 1003.2-1992, make this clear.
In this case: GNU grep is right; Solaris xpg4 is wrong.

        I.e. even though GNU grep
        and Solaris xpg4 grep act differently here, they both conform to
        POSIX. 

No.  Solaris xpg4 (if it behaves as reported here) does not conform
to the POSIX specification -- though I doubt any POSIX test suites
check for this particular case.

The POSIX regexp standard is not that loose.   It leaves some
constructs unspecified (e.g., backreferences in ERE).  It specifies
all other cases completely and unambiguously.

The alternative interpretation you propose would lead to some odd
results.

Thomas Lord
regexps.com


