bug-grep

bug#16865: grep -wP and backreferences


From: Stephane Chazelas
Subject: bug#16865: grep -wP and backreferences
Date: Mon, 24 Feb 2014 21:20:01 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

Fine by me, thanks.

BTW, as discussed in another bug, -w and -x break (*UCP) and the
other PCRE special sequences that must appear at the very start of
the pattern, because the wrapping pushes them away from the start.
Chances are we can't easily do much about it, but it may still be
worth documenting.

For instance, one should use

grep -P '(*UCP)\bword\b'

because

grep -wP '(*UCP)word'

won't work (pcregrep has the same problem).
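
A minimal sketch of why the wrapped form loses the leading (*UCP). The helper name wrap_word is hypothetical and only mimics the \b(?:...)\b wrapping discussed in this thread; it is not grep's actual code. The point is simply that after wrapping, the pattern no longer begins with (*UCP), which PCRE only honours at the very start of a pattern.

```python
# Hypothetical helper (not from grep's source): mimics the wrapping that
# -w applies, using the \b(?:...)\b form discussed in this thread.
def wrap_word(pattern: str) -> str:
    return r"\b(?:" + pattern + r")\b"


wrapped = wrap_word("(*UCP)word")
print(wrapped)                       # \b(?:(*UCP)word)\b
print(wrapped.startswith("(*UCP)"))  # False: the verb is no longer first
```

Writing the \b anchors by hand, as in the first command above, keeps (*UCP) at the front of the pattern.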

In another bug, I've seen someone commenting that

grep -wP 'a)(b'

doesn't give the error message that one would expect (not that
I'd expect anyone would care).
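
The missing error can be illustrated with a rough sketch. It uses Python's re module as a stand-in for PCRE (an assumption; the two engines are not identical) and a hypothetical compiles helper. On its own, 'a)(b' has unbalanced parentheses, but once wrapped in \b(?:...)\b the user's parentheses pair up with the wrapper's, so the combined pattern is well-formed.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


raw = "a)(b"
wrapped = r"\b(?:" + raw + r")\b"  # becomes \b(?:a)(b)\b

print(compiles(raw))      # False: unbalanced parenthesis
print(compiles(wrapped))  # True: the wrapper balances it
```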

A last note: with -w, pcregrep wraps the regexp in \b...\b
instead of \b(?:...)\b, so it could be that those brackets are
not necessary in the first place.

Sorry I lied, it was not the last note ;-). Note the difference:

$ echo a@@b | grep -w @@
$ echo a@@b | grep -Pw @@
a@@b


Maybe instead of \b(?:...)\b, we could use (?<!\w)...(?!\w):

$ echo a%%b | grep -P '(?<!\w)%%(?!\w)'
$ echo %aa% | grep -P '(?<!\w)aa(?!\w)'
%aa%
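
The difference between the two wrappings can be sketched as follows, again using Python's re module as a stand-in for PCRE (an assumption; \b, \w and the lookarounds behave the same way for these ASCII inputs). The function names are hypothetical. With \b, a boundary exists between 'a' and '@', so '@@' counts as a "word" inside 'a@@b'; with the lookarounds it does not, because a word character sits right next to the match.

```python
import re


def boundary_match(pattern: str, text: str) -> bool:
    # Wrapping in the \b(?:...)\b style.
    return re.search(r"\b(?:" + pattern + r")\b", text) is not None


def lookaround_match(pattern: str, text: str) -> bool:
    # Wrapping in the proposed (?<!\w)...(?!\w) style.
    return re.search(r"(?<!\w)(?:" + pattern + r")(?!\w)", text) is not None


for pattern, text in [("@@", "a@@b"), ("%%", "a%%b"), ("aa", "%aa%")]:
    print(pattern, text,
          boundary_match(pattern, text),
          lookaround_match(pattern, text))
# @@ a@@b True False
# %% a%%b True False
# aa %aa% True True
```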



Full text of original email included for reference:

2014-02-24 12:00:08 -0800, Jim Meyering:
> On Mon, Feb 24, 2014 at 2:01 AM, Stephane Chazelas
> <address@hidden> wrote:
> > Hello,
> >
> > Backreferences don't work with -w or -x in combination with -P:
> >
> > $ echo aa | grep -Pw '(.)\1'
> > $
> >
> > Or they work in an unexpected way:
> >
> > $ echo aa | grep -Pw '(.)\2'
> > aa
> >
> > The fix is simple:
> >
> >
> > --- src/pcresearch.c~   2014-02-24 09:59:56.864374362 +0000
> > +++ src/pcresearch.c    2014-02-24 07:33:04.666398105 +0000
> > @@ -75,9 +75,9 @@ Pcompile (char const *pattern, size_t si
> 
> Thanks a lot for the patch.
> I've converted it to a proper commit with NEWS and a test case.
> Please ack the attached if it's all ok with you (you're still the "Author:"):

> From bfd21931b3cd088d642a190e9f030214df04045d Mon Sep 17 00:00:00 2001
> From: Stephane Chazelas <address@hidden>
> Date: Mon, 24 Feb 2014 11:54:09 -0800
> Subject: [PATCH] grep -P: fix it so backreferences now work with -w and -x
> 
> To implement -w and -x, we bracket the search term with parentheses.
> However, that set of parentheses had the default semantics of
> "capturing", i.e., creating a backreferenceable matched quantity.
> Instead, use (?:...), to create a non-capturing group.
> * src/pcresearch.c (Pcompile): Use (?:...) rather than (...).
> * NEWS (Bug fixes): Mention it.
> * tests/pcre-wx-backref: New file.
> * tests/Makefile.am (TESTS): Add it.
> ---
>  NEWS                  |  6 ++++++
>  src/pcresearch.c      |  4 ++--
>  tests/Makefile.am     |  1 +
>  tests/pcre-wx-backref | 28 ++++++++++++++++++++++++++++
>  4 files changed, 37 insertions(+), 2 deletions(-)
>  create mode 100755 tests/pcre-wx-backref
> 
> diff --git a/NEWS b/NEWS
> index 771fd80..49fe984 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -2,6 +2,12 @@ GNU grep NEWS                                    -*- outline -*-
> 
>  * Noteworthy changes in release ?.? (????-??-??) [?]
> 
> +** Bug fixes
> +
> +  grep -P now works with -w and -x and backreferences. Before,
> +  echo aa|grep -Pw '(.)\1' would fail to match, yet
> +  echo aa|grep -Pw '(.)\2' would match.
> +
> 
>  * Noteworthy changes in release 2.18 (2014-02-20) [stable]
> 
> diff --git a/src/pcresearch.c b/src/pcresearch.c
> index 5b5ba3e..d4a20ff 100644
> --- a/src/pcresearch.c
> +++ b/src/pcresearch.c
> @@ -75,9 +75,9 @@ Pcompile (char const *pattern, size_t size)
> 
>    *n = '\0';
>    if (match_lines)
> -    strcpy (n, "^(");
> +    strcpy (n, "^(?:");
>    if (match_words)
> -    strcpy (n, "\\b(");
> +    strcpy (n, "\\b(?:");
>    n += strlen (n);
> 
>    /* The PCRE interface doesn't allow NUL bytes in the pattern, so
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 4ffea85..ecbe0e6 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -83,6 +83,7 @@ TESTS =                                             \
>    pcre-abort                                 \
>    pcre-invalid-utf8-input                    \
>    pcre-utf8                                  \
> +  pcre-wx-backref                            \
>    pcre-z                                     \
>    prefix-of-multibyte                                \
>    r-dot                                              \
> diff --git a/tests/pcre-wx-backref b/tests/pcre-wx-backref
> new file mode 100755
> index 0000000..643aa9b
> --- /dev/null
> +++ b/tests/pcre-wx-backref
> @@ -0,0 +1,28 @@
> +#! /bin/sh
> +# Before grep-2.19, grep -P and -w/-x would not work with a backreference.
> +#
> +# Copyright (C) 2014 Free Software Foundation, Inc.
> +#
> +# Copying and distribution of this file, with or without modification,
> +# are permitted in any medium without royalty provided the copyright
> +# notice and this notice are preserved.
> +
> +. "${srcdir=.}/init.sh"; path_prepend_ ../src
> +require_pcre_
> +
> +echo aa > in || framework_failure_
> +echo 'grep: reference to non-existent subpattern' > exp-err \
> +  || framework_failure_
> +
> +fail=0
> +
> +for xw in x w; do
> +  grep -P$xw '(.)\1' in > out 2>&1 || fail=1
> +  compare out in || fail=1
> +
> +  grep -P$xw '(.)\2' in > out 2> err && fail=1
> +  compare /dev/null out || fail=1
> +  compare exp-err err || fail=1
> +done
> +
> +Exit $fail
> -- 
> 1.9.0
> 
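
The group renumbering described in the commit message can be sketched with Python's re module, used here as a stand-in for PCRE (an assumption; the engines differ in other respects). With a capturing wrapper, the user's (.) becomes group 2, which is why \2 appeared to work; with the non-capturing wrapper it stays group 1.

```python
import re

# Capturing wrapper, as before the patch: the user's (.) is group 2.
old = re.search(r"\b((.)\2)\b", "aa")
# Non-capturing wrapper, as after the patch: the user's (.) is group 1.
new = re.search(r"\b(?:(.)\1)\b", "aa")

print(old.groups())  # ('aa', 'a')
print(new.groups())  # ('a',)
```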





