bug-grep

bug#16871: problems about matching newline (with -z)


From: Stephane Chazelas
Subject: bug#16871: problems about matching newline (with -z)
Date: Tue, 25 Feb 2014 07:32:18 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

The doc has a confusing statement:

> 15. How can I match across lines?
>
>    Standard grep cannot do this, as it is fundamentally line-based.
>    Therefore, merely using the '[:space:]' character class does not
>    match newlines in the way you might expect.  However, if your grep
>    is compiled with Perl patterns enabled, the Perl 's' modifier
>    (which makes '.' match newlines) can be used:
>
>         printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'
>
>    With the GNU 'grep' option '-z' (*note File and Directory
>    Selection::), the input is terminated by null bytes.  Thus, you can
>    match newlines in the input, but the output will be the whole file,
>    so this is really only useful to determine if the pattern is
>    present:
>
>         printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
>
>    Failing either of those options, you need to transform the input
>    before giving it to 'grep', or turn to 'awk', 'sed', 'perl', or
>    many other utilities that are designed to operate across lines.

printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'

will never match, because grep is line-based even with -P. -P doesn't
help here; if anything it makes things harder, as you then also need
that (?s).

printf 'foo\nbar\n\0' | grep -z 'foo.*bar'

would match.
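A quick way to compare these is to count matching records with grep -c, adding the -zP combination alongside the two commands above. This is a sketch, assuming GNU grep built with PCRE support (-P); the labels it prints are only for readability.

```shell
#!/bin/sh
# Sketch assuming GNU grep with PCRE (-P) support.
# grep -c prints how many records matched (0 means no match); wrapping it in
# printf keeps a non-matching grep from aborting a script run with "set -e".

# Without -z, grep splits the input at newlines: "foo" and "bar" are never
# in the same record, so (?s) has no newline to act on.
printf '%s %s\n' '-P:' "$(printf 'foo\nbar\n' | grep -cP '(?s)foo.*?bar')"

# With -z, records end at NUL bytes, so the newline is inside the record;
# with -P, (?s) is still needed for '.' to match it.
printf '%s %s\n' '-zP:' "$(printf 'foo\nbar\n\0' | grep -czP '(?s)foo.*?bar')"

# With plain -z, '.' matches the embedded newline as it is.
printf '%s %s\n' '-z:' "$(printf 'foo\nbar\n\0' | grep -cz 'foo.*bar')"
```

With a recent GNU grep this should print counts of 0, 1 and 1, in that order.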

Same confusion in tests/pcre:

> #! /bin/sh
> # Ensure that with -P, \s*$ matches a newline.
> #
> # Copyright (C) 2001, 2006, 2009-2014 Free Software Foundation, Inc.
> #
> # Copying and distribution of this file, with or without modification,
> # are permitted in any medium without royalty provided the copyright
> # notice and this notice are preserved.
> 
> . "${srcdir=.}/init.sh"; path_prepend_ ../src
> require_pcre_
> 
> fail=0
> 
> # See CVS revision 1.32 of "src/search.c".
> echo | grep -P '\s*$' || fail=1
> 
> Exit $fail

'\s*$' doesn't match a newline; it matches the empty string.

You need echo | grep -zP '\s' to match the newline.
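The difference shows up when the test's pattern is run on a line with no whitespace at all, next to the -z variant. Again a sketch, assuming GNU grep with -P support; grep -c counts matching records.

```shell
#!/bin/sh
# Sketch assuming GNU grep with PCRE (-P) support.

# '\s*$' matches even where there is no whitespace: \s* can match zero
# characters just before the end of the line.
printf '%s %s\n' 'abc, -P \s*$:' "$(echo abc | grep -cP '\s*$')"

# Without -z, the newline that ends each line is not part of the text grep
# matches, so the empty line printed by echo gives '\s' nothing to match.
printf '%s %s\n' 'empty line, -P \s:' "$(echo | grep -cP '\s')"

# With -z, the whole input ("\n", no NUL) is one record, and '\s' matches
# the newline inside it.
printf '%s %s\n' 'empty line, -zP \s:' "$(echo | grep -czP '\s')"
```

This should print 1, 0 and 1: the first count is why the existing test succeeds without ever matching a newline.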

Also:

We can match a newline with grep -zP 'a\nb' (or '\x0a' or '\012'
or '[\n]'...), but not easily without -P. The same goes for NUL
characters.
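For example, each of the -P escapes listed above can be checked against a single NUL-terminated record containing "a", a newline and "b". A sketch, assuming GNU grep with -P support:

```shell
#!/bin/sh
# Sketch assuming GNU grep with PCRE (-P) support.
# printf 'a\nb\0' produces one NUL-terminated record: "a", newline, "b".
for re in 'a\nb' 'a\x0ab' 'a\012b' 'a[\n]b'; do
  # PCRE reads \n, \x0a (hex) and \012 (octal) as the newline character,
  # and [\n] is a bracket expression containing only the newline.
  printf '%-8s %s\n' "$re" "$(printf 'a\nb\0' | grep -czP "$re")"
done

# Without -z, "a" and "b" end up in separate records and nothing matches.
printf '%-8s %s\n' 'no -z' "$(printf 'a\nb\n' | grep -cP 'a\nb')"
```

Each of the four -z lines should report 1, and the last line 0.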

Without -P, the only way I could think of was with
[^\0-\011\013-\377], but that would only work for single-byte
locales, and you can't pass a NUL character on the command line,
so the pattern would have to be given with -f, but:

$ printf 'a\nb\0' | LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')
zsh: done                printf 'a\nb\0' |
zsh: segmentation fault  LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')

Having said that:

grep -z $'a[^\01-\011\013-\0377]b'

would work (in single-byte locales), since NUL never occurs inside
a record: it is the delimiter.

And grep -a $'[^\01-\0377]' can match NUL (in single-byte
locales).
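Both workarounds can be reproduced in a plain POSIX shell. This sketch builds the pattern bytes with printf rather than $'...', so it doesn't depend on how a given shell handles octal escapes inside $'...'; it assumes GNU grep and the single-byte C locale, as noted above. The variable names nl_re and nul_re are only illustrative.

```shell
#!/bin/sh
# Sketch assuming GNU grep; LC_ALL=C selects a single-byte locale so that the
# bracket ranges below are plain byte ranges.

# [^\001-\011\013-\377] excludes every byte except NUL (0) and newline (012).
# printf expands the octal escapes into raw bytes (illustrative name nl_re).
nl_re=$(printf 'a[^\001-\011\013-\377]b')
# Under -z, NUL is the record delimiter and never occurs inside a record,
# so the bracket can only match the newline.
printf '%s %s\n' 'newline, -z:' "$(printf 'a\nb\0' | LC_ALL=C grep -cz "$nl_re")"

# [^\001-\377] excludes every byte except NUL; -a makes grep process the
# input as text despite the NUL byte (illustrative name nul_re).
nul_re=$(printf 'a[^\001-\377]b')
printf '%s %s\n' 'NUL, -a:' "$(printf 'a\0b\n' | LC_ALL=C grep -ca "$nul_re")"
```

With a recent GNU grep, both lines should report a count of 1.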

But it would be handy to be able to match newlines and NULs as easily
without -P as with it.

-- 
Stephane
