bug#16871: problems about matching newline (with -z)
From: Stephane Chazelas
Subject: bug#16871: problems about matching newline (with -z)
Date: Tue, 25 Feb 2014 07:32:18 +0000
User-agent: Mutt/1.5.21 (2010-09-15)
The doc has a confusing statement:
> 15. How can I match across lines?
>
> Standard grep cannot do this, as it is fundamentally line-based.
> Therefore, merely using the '[:space:]' character class does not
> match newlines in the way you might expect. However, if your grep
> is compiled with Perl patterns enabled, the Perl 's' modifier
> (which makes '.' match newlines) can be used:
>
> printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'
>
> With the GNU 'grep' option '-z' (*note File and Directory
> Selection::), the input is terminated by null bytes. Thus, you can
> match newlines in the input, but the output will be the whole file,
> so this is really only useful to determine if the pattern is
> present:
>
> printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
>
> Failing either of those options, you need to transform the input
> before giving it to 'grep', or turn to 'awk', 'sed', 'perl', or
> many other utilities that are designed to operate across lines.
printf 'foo\nbar\n' | grep -P '(?s)foo.*?bar'
will never match, as grep is line-based even with -P. -P doesn't
help here; it makes things harder, as you then need that (?s).
printf 'foo\nbar\n\0' | grep -z 'foo.*bar'
would match.
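To make the difference concrete, here is a small sketch that runs the doc's -P example as is, then runs the doc's own -z pattern on NUL-terminated input. It assumes GNU grep built with PCRE support; the variable names are only for illustration.

```shell
# Without -z, grep sees "foo" and "bar" as two separate lines, so even
# with -P and (?s) no single record contains both words.
if printf 'foo\nbar\n' | grep -qP '(?s)foo.*?bar'; then
  line_mode=match
else
  line_mode=nomatch
fi

# With -z, a record runs up to the NUL byte, so the newline between
# "foo" and "bar" is part of the text being matched.
if printf 'foo\nbar\n\0' | grep -qz 'foo[[:space:]]\+bar'; then
  nul_mode=match
else
  nul_mode=nomatch
fi

echo "line-based: $line_mode, NUL-terminated: $nul_mode"
```

The first test should report no match and the second a match, which is the point above: it is -z, not -P, that lets a pattern span a newline.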
Same confusion in tests/pcre:
> #! /bin/sh
> # Ensure that with -P, \s*$ matches a newline.
> #
> # Copyright (C) 2001, 2006, 2009-2014 Free Software Foundation, Inc.
> #
> # Copying and distribution of this file, with or without modification,
> # are permitted in any medium without royalty provided the copyright
> # notice and this notice are preserved.
>
> . "${srcdir=.}/init.sh"; path_prepend_ ../src
> require_pcre_
>
> fail=0
>
> # See CVS revision 1.32 of "src/search.c".
> echo | grep -P '\s*$' || fail=1
>
> Exit $fail
'\s*$' doesn't match a newline there; it matches the empty string.
You need echo | grep -zP '\s' to match the newline.
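A quick way to see this, again assuming GNU grep with PCRE support (the variable names are made up for the sketch): '\s*$' also succeeds on a line with no whitespace at all, whereas '\s' under -z only succeeds when the record really contains the newline.

```shell
# '\s*$' can match the empty string at end of line, so it succeeds even
# on a line that contains no whitespace at all.
if printf 'x\n' | grep -qP '\s*$'; then empty_match=yes; else empty_match=no; fi

# With -z the newline is part of the record, so '\s' has something to match.
if echo | grep -qzP '\s'; then newline_match=yes; else newline_match=no; fi

# A record with no whitespace at all gives '\s' nothing to match.
if printf 'x' | grep -qzP '\s'; then bare_match=yes; else bare_match=no; fi

echo "empty: $empty_match, newline: $newline_match, bare: $bare_match"
```

So the existing test would pass even if -P never saw a newline, while the -z variant distinguishes the two cases.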
Also:
We can match a newline with grep -zP 'a\nb' (or '\x0a' or '\012'
or '[\n]'...) but not easily without -P. Same for NUL
characters.
Without -P, the only way I could think of was with
[^\0-\011\013-\377], but that would only work for single-byte
locales, and you can't pass a nul character on the command line,
so it would have to be with -f but:
$ printf 'a\nb\0' | LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')
zsh: done                printf 'a\nb\0' |
zsh: segmentation fault  LC_ALL=C grep -zf <(LC_ALL=C printf 'a[^\0-\011\013-\377]b')
Having said that:
grep -z $'a[^\01-\011\013-\0377]b'
would work (in single-byte locales), as nul can't occur in the
input records: it's the delimiter.
And grep -a $'[^\01-\0377]' can match nul (in single-byte
locales).
But it would be handy to be able to do the same as with -P.
--
Stephane