bug-grep

bug#22655: grep -Pz '^' now fails!


From: Paul Eggert
Subject: bug#22655: grep -Pz '^' now fails!
Date: Sat, 19 Nov 2016 03:22:23 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0

Stephane Chazelas wrote:

> I don't know the details of why it's done that way, but I'm not
> sure I can see how calling pcre_exec that way can be quicker
> than calling it on each individual line/record.

It can be hundreds of times faster in common cases. See:

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=f6603c4e1e04dbb87a7232c4b44acc6afdf65fef

> Note that this is still wrong:
>
> $ printf 'a\nb\0' | ./src/grep -zxP a
> a
> b

Thanks, fixed by installing the attached.
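
For readers following along, the failure in the report can be reproduced outside grep. This is only a rough sketch: it uses Python's re module as a stand-in for PCRE, and it assumes that -x is implemented by wrapping the pattern in ^(?:...)$ and that -z splits the input into NUL-terminated records. The function name match_records is hypothetical and is not taken from grep's source.

```python
import re

def match_records(data, pattern, multiline):
    """Return the NUL-terminated records that the anchored pattern matches.

    Illustrative stand-in for 'grep -z -x'; not grep's actual code.
    """
    flags = re.MULTILINE if multiline else 0
    # Assumed -x behavior: anchor the user's pattern at both ends.
    anchored = re.compile(b"^(?:" + pattern + b")$", flags)
    records = data.split(b"\0")
    if records and records[-1] == b"":
        records.pop()  # drop the empty piece after the final NUL
    return [r for r in records if anchored.search(r)]

data = b"a\nb\0"  # a single record containing an embedded newline
# With multiline anchors, ^ and $ match around the inner newline,
# so the record "a\nb" is reported even though it is not exactly "a".
print(match_records(data, b"a", multiline=True))
# With whole-record anchors, the record does not match.
print(match_records(data, b"a", multiline=False))
```

The first call returns the whole record, which mirrors the wrong output shown in the report; the second returns an empty list, which is what -x should give for this input.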

> Removing PCRE_MULTILINE (and get back to calling pcre_exec on
> every record separately) would help except in the cases where the
> user does:
>
> grep -xzP '(?m)a'

I don't think grep can address this problem, as in general that would require interpreting the PCRE pattern at run-time and grep should not be delving into PCRE internals. Uses of (?m) lead to unspecified behavior in grep, and applications should not rely on any particular behavior in this area. This is firmly in the Perl tradition, as the Perl documentation for this part of the regular expression syntax says "The stability of these extensions varies widely. Some ... are experimental and may change without warning or be completely removed." Also, the grep manual says that -P "is highly experimental". User beware, that's all.
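
As a small illustration of why the caller cannot control this without parsing the pattern, the sketch below again uses Python's re module in place of PCRE (an assumption; the two engines differ in details). The helper record_matches is hypothetical. The caller passes no multiline flag, yet an inline (?m) in the user's pattern changes what ^ and $ mean.

```python
import re

def record_matches(record, user_pattern):
    # Hypothetical per-record matcher: the caller passes no MULTILINE flag.
    return re.search(user_pattern, record) is not None

record = "b\na"  # one record with an embedded newline

# Without (?m), ^ matches only at the start of the record: no match.
print(record_matches(record, r"^a$"))

# With an inline (?m), the pattern itself turns multiline mode on,
# so ^ also matches after the embedded newline.
print(record_matches(record, r"(?m)^a$"))
```

The matcher sees only an opaque pattern string, so it cannot tell the two cases apart without interpreting the regular expression syntax itself.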

Attachment: 0001-grep-fix-zxP-bug.patch
Description: Text Data

