From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#22028: closed (grep -Pc / grep -P | wc -l inconsistent results)
Date: Thu, 31 Dec 2015 07:28:01 +0000

Your message dated Wed, 30 Dec 2015 23:27:37 -0800
with message-id <address@hidden>
and subject line Re: grep -Pc / grep -P | wc -l inconsistent results
has caused the debbugs.gnu.org bug report #22028,
regarding grep -Pc / grep -P | wc -l inconsistent results
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)

--
22028: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22028
GNU Bug Tracking System
Contact address@hidden with problems

--- Begin Message ---
Subject: grep -Pc / grep -P | wc -l inconsistent results
Date: Fri, 27 Nov 2015 06:29:31 -0500 (EST)

Hi,

it seems that for long files which start with non-binary data, when the PCRE matcher is used, grep works in TEXTBIN_UNKNOWN mode until it finds binary data, then switches to TEXTBIN_BINARY. But in -Pc mode in TEXTBIN_BINARY it exits on the next match, causing bogus -Pc results.

Reproducer:

$ grep -P -c 'Blocked by (SpamAssassin|Spamfilter)' ./filtered.txt
1
$ grep -P 'Blocked by (SpamAssassin|Spamfilter)' ./filtered.txt | wc -l
2

The ./filtered.txt is a sufficiently long text file that contains some NUL bytes after the first 32kB of text, e.g. https://bugzilla.redhat.com/attachment.cgi?id=1080646

Original downstream bugzilla: https://bugzilla.redhat.com/attachment.cgi?id=1080646

Attached is my attempt to fix it, but it may not be the right way to fix it. In particular, the question is whether grep should stop when it finds binary data or not. But at least grep -Pc and grep -P | wc -l should behave the same.

thanks & regards
Jaroslav

0001-grep-do-not-stop-on-binary-data-if-counting-in-PCRE.patch
Description: Text Data
--- End Message ---
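
For readers without access to the attached filtered.txt, the shape of the reproducer can be sketched roughly as follows. This is a minimal sketch under assumptions: the helper name make_sample, the filler text, and the temporary path are all hypothetical, and the file only imitates the layout described in the report (a match, more than 32kB of plain text, a NUL byte, then a second match). No expected counts are shown because they depend on the grep version; some releases report a binary-file notice instead of printing the matching line.

```shell
#!/bin/sh
# Hypothetical reproducer sketch. The file layout and all names here are
# assumptions; this is not the reporter's actual filtered.txt.

make_sample() {
    # $1: path of the sample file to create
    out=$1
    : > "$out"
    # One matching line near the top of the file.
    printf 'Blocked by SpamAssassin\n' >> "$out"
    # Pad with more than 32 kB of plain text (1000 lines of 39 bytes each).
    i=0
    while [ "$i" -lt 1000 ]; do
        printf 'plain text filler line number %08d\n' "$i" >> "$out"
        i=$((i + 1))
    done
    # A NUL byte so that the remainder of the file looks binary to grep.
    printf 'binary marker \000 here\n' >> "$out"
    # A second matching line located after the binary data.
    printf 'Blocked by Spamfilter\n' >> "$out"
}

sample=${TMPDIR:-/tmp}/grep-pc-sample.$$
make_sample "$sample"

# Compare the two ways of counting from the report. Errors are silenced
# so the script still finishes if this grep was built without -P support.
count_c=$(grep -P -c 'Blocked by (SpamAssassin|Spamfilter)' "$sample" 2>/dev/null || true)
count_wc=$(grep -P 'Blocked by (SpamAssassin|Spamfilter)' "$sample" 2>/dev/null | wc -l)

echo "grep -P -c:      $count_c"
echo "grep -P | wc -l: $count_wc"

rm -f "$sample"
```

If the two printed numbers differ, the installed grep may show the inconsistency discussed in this thread; if they agree, it may already include a fix, or the binary-file handling in that version may simply differ.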
--- Begin Message ---
Subject: Re: grep -Pc / grep -P | wc -l inconsistent results
Date: Wed, 30 Dec 2015 23:27:37 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

Thanks for the bug report and fix, Jaroslav. And thanks, Norihiro, for the test case; I think I independently came up with something similar to your grep.c fix in my earlier patches today, and so I expect that part of your changes is no longer needed. I installed the attached combined patch for this bug and am marking it as done.

0001-grep-c-should-keep-counting-after-binary-data.patch
Description: Text Data
--- End Message ---