emacs-bug-tracker
[debbugs-tracker] bug#18266: closed (grep -P and invalid exits with erro


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#18266: closed (grep -P and invalid exits with error )
Date: Thu, 11 Sep 2014 17:08:02 +0000

Your message dated Thu, 11 Sep 2014 10:07:49 -0700
with message-id <address@hidden>
and subject line Re: bug#18266: Bug#758105: bug#18266: grep -P and invalid 
exits with error
has caused the debbugs.gnu.org bug report #18266,
regarding grep -P and invalid exits with error 
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
18266: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18266
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: grep -P and invalid exits with error
Date: Thu, 14 Aug 2014 17:42:57 +0200
User-agent: Mutt/1.5.23 (2014-03-12)

Hi,

Please revert ca7868cc27db3d9deafaa2e0ac5a2bb0aa8ef373.

That commit (re)introduced a regression (see http://debbugs.gnu.org/15758):
pcresearch again checks whether the input is valid UTF-8. The problem is that
binary files are not valid UTF-8, so grep -P, in Unicode locales, exits
with an error:

LANG=en_US.UTF-8 grep -P -r x /usr/bin/
grep: invalid UTF-8 byte sequence in input



printf 'j\x82\nj\n'|LC_ALL=en_US.UTF-8 grep -P j|cat -A; echo $?
grep: invalid UTF-8 byte sequence in input
0

should be:
printf 'j\x82\nj\n'|LC_ALL=en_US.UTF-8 src/grep -P j|cat -A; echo $?
jM-^B$
j$
0
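
As a rough illustration of why the input above trips the check: the byte \x82 is a UTF-8 continuation byte with no lead byte before it, so the stream is not valid UTF-8. The sketch below assumes an iconv utility is installed; is_valid_utf8 is a hypothetical helper name and is not part of grep or PCRE. The octal escape \202 is the same byte as \x82, written that way because not every printf accepts \x escapes.

```shell
# Hypothetical helper (not part of grep): succeeds only if stdin is valid UTF-8.
# Assumes an iconv utility is available; a UTF-8 to UTF-8 conversion fails on
# the first invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Same bytes as in the report: 'j', 0x82, newline, 'j', newline.
if printf 'j\202\nj\n' | is_valid_utf8; then
    first=valid
else
    first=invalid
fi

# Pure ASCII input for comparison.
if printf 'j\nj\n' | is_valid_utf8; then
    second=valid
else
    second=invalid
fi

echo "$first $second"
```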

Tested on Debian and Archlinux with pcre 8.35.

Thanks,

Santiago




--- End Message ---
--- Begin Message ---
Subject: Re: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error
Date: Thu, 11 Sep 2014 10:07:49 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0

Vincent Lefevre wrote:

> I've just reported a new Debian bug concerning the performance problem.

It's not clear from http://bugs.debian.org/761157 that the performance problem occurs only with -P, but I assume that's what is meant.

Since this is a performance bug with PCRE, I suggest moving the Debian bug report to the Debian libpcre3 package. Grep cannot go back to the old way, which could cause grep to crash, and the bug cannot be fixed in grep because libpcre3 does not provide a fast way to search arbitrary data that may include encoding errors. It really is a problem that requires changes to libpcre3 to fix; grep cannot fix it.

In the meantime, in order to use 'grep' to search for strings in arbitrary data, I suggest omitting the '-P'. Also, I suggest using the C locale.
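
A minimal sketch of that workaround, using the same input bytes as the original report (\202 is the octal form of \x82). The -a option (treat the input as text) and -c (print a count of matching lines) are additions made here for illustration, on the assumption that GNU grep is in use; the suggestion above only calls for dropping -P and using the C locale.

```shell
# Search arbitrary bytes without -P, in the C locale.
# -a: treat the input as text even if it looks binary (added for this sketch).
# -c: print the number of matching lines instead of the lines themselves.
count=$(printf 'j\202\nj\n' | LC_ALL=C grep -a -c j)

# Both input lines contain 'j', so both should be counted.
echo "$count"
```

Without -P, grep does not hand the data to libpcre, so the UTF-8 validity check discussed in this report does not come into play.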

As the GNU bug 18266 "grep -P and invalid exits with error" has been fixed, I'm closing that bug report. Please feel free to open a separate GNU bug report for the performance issue.

PS. While composing this email I noticed another bug in grep -P and encoding errors, which I fixed by installing the attached patch.

Attachment: 0001-grep-fix-false-matches-with-P-.-and-invalid-UTF-8.patch
Description: Text document


--- End Message ---
