From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#22838: closed (New 'Binary file' detection considered harmful)
Date: Fri, 09 Sep 2016 01:44:02 +0000

Your message dated Thu, 8 Sep 2016 18:43:43 -0700
with message-id <address@hidden>
and subject line Re: bug#22838: New 'Binary file' detection considered harmful
has caused the debbugs.gnu.org bug report #22838,
regarding New 'Binary file' detection considered harmful
to be marked as done.

(If you believe you have received this mail in error, please contact

22838: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22838
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: New 'Binary file' detection considered harmful
Date: Sun, 28 Feb 2016 12:17:07 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.6.0

The new heuristic for detecting 'Binary files' should be reverted to the old one (before 2.20), as the new one has too great a potential to make important tasks fail silently.

One of the most important use cases of grep is processing file lists,
e.g. in the pipeline find | grep | tar. This is often done by backup software, e.g. the Debian package 'backup2l'.

The new behaviour of grep -- to output 'Binary file matches' after output started -- has silently broken the 'backup2l' script and has the potential of silently breaking many other backup scripts as well.

Test case:

$ find /etc/ssl/certs/ | LANG= grep pem


grep will stop with 'Binary file (standard input) matches' after outputting a small percentage of the existing .pem files.

Expected behaviour:

grep should list all .pem files.
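
The test case above depends on what happens to be in /etc/ssl/certs/. A minimal self-contained sketch, assuming a synthetic file list with made-up names (one of which contains a byte that is invalid in UTF-8), could look like the following. Whether the default invocation actually drops names depends on the installed grep version and the locale, so only the -a variant is expected to list every name.

```shell
# Synthetic stand-in for `find` output. The file names are made up; the
# second one contains byte 0xFF (octal 377), which is invalid in UTF-8.
list_names() {
    printf 'certs/a.pem\ncerts/b\377.pem\ncerts/c.pem\n'
}

# Default mode: depending on grep version and locale, grep may report
# "Binary file (standard input) matches" instead of every matching name.
grep_default() {
    list_names | grep pem
}

# With -a (--text), grep treats its input as text and prints every match.
count_text_matches() {
    list_names | grep -a pem | wc -l | tr -d ' '
}
```

Comparing the number of lines printed by grep_default with the result of count_text_matches shows whether the installed grep silently drops names from the list.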

This behaviour is particularly insidious because users may not notice that their backup archives are a bit smaller than before or that their backups complete a bit faster, while many thousands of files may be missing.

Q: Why do you use LANG= ?

A: To illustrate the problem and because 'backup2l' does that.

Q: Why don't people use the -a switch?

A: People may not notice anything wrong with their backups until they need them.

Q: Why don't you file a bug against 'backup2l'?

A: I will. But this is such a common use case that I suspect that many of the backup scripts that people wrote just for themselves are now broken.

Q: Why don't you just set the correct locale?

A: Even then it suffices to have one bogus-encoded filename somewhere to break your whole backup. It is easy to catch such a file from the internet or from song or picture metadata.


Marcello Perathoner

--- End Message ---
--- Begin Message ---
Subject: Re: bug#22838: New 'Binary file' detection considered harmful
Date: Thu, 8 Sep 2016 18:43:43 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

Paul Eggert wrote:
> On 03/01/2016 02:05 AM, Marcello Perathoner wrote:
>> 2) If you just output
>>
>>    binary line 42 in file x matches
>>
>> and continue regular output after the next newline, the breakage would be much
>> more confined.
>
> This sounds like a good suggestion.  That is, grep could keep going if its only
> problem is an attempt to output encoding errors (as opposed to reading null
> bytes, which are a more-reliable indication of binary data).  It would probably
> be better to output just one "Binary file matches" line per file, at the end of
> the other matches, so that it's more likely to be noticed.

I finally got around to implementing this, which turned out to be considerably easier than I thought it would be. I installed the attached patch into the grep Savannah master. I am boldly closing this old bug report; we can always start a new report if further problems turn up.

Attachment: 0001-grep-encoding-errors-suppress-just-their-line.patch
Description: Text Data

--- End Message ---