bug#22838: New 'Binary file' detection considered harmful


From: Marcello Perathoner
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Sun, 28 Feb 2016 12:17:07 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.6.0

The new heuristic for detecting 'Binary files' should be reverted to the old one (from before 2.20), as the new one has too much potential to make important tasks fail silently.


One of the most important use cases of grep is processing file lists,
e.g. in the pipeline: find | grep | tar. This is often done by backup software, e.g. in the Debian package 'backup2l'.

The new behaviour of grep -- printing 'Binary file matches' after output has already started -- has silently broken the 'backup2l' script and has the potential to silently break many other backup scripts as well.


Test case:

$ find /etc/ssl/certs/ | LANG= grep pem

Outcome:

grep will stop with 'Binary file (standard input) matches' after outputting a small percentage of the existing .pem files.

Expected behaviour:

grep should list all .pem files.
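
The effect can also be reproduced without relying on the contents of /etc/ssl/certs. The following is a minimal sketch, assuming GNU grep and a POSIX shell. The file names are made up, and the byte \377 stands in for a bogus-encoded name. Whether the plain invocation cuts its output short depends on the grep version and the locale, so only the -a result is predictable here.

```shell
# Build a hypothetical file list; the second name contains the byte 0xFF,
# which is not valid UTF-8.
list=$(mktemp)
printf 'a.pem\nb\377.pem\nc.pem\n' > "$list"

# Plain grep: depending on version and locale, output may stop early with
# a 'Binary file ... matches' message instead of listing every name.
plain=$(grep pem < "$list" | wc -l | tr -d ' ')

# With -a (--text) the input is always treated as text, so all three
# names are printed.
text=$(grep -a pem < "$list" | wc -l | tr -d ' ')

echo "plain: $plain  with -a: $text"
rm -f "$list"
```

If the two counts differ, the plain invocation has dropped names from the list.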


This behaviour is particularly insidious because users may not notice that their backup archives are a bit smaller than before or that their backups complete a bit faster, while many thousands of files may be missing.



Q: Why do you use LANG= ?

A: To illustrate the problem and because 'backup2l' does that.

Q: Why don't people use the -a switch?

A: People may not notice anything wrong with their backups until they need them.

Q: Why don't you file a bug against 'backup2l'?

A: I will. But this is such a common use case that I suspect that many of the backup scripts that people wrote just for themselves are now broken.

Q: Why don't you just set the correct locale?

A: Even then, a single bogus-encoded filename somewhere suffices to break your whole backup. It is easy to pick up such a file from the internet or from song or picture metadata.
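
For scripts that cannot wait for a fix, one defensive variant of the file-list filter is sketched below. It assumes GNU find and GNU grep; the scratch directory and file names are hypothetical. It passes -a so that grep never switches to 'Binary file' mode, and uses NUL-delimited names so that odd characters in file names cannot split a record.

```shell
# Hypothetical scratch directory with a few sample files.
dir=$(mktemp -d)
touch "$dir/one.pem" "$dir/two words.pem" "$dir/other.txt"

# -print0 and -z delimit names with NUL instead of newline; -a forces grep
# to treat the list as text, so a badly encoded name cannot cut it short.
selected=$(find "$dir" -type f -print0 | grep -a -z '\.pem$' | tr '\0' '\n' | wc -l | tr -d ' ')

echo "selected: $selected"
rm -rf "$dir"
```

In a real backup pipeline the NUL-delimited list would be handed to the archiver rather than counted, for example with GNU tar's --null -T - options.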



Regards

--
Marcello Perathoner





