bug#22838: New 'Binary file' detection considered harmful
From: Marcello Perathoner
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Sun, 28 Feb 2016 12:17:07 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.6.0
The new heuristic for detecting 'Binary files' should be reverted to the
old one (from before 2.20), as the new one has too great a potential to
make important tasks fail silently.
One of the most important use cases of grep is processing file lists,
e.g. in the pipe: find | grep | tar. This is often done by backup
software, e.g. in the Debian package 'backup2l'.
The new behaviour of grep -- to output 'Binary file matches' after
output started -- has silently broken the 'backup2l' script and has the
potential of silently breaking many other backup scripts as well.
Test case:
$ find /etc/ssl/certs/ | LANG= grep pem
Outcome:
grep will stop with 'Binary file (standard input) matches' after
outputting a small percentage of the existing .pem files.
Expected behaviour:
grep should list all .pem files.
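A minimal sketch for reproducing this without touching /etc, assuming GNU grep and a POSIX shell. The file names are made up; one of them contains a lone 0xE9 byte, which is not valid UTF-8. Whether the first count comes out short depends on the grep version and the locale in effect, so no particular number is claimed for it:

```shell
# Hypothetical file list: three names ending in .pem, one containing a stray 0xE9 byte.
list=$(mktemp)
printf 'a.pem\ncaf\351.pem\nc.pem\n' > "$list"

# As in the test case above: may stop early with a 'Binary file ... matches' message.
# Only lines that really end in .pem are counted, so the message itself is ignored.
plain=$(LANG= grep pem < "$list" 2>/dev/null | grep -a -c '\.pem$' || true)

# With -a the input is always treated as text, so every matching name is listed.
with_a=$(LANG= grep -a pem < "$list" | grep -a -c '\.pem$' || true)

echo "without -a: $plain of 3 names, with -a: $with_a of 3 names"
rm -f "$list"
```

If the two counts differ, the file list handed to tar would have been silently truncated.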
This behaviour is particularly insidious because users may not notice
that their backup archives are a bit smaller than before or that their
backups complete a bit faster, while many thousand files may be missing.
Q: Why do you use LANG= ?
A: To illustrate the problem and because 'backup2l' does that.
Q: Why don't people use the -a switch?
A: People may not notice anything wrong with their backups until they
need them.
Q: Why don't you file a bug against 'backup2l'?
A: I will. But this is such a common use case that I suspect that many
of the backup scripts that people wrote just for themselves are now broken.
Q: Why don't you just set the correct locale?
A: Even then it suffices to have one bogus-encoded filename somewhere to
break your whole backup. It is easy to pick up such a file from the
internet or from song or picture metadata.
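A script can guard against a single bogus-encoded name by passing the list NUL-delimited and forcing text mode. This is only a sketch, assuming GNU grep (for -z and -a); the names are made up, and the list is simulated with printf rather than produced by find -print0:

```shell
# Simulated output of 'find -print0': four hypothetical names, NUL-terminated,
# one of them containing a byte (0xE9) that is not valid UTF-8.
kept=$(printf 'a.pem\0caf\351.pem\0notes.txt\0c.pem\0' |
       grep -z -a pem |      # -z: NUL-delimited records, -a: never treat input as binary
       tr '\0' '\n' |        # one name per line, for counting only
       wc -l)

echo "names passed on: $kept"
```

In a real pipeline the producer would be find with -print0 and the consumer something like GNU tar with --null -T -, so that names containing newlines or odd bytes survive intact.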
Regards
--
Marcello Perathoner