From: Marcello Perathoner
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Mon, 29 Feb 2016 21:11:02 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.6.0

On 02/29/2016 06:56 PM, Eric Blake wrote:
> On 02/29/2016 10:54 AM, Eric Blake wrote:
>> Encoding errors are not characters, but bytes. A line cannot contain encoding errors. Therefore, a file with encoding errors is not a text file.
>
> Corollary - there exist files which are text files in some locales, but binary files in others (based on whether the locale interprets the bytes as an encoding error or as valid characters). Yes, locale dependencies on standard behavior can be annoying.

You assume that a user will only ever want to grep text files encoded in the machine's locale. That is not so.
As a German user I have files in many encodings on my disk: utf-8, iso-8859-1, win-1252, iso-8859-15, and now-defunct encodings such as CP850, CP847, and the "German 7-bit ASCII" that replaced braces with umlauts, as well as old WordStar files that embed control characters.
Since grep 2.21, I always have to specify -a or LC_ALL=C when grepping my files.
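
To make the locale dependence concrete, here is a minimal Python sketch. It is not grep's actual detection logic; the helper `looks_like_text` and the sample string are made up for illustration, and the only assumption is that "text" means "decodes without error in the given encoding". The same bytes pass under one encoding and fail under another:

```python
# Hypothetical helper (not part of grep): treat data as "text" only if it
# decodes cleanly in the given encoding, roughly mirroring the idea that an
# encoding error makes a file look binary.
def looks_like_text(data: bytes, encoding: str) -> bool:
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# A German line stored as ISO-8859-1, as an old file on disk might be.
sample = "Grüße aus München\n".encode("iso-8859-1")

print(looks_like_text(sample, "iso-8859-1"))  # True
print(looks_like_text(sample, "utf-8"))       # False
```

Nothing about the file changed between the two calls; only the encoding assumed by the reader did.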
Regards -- Marcello Perathoner