bug-grep

bug#22838: New 'Binary file' detection considered harmful


From: Marcello Perathoner
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Mon, 29 Feb 2016 21:11:02 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.6.0

On 02/29/2016 06:56 PM, Eric Blake wrote:
On 02/29/2016 10:54 AM, Eric Blake wrote:
Encoding errors are not characters, but bytes.  A line cannot contain
encoding errors.  Therefore, a file with encoding errors is not a text file.

Corollary - there exist files which are text files in some locales, but
binary files in others (based on whether the locale interprets the bytes
as an encoding error or as valid characters).

Yes, locale dependencies on standard behavior can be annoying.


You assume that a user will only ever want to grep text files encoded in the machine's locale. That is not so.

As a German user, I have files on my disk in many encodings: utf-8, iso-8859-1, win-1252, iso-8859-15, and encodings that are now defunct, such as CP850, CP847, the "German 7-bit ASCII" that replaced braces with umlauts, and old WordStar files that used control characters inside the text.

Since grep 2.21, I now always have to specify -a or LC_ALL=C when grepping my files.
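
To illustrate the corollary quoted above, here is a minimal Python sketch showing that the same bytes can be valid text under one encoding and an encoding error under another. It is not grep's actual detection logic; the helper name decodes_cleanly and the sample line are made up for this example.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` contains no encoding errors under `encoding`.

    Hypothetical helper: it only approximates the idea of "a line that
    contains encoding errors", not grep's real binary-file heuristic.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# "Grüße aus München" stored as ISO-8859-1 bytes (assumed sample data).
line = b"Gr\xfc\xdfe aus M\xfcnchen\n"

for encoding in ("iso-8859-1", "utf-8"):
    verdict = "text" if decodes_cleanly(line, encoding) else "encoding error"
    print(f"{encoding}: {verdict}")
```

The ISO-8859-1 reading succeeds, while the UTF-8 reading fails because 0xFC cannot start a UTF-8 sequence. A file like this is therefore "text" or "binary" depending only on the locale it is read under.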




Regards

--
Marcello Perathoner
