bug#22838: New 'Binary file' detection considered harmful


From: Eric Blake
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Mon, 29 Feb 2016 15:37:55 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

On 02/29/2016 01:11 PM, Marcello Perathoner wrote:

>> Yes, locale dependencies on standard behavior can be annoying.
>>
> 
> You assume that a user will only ever want to grep text files encoded in
> the machine's locale. That is not so.

You've been relying on undefined behavior, and it caught up with you.
It's the same as asking us to keep use-after-free "working" in a
multithreaded program because it has always "worked" in your older
single-threaded program when nothing was perturbing the memory between
free() and its latent use.  A latent bug in your usage is still a bug in
your usage, even if it took a change in grep's defaults to expose your
problem.

And meanwhile, grep 2.23 has improved the heuristics: it now complains
about a binary file only when it would otherwise output a line
containing an encoding error, rather than complaining about the first
encoding error up front and stopping immediately.  That alleviates the
worst of the change exposed by your undefined usage: you can still grep
for validly encoded text and get reasonable results, so long as the
lines that match do not themselves contain invalid encodings.
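
To make that heuristic concrete, here is a rough Python sketch of the
idea. It is not grep's actual code; the function name grep_sketch, the
file name FOO, and the use of UTF-8 as the "locale" encoding are all
assumptions for illustration. Matching lines are emitted while they
decode cleanly, and the first matching line with an encoding error
turns into a single "Binary file ... matches" notice that suppresses
further output.

```python
import re


def grep_sketch(data: bytes, pattern: bytes,
                encoding: str = "utf-8", name: str = "FOO") -> list:
    """Illustrative only: mimic the described 2.23-style behavior.

    'encoding' stands in for the locale's character encoding, and
    'name' is a hypothetical file name used in the diagnostic.
    """
    out = []
    regex = re.compile(pattern)
    for raw in data.split(b"\n"):
        if not regex.search(raw):
            # Non-matching lines are never printed, so their encoding
            # is irrelevant to the output.
            continue
        try:
            out.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # A matching line that cannot be decoded: report the file
            # as binary and stop producing output.
            out.append(f"Binary file {name} matches")
            break
    return out


sample = b"gr\xc3\xbcn apple\nbl\xe4u apple\nred apple\n"
print(grep_sketch(sample, b"apple"))
print(grep_sketch(sample, b"red"))
```

In this sketch the first search prints the valid UTF-8 line and then
the binary notice, because the second line holds a lone ISO-8859-1
byte. The second search matches only a cleanly encoded line, so no
notice appears.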

> 
> As a German user I have on my disk files in many encodings: utf-8,
> iso-8859-1, win-1252, iso-8859-15, encodings that are now defunct like
> CP850, CP847, "German 7-bit ASCII" that replaced braces with Umlauts,
> old WordStar files that used control characters inside.
> 
> Since 2.21 I will now have to always specify -a or LC_ALL=C when
> grepping my files.

Yes, but then you are no longer relying on undefined behavior, and
therefore have a leg to stand on if we break that behavior.
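
For anyone scripting around this, the two explicit options mentioned
above (-a, or LC_ALL=C in the environment) could be wrapped roughly as
follows. This is a sketch under assumptions: the helper names
build_grep and run_grep and the mode strings are made up for
illustration, and it presumes a grep binary on PATH.

```python
import os
import subprocess


def build_grep(pattern: str, path: str, mode: str = "text"):
    """Return (argv, env) for one of the two explicit invocations.

    mode="text"     -> pass -a so grep treats the file as text
    mode="c-locale" -> set LC_ALL=C for the child process only
    Both mode names are hypothetical labels for this sketch.
    """
    env = dict(os.environ)
    if mode == "text":
        argv = ["grep", "-a", "-e", pattern, "--", path]
    elif mode == "c-locale":
        argv = ["grep", "-e", pattern, "--", path]
        env["LC_ALL"] = "C"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return argv, env


def run_grep(pattern: str, path: str, mode: str = "text") -> bytes:
    """Run grep and return its raw stdout as bytes (no decoding)."""
    argv, env = build_grep(pattern, path, mode)
    return subprocess.run(argv, env=env, capture_output=True).stdout
```

Returning bytes rather than a decoded string is deliberate: the files
in question are in mixed encodings, so the caller decides how, or
whether, to decode each result.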

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


