bug#20526: BUG: text file is detected as binary

From: Paul Eggert
Subject: bug#20526: BUG: text file is detected as binary
Date: Tue, 12 May 2015 17:08:42 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

Eric Blake wrote:
> I'm still a bit worried that encoding errors encountered on input, even
> though they don't match for output, may still cause issues for some
> patterns (we've had cases of encoding errors causing 'grep -P' to go
> into an infinite loop, for example);

Yes, that's right. We can't go back to the old way of doing things. Encoding errors in the data must not be matched by any regular expression (not even "."). 'grep -P' won't loop if we never pass encoding errors to the PCRE matcher, so that's what we gotta do.
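
To make the idea concrete, here is a rough sketch (not grep's actual code) of keeping encoding errors away from the matcher. It is written in Python, with the standard `re` module standing in for PCRE; the helper names `valid_utf8_runs` and `line_matches` are made up for illustration, and UTF-8 is assumed as the input encoding. The line is cut at each encoding error and only the validly encoded runs are handed to the matcher, so no pattern, not even ".", can match an invalid byte.

```python
import re


def valid_utf8_runs(data: bytes):
    """Yield (offset, text) for each validly encoded UTF-8 run in data.

    Hypothetical helper: bytes that are not valid UTF-8 are skipped and
    never appear in any yielded text.
    """
    pos = 0
    while pos < len(data):
        try:
            # Typical case: the rest of the buffer is validly encoded,
            # so a single decode call succeeds and we are done.
            text = data[pos:].decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into the slice data[pos:].
            if err.start > 0:
                yield pos, data[pos:pos + err.start].decode("utf-8")
            pos += err.end  # step over the invalid bytes
        else:
            yield pos, text
            return


def line_matches(pattern: "re.Pattern[str]", line: bytes) -> bool:
    """Return True if pattern matches inside some valid run of line.

    The matcher only ever sees validly encoded text, so a match can
    never include or span an encoding error.
    """
    return any(pattern.search(text) for _, text in valid_utf8_runs(line))
```

With this sketch, `line_matches(re.compile("a.c"), b"a\xffc")` is false because "." is never offered the byte 0xFF, while the same pattern still matches `b"abc"`. Validly encoded input takes the single-decode path; only lines containing encoding errors pay for the extra splitting.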

> but yes, as the behavior is
> undefined, we are still justified in adopting those heuristics, if
> someone is willing to contribute a patch along those lines.

The hard part about it (and the reason I haven't written up a patch yet) is making sure the above property holds, while continuing to have good performance in the typical case where the input is validly encoded. I suppose it's OK, though, if the change hurts performance only for the -P case, since -P is so slow anyway.
