bug-grep

From: Paul Eggert
Subject: bug#73360: Error when a long list is provided to grep with "--binary-files=without-match" option
Date: Sat, 21 Sep 2024 23:39:38 -0700
User-agent: Mozilla Thunderbird

On 2024-09-20 22:41, Paul Eggert wrote:
> I have the sneaking suspicion that the script is assuming properties of 'grep' that are not documented and that are not guaranteed.

In looking into the code a bit more, I can see some places where that is what is happening.

A couple of things.

First, grep 3.11 uses buffer sizes that depend on the files it has already scanned, and this affects whether grep decides that later files are binary. This can lead to the sort of confusion that you mentioned. Huge buffers can hurt performance, so there is reason to think that grep should not grow its buffers for later files merely because earlier files had very long lines. I installed the first attached patch into the development repository on Savannah to fix that. As a side effect, this may fix the symptoms you observed.
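
To illustrate why a verdict that depends on buffer size is fragile, here is a toy model in Python. It is not grep's code and does not reproduce grep's actual heuristics: it assumes a hypothetical scanner that calls a file binary only if a NUL byte shows up in the first buffer it reads. The function name and the buffer sizes are made up for the example.

```python
def first_read_has_nul(data: bytes, bufsize: int) -> bool:
    """Toy check (an assumption, not grep's logic): report 'binary'
    if a NUL byte appears within the first bufsize bytes."""
    return b"\0" in data[:bufsize]


# A file whose only NUL byte sits just past the 32 KiB mark.
sample = b"x" * 40_000 + b"\0" + b"more text\n"

# Same bytes, two hypothetical buffer sizes.
small = first_read_has_nul(sample, 32 * 1024)  # NUL lies beyond the first read
large = first_read_has_nul(sample, 96 * 1024)  # NUL lies inside the first read

print("32 KiB buffer says binary:", small)
print("96 KiB buffer says binary:", large)
```

In this model the same file gets two different answers, and if the buffer size is inherited from whatever was scanned earlier, the answer depends on the order of the file list.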

Second, 'grep' is not a good tool for determining whether a file is text or binary. The definition of "text" vs "binary" is application-specific: grep's definition is suitable for 'grep', but it is problematic to use it elsewhere. I installed the second attached patch to try to document this better.
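
A script that needs a text/binary decision can state its own rule explicitly instead of borrowing grep's. The sketch below is one possible rule, chosen purely for illustration: it assumes the application treats a file as text when its bytes are valid UTF-8 and contain no NUL byte. The function name is hypothetical, and other applications may reasonably want a different rule.

```python
def is_text_for_my_app(data: bytes) -> bool:
    """Hypothetical application-specific rule (an assumption):
    'text' means valid UTF-8 with no NUL bytes."""
    if b"\0" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The verdict follows from the rule, not from any universal notion of "text".
utf8_line = "caf\u00e9\n".encode("utf-8")
latin1_line = "caf\u00e9\n".encode("latin-1")

print(is_text_for_my_app(utf8_line))    # accepted by this rule
print(is_text_for_my_app(latin1_line))  # rejected by this rule
```

The Latin-1 line is perfectly good text to a program that expects Latin-1, yet this rule rejects it, which is the sense in which the definition is application-specific.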

Hope this helps.

Boldly closing this bug as fixed; if I'm wrong we can reopen it.

Attachment: 0001-grep-avoid-huge-reads.patch
Description: Text Data

Attachment: 0002-doc-warn-re-using-grep-to-detect-binary-files.patch
Description: Text Data

