bug#73360: Error when a long list is provided to grep with "--binary-fil
From: Paul Eggert
Subject: bug#73360: Error when a long list is provided to grep with "--binary-files=without-match" option
Date: Sat, 21 Sep 2024 23:39:38 -0700
User-agent: Mozilla Thunderbird
On 2024-09-20 22:41, Paul Eggert wrote:
> I have the sneaking suspicion that the script is assuming properties of
> 'grep' that are not documented and that are not guaranteed.
In looking into the code a bit more, I can see some places where that is
what is happening.
A couple of things.
First, grep 3.11 uses buffer sizes that depend on earlier files that it
has scanned, and this affects whether grep decides later files are
binary. This can lead to the sort of confusion that you mentioned. There
are performance reasons to think that grep should not grow buffer sizes
for later files merely because earlier files had very long lines, as
huge buffers can hurt performance; so I installed onto the development
repository on Savannah the first attached patch to fix that. As a side
effect this may fix the symptoms you observed.
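
As a rough illustration of why a buffer-size-dependent check can classify the same file differently, here is a toy model in Python. It is not grep's actual code: the function name `looks_binary`, the rule "a NUL byte in the inspected prefix means binary", and both buffer sizes are assumptions made only for this sketch.

```python
def looks_binary(data: bytes, buffer_size: int) -> bool:
    """Toy heuristic (an assumption, not grep's implementation):
    inspect only the first `buffer_size` bytes and report "binary"
    if that prefix contains a NUL byte."""
    return b"\0" in data[:buffer_size]


# Hypothetical file contents: the only NUL byte sits 40,000 bytes in.
sample = b"a" * 40_000 + b"\0" + b"b\n"

# With a small buffer the NUL lies beyond the inspected prefix.
small = looks_binary(sample, 32 * 1024)

# With a larger buffer (say, one grown by an earlier file with very
# long lines) the NUL falls inside the inspected prefix.
large = looks_binary(sample, 96 * 1024)

print(small, large)  # prints "False True"
```

In this model the verdict for one unchanged file flips purely because the buffer size changed, which is the kind of order-dependent result a script should not rely on.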
Second, 'grep' is not a good tool for determining whether a file is text
or binary: the definition of "text" vs "binary" is application-specific,
and grep's definition suits 'grep' but is problematic to use elsewhere.
I installed the second attached patch to try to document this better.
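
One way for a script to avoid depending on grep's notion of "binary" is to state its own definition explicitly. The sketch below is only an example of such a definition, assuming the application treats a file as text when it has no NUL bytes and decodes as UTF-8; the name `is_text_for_my_app` is hypothetical, and other applications may reasonably choose different rules.

```python
def is_text_for_my_app(data: bytes) -> bool:
    """Example application-specific rule (an assumption for this sketch):
    text means "no NUL bytes and valid UTF-8"."""
    if b"\0" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_text(path: str) -> bool:
    """Apply the rule above to a whole file, read in one piece so the
    answer does not depend on any buffer size."""
    with open(path, "rb") as f:
        return is_text_for_my_app(f.read())
```

Reading the whole file keeps the result independent of which other files were examined first, at the cost of memory for very large inputs.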
Hope this helps.
Boldly closing this bug as fixed; if I'm wrong we can reopen it.
0001-grep-avoid-huge-reads.patch
Description: Text Data
0002-doc-warn-re-using-grep-to-detect-binary-files.patch
Description: Text Data