bug-grep


From: Rodrigo Jorge
Subject: bug#73360: Error when a long list is provided to grep with "--binary-files=without-match" option
Date: Fri, 20 Sep 2024 10:54:31 -0300

I could reproduce the same issue without xargs, so I think we can take it
out of the picture:

[user@server folder]$ find -type f -not -path "./.patch_storage/*" -not -name "tfa_setup" -print > /tmp/file.list
[user@server folder]$ wc -l /tmp/file.list
37443 /tmp/file.list

[user@server folder]$ cat /tmp/file.list | xargs -n 100 grep -Il '.' > /tmp/list1.list
[user@server folder]$ wc -l /tmp/list1.list
23405 /tmp/list1.list

[user@server folder]$ grep -Il '.' $(cat /tmp/file.list) > /tmp/list2.list
[user@server folder]$ wc -l /tmp/list2.list
23403 /tmp/list2.list

[user@server folder]$ diff /tmp/list1.list /tmp/list2.list
12268,12269d12267
< ./apex/images/apex_ui/psd/apex_5_ui.ai
< ./apex/images/apex_ui/psd/apex-logo.ai
[user@server folder]$
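The comparison above can be repeated in one step with a small script. This is a minimal sketch, assuming bash 4+ (for `mapfile`) and GNU xargs (for `-d '\n'`, which keeps names with spaces intact); `compare_runs` is a hypothetical helper name. It prints the names that the batched run reports but the single large invocation does not. Reading the list into an array also avoids the word splitting of the unquoted `$(cat /tmp/file.list)`, although a large enough list can still exceed the kernel's argument-size limit.

```shell
#!/usr/bin/env bash
# compare_runs LIST: print file names that "grep -Il ." reports when the
# names in LIST (one per line) are processed in batches of 100 but NOT
# when all names go to a single grep invocation.
# Hypothetical helper; needs bash 4+ and GNU xargs (-d).
compare_runs() {
  local list=$1 batched single
  local -a files
  batched=$(mktemp) single=$(mktemp)

  # Batched run, like "xargs -n 100" above; -d '\n' treats each line as
  # one name, and -r skips running grep on an empty list.
  xargs -r -d '\n' -n 100 grep -Il '.' -- < "$list" | sort > "$batched"

  # Single run: one name per array element, so no word splitting.
  mapfile -t files < "$list"
  if ((${#files[@]} > 0)); then
    grep -Il '.' -- "${files[@]}" | sort > "$single"
  fi

  # Lines present only in the batched output (both inputs sorted alike).
  comm -23 "$batched" "$single"
  rm -f -- "$batched" "$single"
}
```

Usage: `compare_runs /tmp/file.list` should print the two `.ai` names if the behaviour shown above reproduces.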

So we can see that running *"grep -Il '.' $(cat /tmp/file.list)"* also skips
those 2 files, unless the problem is the reverse and the batched xargs runs
are somehow adding them.

Those files are PDFs:

[user@server folder]$ file ./apex/images/apex_ui/psd/apex_5_ui.ai
./apex/images/apex_ui/psd/apex_5_ui.ai: PDF document, version 1.5
[user@server folder]$ file ./apex/images/apex_ui/psd/apex-logo.ai
./apex/images/apex_ui/psd/apex-logo.ai: PDF document, version 1.5

[user@server folder]$ head ./apex/images/apex_ui/psd/apex_5_ui.ai
%����1.5
<</Length 39582/Subtype/XML/Type/Metadata>>stream8 0 R 209 0 R]/ON[6 0 R 7
0 R 210 0 R]/Order 211 0 R/RBGroups[]>>/OCGs[6 0 R 7 0 R 5 0 R 208 0 R 210
0 R 209 0 R]>>/Pages 3 0 R/Type/Catalog>>
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
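GNU grep's manual lists null input bytes as one of the signs of binary data that `-I` (`--binary-files=without-match`) acts on, so it may be worth checking where the first NUL byte falls in each of the two `.ai` files. A minimal sketch, assuming GNU od (`-w1` prints one byte per line); `first_nul_offset` is a hypothetical helper name, and its result alone does not explain the difference between the runs.

```shell
#!/usr/bin/env bash
# first_nul_offset FILE: print the 1-based byte offset of the first NUL
# byte in FILE, or "none" if the file contains no NUL bytes.
# Hypothetical helper; assumes GNU od for -w1.
first_nul_offset() {
  local n
  # -An: no address column; -v: do not collapse repeated lines;
  # -tu1 -w1: one unsigned byte value per line. The line number of the
  # first line that is just "0" is the offset of the first NUL.
  n=$(od -An -v -tu1 -w1 -- "$1" | grep -n -m1 -x ' *0' | cut -d: -f1)
  echo "${n:-none}"
}
```

Usage: `first_nul_offset ./apex/images/apex_ui/psd/apex_5_ui.ai`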

I also found the exact batch size at which it breaks:

[user@server folder]$ cat /tmp/file.list | xargs -n 100 grep -Il '.' | wc -l
23405
[user@server folder]$ cat /tmp/file.list | xargs -n 1000 grep -Il '.' | wc -l
23405
[user@server folder]$ cat /tmp/file.list | xargs -n 2000 grep -Il '.' | wc -l
23405
[user@server folder]$ cat /tmp/file.list | xargs -n 2871 grep -Il '.' | wc -l
23405
[user@server folder]$ cat /tmp/file.list | xargs -n 2872 grep -Il '.' | wc -l
23403
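The manual search for the breaking point (100, 1000, 2000, 2871, 2872) can be automated with a binary search. This is a sketch under three assumptions: GNU xargs, results that are reproducible between runs, and a count that matches the small-batch baseline up to some batch size and differs from then on. `count_text_files` and `find_breaking_batch` are hypothetical names.

```shell
#!/usr/bin/env bash
# count_text_files LIST N: number of names "grep -Il ." prints when the
# names in LIST (one per line) are passed to grep N at a time.
count_text_files() {
  xargs -r -d '\n' -n "$2" grep -Il '.' -- < "$1" | wc -l
}

# find_breaking_batch LIST: smallest batch size whose count differs from
# the count with batches of 100, or "none" if even one batch holding all
# names gives the same count. Assumes good/bad is monotonic in N.
find_breaking_batch() {
  local list=$1 base lo hi mid
  base=$(count_text_files "$list" 100)
  lo=100
  hi=$(wc -l < "$list")
  if ((hi <= lo)) || (($(count_text_files "$list" "$hi") == base)); then
    echo none
    return 0
  fi
  # Invariant: batch size lo gives the baseline count, hi does not.
  while ((hi - lo > 1)); do
    mid=$(((lo + hi) / 2))
    if (($(count_text_files "$list" "$mid") == base)); then
      lo=$mid
    else
      hi=$mid
    fi
  done
  echo "$hi"
}
```

GNU xargs also caps each command line by total size (see `xargs --show-limits`), so past some point a larger `-n` may not change the batches grep actually receives.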

I will reply shortly with the strace findings.
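One way to run the trace suggested in the quoted message below, limited to the calls most relevant here: `-f` follows child processes, `-s` lengthens printed strings, `-o` writes the trace to a file, and `-e trace=` selects the system calls. Tracing `read` as well shows how much of each file grep read before moving on, so expect a large output. This is a sketch; the helper name `trace_grep` and the output path are hypothetical, and older systems may show `open` rather than `openat`.

```shell
#!/usr/bin/env bash
# trace_grep LIST OUT: run one grep over every name in LIST (one per
# line) under strace, writing the trace to OUT. grep's own output still
# goes to stdout. Hypothetical helper; needs bash 4+.
trace_grep() {
  local -a files
  mapfile -t files < "$1"
  strace -f -s 262144 -o "$2" -e trace=execve,openat,read \
    grep -Il '.' -- "${files[@]}"
}

# Example: trace the full run, then look for the two .ai files.
#   trace_grep /tmp/file.list /tmp/grep.strace > /tmp/list2.list
#   grep -n 'apex_5_ui.ai\|apex-logo.ai' /tmp/grep.strace
```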

On Fri, Sep 20, 2024 at 10:32 AM David G. Pickett <dgpickett@aol.com> wrote:

> While the output may be bulky, on Linux you can try the strace command to
> see exactly what it is up to.  It will show the execve() call, for
> instance.  You might need a bigger -s!
>
> $ strace -f -v -s 262144 <YOUR_CMD>
>
> On Thursday, September 19, 2024 at 10:29:30 AM EDT, Rodrigo Jorge <rodrigoaraujorge@gmail.com> wrote:
>
>
> Hello. I'm trying to use grep to get the list of all non-binary files in a
> given folder. I tried with the 2.20 and 3.11 releases.
>
> For some reason, grep gives 2 false negatives when the file list is huge.
> This does not happen if I split the grep input with "xargs -n X".
>
> Check below:
>
> [opc@oradiff-core dbhome_1]$ grep -V
> grep (GNU grep) 3.11
> Copyright (C) 2023 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Written by Mike Haertel and others; see
> <https://git.savannah.gnu.org/cgit/grep.git/tree/AUTHORS>.
>
> [opc@oradiff-core dbhome_1]$ find -type f -not -path "./.patch_storage/*" -not -name "tfa_setup" -print0 2>> /tmp/error.list | xargs -0 -n 100 grep -Il '.' > /tmp/list1.list
>
> [opc@oradiff-core dbhome_1]$ find -type f -not -path "./.patch_storage/*" -not -name "tfa_setup" -print0 2>> /tmp/error.list | xargs -0 grep -Il '.' > /tmp/list2.list
>
> [opc@oradiff-core dbhome_1]$ diff /tmp/list1.list /tmp/list2.list
> 12268,12269d12267
> < ./apex/images/apex_ui/psd/apex_5_ui.ai
> < ./apex/images/apex_ui/psd/apex-logo.ai
>
> [opc@oradiff-core dbhome_1]$ wc -l /tmp/list1.list /tmp/list2.list
>   23397 /tmp/list1.list
>   23395 /tmp/list2.list
>   46792 total
>
> The output should not show any difference.
>
> The same issue was also reproduced in grep 2.20.
>
> Thanks,
> Rodrigo
>

