bug-grep



From: Jim Meyering
Subject: bug#23892: grep is not "grepping" from grep-2.23-1 (archlinux) with external fixed patterns file.
Date: Mon, 4 Jul 2016 07:51:13 -0700

On Mon, Jul 4, 2016 at 6:57 AM, Pascal <address@hidden> wrote:
> hi,
>
> I have a big (3.3 GB) gzipped file from the NSRL, with fields separated
> by a single tab:
>
> $ zcat nsrlfiletxt.gz | head -2
> sha-1    md5    crc32    filename    filesize    productcode
> opsystemcode    specialcode
> 000000206738748edd92c4e3d2e823896700f849
> 392126e756571ebf112cb1c1cdedf926    ebd105a0    i05002t2.pfb    98865
> 3095    win
>
> I've a file with fixed patterns (windows only from field 7 opsystemcode) :
>
> $ cat win.os
> 2000 sp 4
> 2ksp3
> dos
> ...
> xp sp2
> xphomeedw/sp2
> xpprofessw/sp2
>
> my os is :
>
> $ uname -a
> Linux arch 4.4.14-1-lts #1 SMP Fri Jun 24 21:35:25 CEST 2016 x86_64
> GNU/Linux
>
> and grep is :
>
> $ grep --version
> grep (GNU grep) 2.25
> ...
>
> $ pacman -Q grep
> grep 2.25-2
>
> when I try this :
>
> $ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed
> 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
> 59,4k 0:00:00 [ 776k/s] [ <=> ]
>
> only 59.4k lines are processed, with no error :-( !
> (sed is used on win.os so that each pattern matches only a whole field, and
> pv (pipe viewer) is used to show progress)
>
> I downgrade to grep 2.24 :
>
> # pacman -U /var/cache/pacman/pkg/grep-2.24-1-x86_64.pkg.tar.xz
> ...
>
> and retry this (the same) :
>
> $ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed
> 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
> 59,4k 0:00:00 [ 863k/s] [ <=> ]
>
> again, only 59.4k lines are processed, with no error :-( !
>
> I downgrade to grep 2.23 :
>
> # pacman -U /var/cache/pacman/pkg/grep-2.23-1-x86_64.pkg.tar.xz
> ...
>
> and retry this (the same) :
>
> $ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed
> 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
> 59,1k 0:00:00 [ 823k/s] [ <=> ]
>
> only 59.1k lines are processed, with no error :-( !
>
> I downgrade to grep 2.22 :
>
> # pacman -U /var/cache/pacman/pkg/grep-2.22-1-x86_64.pkg.tar.xz
> ...
>
> and retry this (the same) :
>
> $ zcat nsrlfiletxt.gz | pv -l | grep --fixed-strings --file=<( sed
> 's;^.*$;\t&\t;' win.os ) > /opt/nsrl.windows
>  157M 0:04:36 [ 567k/s] [ <=> ]
>
> all the 157M of lines are well processed :-) !
>
> so I think there's a bug introduced with grep 2.23...

Thank you for the report. However, I'll bet that your input is not
what POSIX calls a "text file," and your locale is neither C nor
POSIX. I.e., I'll bet the input contains a NUL byte or a sequence of
bytes that constitutes an invalid character in your locale. Either of
those would make your use of grep non-conformant. You may be able to
make your command work portably by adding grep's "-a" option or by
running grep in the C locale:

  zcat nsrlfiletxt.gz | pv -l | LC_ALL=C grep --fixed-strings --file=...

or

  zcat nsrlfiletxt.gz | pv -l | grep -a --fixed-strings --file=...

If you look at the actual output, you should see an indication of the
problem: when you have less output than expected, there should be at
least one line of the form "Binary file ... matches".
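
A small reproduction of the NUL-byte case, offered as a sketch rather than
a diagnosis of the actual NSRL data: the sample file, its contents, and the
variable names are made up for illustration. It builds three tab-separated
lines, one of which contains a NUL byte, counts the NUL bytes in the input,
and then counts the matching lines with "-a":

```shell
# Hypothetical sample: three tab-separated lines; the second contains a NUL byte.
sample=$(mktemp)
printf 'a\twin\tb\nc\000d\twin\te\nf\twin\tg\n' > "$sample"

# Count NUL bytes in the input; a nonzero count means grep may treat it as binary.
nul_count=$(tr -cd '\000' < "$sample" | wc -c | tr -d ' ')

# With -a, grep treats the input as text and counts every matching line.
text_matches=$(grep -a -c -F 'win' "$sample")

echo "NUL bytes: $nul_count, matching lines with -a: $text_matches"
rm -f "$sample"
```

Running the same grep without "-a" (and without "-c") on such input is where
the shortened output and the "Binary file ... matches" notice would be
expected to appear; the exact wording of that notice, and whether it goes to
standard output or standard error, differs between grep versions.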




