bug-grep


From: Assaf Gordon
Subject: bug#32704: Can grep search for a line feed and a null character at the same time?
Date: Sat, 15 Sep 2018 14:20:40 -0600

Hello,

On 15/09/18 11:57 AM, address@hidden wrote:
> On 15/09/2018 at 19:06, Eric Blake wrote:
>> On 9/15/18 11:43 AM, address@hidden wrote:
>>> But is it at least possible to find “\x0A\x00” with grep?
>>
>> If you bend the rules by throwing -P into the mix, yes :)
>
> So it is possible to find “\x0A\x00” alone, but for example “\x74\x00\x0D\x00\x0A\x00\x74\x00\x65\x00” is impossible to find with the “-P” option?

If I may suggest a different tool, GNU sed can handle such regexes more easily than grep.
The 'trick' is to accumulate multiple lines into memory, then run the
regex on the entire buffer.

1.
If your input is small enough to fit in memory,
you can load the entire file into memory,
and run the regex on the buffer:

$ printf '\xFF\xFE\x0D\x00\x0A\x00\x74\x00\x65\x00\x73\x00\x74\x00\x0D\x00\x0A\x00\x74\x00\x65\x00\x73\x00\x74\x00\x5F\x00\x74\x00\x77\x00\x6F\x00\x0D\x00\x0A\x00' \
     | LC_ALL=C sed -n 'H;$!d ; x ; /\x0a\x00/q0 ; q1' \
           && echo MATCH || echo NO-MATCH

The "H;$!d" commands accumulate lines into the hold buffer.
The "x" command swaps the hold buffer with the pattern buffer.
Then the regex "/\x0a\x00/" searches in the buffer.
If there was a match, sed quits with exit code 0 ("q0").
Otherwise, sed quits with exit code 1 ("q1").
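
To see what the regex is actually matched against, the same accumulate-and-swap commands can be run on a small text input, with the "l" command printing the buffer in an unambiguous form. This is only a sketch: the three-line input is made up for illustration, and "l" stands in for the regex and "q" commands.

```shell
# Accumulate every input line in the hold buffer, swap it into the
# pattern buffer on the last line, then dump it with "l".
# "l" writes each newline as \n and marks the end of the buffer with $.
out=$(printf 'a\nb\nc\n' | LC_ALL=C sed -n 'H;$!d;x;l')
printf '%s\n' "$out"
```

This should print \na\nb\nc$. Note the leading \n: "H" appends a newline followed by the line, so the buffer always starts with a newline. As a side effect, a NUL that is the very first byte of the input would also be matched by /\x0a\x00/.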


2.
If the file is too big to fit in memory,
you can process it line-by-line like so:

$ printf '\xFF\xFE\x0D\x00\x0A\x00\x74\x00\x65\x00\x73\x00\x74\x00\x0D\x00\x0A\x00\x74\x00\x65\x00\x73\x00\x74\x00\x5F\x00\x74\x00\x77\x00\x6F\x00\x0D\x00\x0A\x00' \
     | LC_ALL=C sed -n 'N;/\x0a\x00/q0;$q1;D;' \
             && echo MATCH || echo NO-MATCH

The N and D commands work in tandem: N appends the next line to the
buffer, then D deletes the oldest (first) line from it (think FIFO).
The regex then operates on the buffer which contains the last two lines.
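
The two-line window can be made visible in the same way, by replacing the regex and "q" commands with "l". Again a sketch on a made-up four-line input, not part of the commands above:

```shell
# Slide a two-line window over the input and dump it at every step.
# N appends the next line; D drops the oldest line and restarts the
# cycle without reading new input, so N runs again on what is left.
out=$(printf 'a\nb\nc\nd\n' | LC_ALL=C sed -n 'N;l;D')
printf '%s\n' "$out"
```

This should print three lines: a\nb$, then b\nc$, then c\nd$. Each step holds exactly two adjacent lines joined by a newline, so a pattern that spans a single line break always falls inside one of the windows.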



More details are in the manual:
 https://www.gnu.org/software/sed/manual/sed.html#Multiline-techniques
 https://www.gnu.org/software/sed/manual/sed.html#Text-search-across-multiple-lines



regards,
 - assaf





