bug#20638: BUG: standard & extended RE's don't find NUL's :-(

From: Paul Eggert
Subject: bug#20638: BUG: standard & extended RE's don't find NUL's :-(
Date: Mon, 25 May 2015 08:18:56 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

Linda Walsh wrote:

I had one file that it bailed on
saying it has an invalid UTF-8 encoding -- but the line was
recursive starting from '.' -- and it didn't name the file

That's pretty vague.  Can you reproduce that problem?  I don't observe it:

$ mkdir d
$ printf 'a\200\n' >d/f
$ printf 'b\200\n' >d/g
$ grep -r a d
Binary file d/f matches

"-a" doesn't work, BTW:

Ishtar:/tmp> grep -a '\000\000' zeros
Ishtar:/tmp> echo $?

That's the way 'grep' has always behaved. The regular expression '\0' matches the string "0", not the NUL byte.
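
For anyone who does need to locate NUL bytes, the following is only a rough sketch and not part of the original report. The temporary file and the variable names are made up for illustration. The first check uses no regular expression at all. The second assumes a grep that supports -P, where the escape \x00 denotes the NUL byte, and is skipped if -P is unavailable.

```shell
# Hypothetical test file containing two consecutive NUL bytes.
tmp=$(mktemp)
printf 'ab\000\000cd\n' > "$tmp"

# Count NUL bytes without any regular expression:
# tr -cd deletes every byte except NUL, and wc -c counts what remains.
nul_count=$(tr -cd '\000' < "$tmp" | wc -c | tr -d ' ')
echo "NUL bytes: $nul_count"

# Assumption: this grep supports -P (Perl-compatible regular expressions).
# -a treats the file as text, so grep does not stop at "Binary file ... matches".
if printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
    pcre_lines=$(grep -a -c -P '\x00\x00' "$tmp" || true)
    echo "Lines with two consecutive NULs (grep -P): $pcre_lines"
fi

rm -f "$tmp"
```

The tr-based count works regardless of how grep was built, which is why it comes first; the grep -P form is the one to reach for when the matching lines themselves are wanted.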

Ishtar:/tmp> grep -P '\000\000' zeros
Binary file zeros matches

I don't follow this example; perhaps some text was omitted? Anyway, -P has always treated files containing zeros as binary files too, ever since -P was introduced. The behavior is the same as without -P.

But there it is -- if grep wasn't meant to handle binary files,
it wouldn't know to call 'zeroes' a binary file.

Obviously, grep *is* meant to handle binary files; it's documented to handle them in a particular way.

how can 'shuf' claim to work on input lines yet have this allowed:

   -z, --zero-terminated
          line delimiter is NUL, not newline.

I don't follow this point.  -z is a nice feature; we don't want to get rid of it.

People argue to dumb down POSIX
utils, because some corp wants to get a posix label but
has a few shortcomings -- so they donate enough money and
posix changes its rules.

I'm afraid you've gone off the deep end here.
