bug#26322: grep '*' VS grep -E '*'


From: Eric Blake
Subject: bug#26322: grep '*' VS grep -E '*'
Date: Fri, 31 Mar 2017 10:42:40 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

tag 26322 notabug
thanks

On 03/31/2017 08:37 AM, Julien Denis wrote:
> Hello,
> 
> Assuming that "textfile" is a regular non empty text file, is it
> normal that grep '*' textfile returns nothing but grep -E '*' textfile
> returns all the lines ?
> I got this using Debian 7.1 stable and so grep is version 2.20.
> Would a newer grep version resolve this or is it not a bug (but a
> valid behavior of the star character in ERE) ?

According to POSIX, the regular expression '*' has a different
interpretation under BRE than under ERE:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

In BRE (plain 'grep' style), 9.3.3 states that
"    The <asterisk> shall be special except when used:

        In a bracket expression

        As the first character of an entire BRE (after an initial '^',
if any)"

so it means you are searching for the literal character '*'.  In your
case of no output, that means that your textfile contains no literal '*'
on any line.
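
As a quick check of the BRE reading, here is a minimal sketch. The
two-line input is made up for illustration (it is not your textfile);
only the line containing a literal '*' should be selected:

```shell
# Hypothetical sample input: one line with a literal '*', one without.
# In BRE, a leading '*' is an ordinary character, so only 'a*b' matches.
out=$(printf 'a*b\nabc\n' | grep '*')
printf '%s\n' "$out"
```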

In ERE ('grep -E' style), 9.4.3 states that

"*+?{
    The <asterisk>, <plus-sign>, <question-mark>, and <left-brace> shall
be special except when used in a bracket expression (see RE Bracket
Expression). Any of the following uses produce undefined results:
    If these characters appear first in an ERE, or immediately following
an unescaped <vertical-line>, <circumflex>, <dollar-sign>, or
<left-parenthesis>"

So the behavior of your regular expression is undefined, and we can make
it mean whatever we want: we could error out, or treat it as equivalent
to some other regular expression.  Either way, you are outside the bounds
of POSIX, so you can't rely on our behavior to be consistent.
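
If the goal is to search for a literal asterisk, there are spellings
that stay inside what POSIX defines.  A sketch, again using a made-up
two-line input: a bracket expression in BRE, a backslash escape in ERE,
or fixed-string matching with -F.

```shell
# Hypothetical sample input, not taken from the original report.
input=$(printf 'a*b\nabc\n')

# BRE: '*' inside a bracket expression is always literal.
bre=$(printf '%s\n' "$input" | grep '[*]')

# ERE: an escaped '*' is a literal asterisk.
ere=$(printf '%s\n' "$input" | grep -E '\*')

# Fixed strings: no regular-expression interpretation at all.
fixed=$(printf '%s\n' "$input" | grep -F '*')

printf '%s\n' "$bre" "$ere" "$fixed"
```

All three select only the line containing the asterisk, regardless of
how a given grep treats a bare leading '*' in ERE.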

My guess is that your combination of libc and grep version (yes, it
might be different across versions or on different platforms) treats
'*' the same as zero-or-more instances of the regular expression ''.
Since the empty regular expression matches everywhere, zero-or-more
instances of it also match everywhere, and you thus get every line of
textfile as output.  But that doesn't mean you should expect that
behavior to stay the same.
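
The "empty regular expression matches everywhere" part can be seen on
its own, without relying on the undefined '*' case.  A sketch with a
made-up three-line input (the middle line is empty):

```shell
# Hypothetical sample input: three lines, one of them empty.
# The empty pattern matches every line, so -c counts all of them.
count=$(printf 'one\n\nthree\n' | grep -c '')
printf '%s\n' "$count"
```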

Maybe you are mixing regular expressions with globs.  In a glob, '*' on
its own matches zero-or-more characters; the regular expression with
that meaning is '.*', in both BRE and ERE syntax.
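
To make the glob/regex contrast concrete, here is a sketch with a
made-up three-line input.  The shell's case statement uses glob
matching, while grep uses '.*' for the same "anything" meaning:

```shell
# Hypothetical sample input: three lines, one of them empty.
input=$(printf 'one\n\nthree\n')

# Glob: '*' alone matches any string (here, in a case pattern).
case 'anything' in
  *) glob=matched ;;
esac

# Regex: '.*' matches every line, including the empty one,
# and means the same thing in BRE and ERE.
bre_count=$(printf '%s\n' "$input" | grep -c '.*')
ere_count=$(printf '%s\n' "$input" | grep -E -c '.*')

printf '%s %s %s\n' "$glob" "$bre_count" "$ere_count"
```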

At any rate, I don't see this as a bug, so I'm closing the instance in
the bug-tracker, but feel free to reply with further comments or questions.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
