emacs-bug-tracker
[debbugs-tracker] bug#22606: closed (\? and \* behavior near the start o


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#22606: closed (\? and \* behavior near the start of an expression disagree)
Date: Tue, 09 Feb 2016 02:57:02 +0000

Your message dated Mon, 8 Feb 2016 19:56:37 -0700
with message-id <address@hidden>
and subject line Re: bug#22606: \? and \* behavior near the start of an 
expression disagree
has caused the debbugs.gnu.org bug report #22606,
regarding \? and \* behavior near the start of an expression disagree
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
22606: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22606
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: \? and \* behavior near the start of an expression disagree
Date: Mon, 8 Feb 2016 17:28:57 -0800

(1) grep '?'   # matches literal ?
               # (i would expect a parse error, but whatever)
(2) grep '\?'  # also matches literal ?
(3) grep ' \?' # matches everything, but given that the last
               # expression matches a literal ?, i would expect this
               # to match a space followed by a literal ?

notice that * does not have the same behavior:
(4) grep '*'   # matches literal *
               # (i would expect a parse error, but whatever)
(5) grep '\*'  # matches literal *
(6) grep ' \*' # matches space followed by literal *

cases (3) and (6) behave differently.  imo (6) looks reasonable, but (3) does not.  could someone comment on whether this is working as intended, and if so, what is the rationale?

- chris


--- End Message ---
--- Begin Message ---
Subject: Re: bug#22606: \? and \* behavior near the start of an expression disagree
Date: Mon, 8 Feb 2016 19:56:37 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

tag 22606 notabug
thanks

On 02/08/2016 06:28 PM, Chris Calabro wrote:

Thanks for the report. However, the behavior you see is intentional.

It helps to read POSIX on how grep uses Basic Regular Expressions:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

In particular:
"This section uses the term "invalid" for certain constructs or
conditions. Invalid REs shall cause the utility or function using the RE
to generate an error condition. When invalid is not used, violations of
the specified syntax or semantics for REs produce undefined results:
this may entail an error, enabling an extended syntax for that RE, or
using the construct in error as literal characters to be matched."

> (1) grep '?'   # matches literal ?
>                # (i would expect a parse error, but whatever)

Well-defined behavior per POSIX.  '?' is an ordinary character in BRE,
which means it MUST match a literal '?'.  No parse error is possible.

(If this were an ERE, however, ? would be a metacharacter, and using it
at the start of the expression would be undefined behavior.)

> (2) grep '\?'  # also matches literal ?

Undefined, but not invalid, behavior per POSIX, because POSIX says \ in
BRE is only well-defined when immediately before a metacharacter, but ?
is not a metacharacter in BRE.  So we can make it do whatever we want.
We defined BRE "backslash-question" to mean "behave like ERE question" -
but ERE question is undefined at the start of the expression, so we have
yet another choice: report it as a syntax error, or match a literal "?"
instead.  As you can see, we match the literal "?" instead.

> (3) grep ' \?' # matches everything, but given that the last
>                # expression matches a literal ?, i would expect this
>                # to match a space followed by a literal ?

Undefined, but not invalid, behavior per POSIX.  As in (2), we make it
behave like ERE "question", which means this regular expression now
matches 0 or 1 instance of space, and since 0 spaces can be matched on
anything, the overall expression matches everything (well, insofar as
there are no encoding errors to mess up the definition of "everything").
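
A minimal sketch of case (3), assuming GNU grep (the '\?' operator in BRE
is a GNU extension, so other grep implementations may behave differently).
The sample input and the variable names are made up for illustration. It
also shows two ways to get what the reporter expected, a space followed
by a literal '?':

```shell
# Hypothetical sample input: one line containing " ?", one without.
input='what ?
plain'

# BRE with the GNU extension: ' \?' means "zero or one space",
# which matches the empty string, so every line is selected.
all=$(printf '%s\n' "$input" | grep -c ' \?')

# An unescaped ? is an ordinary character in BRE:
# this matches a space followed by a literal '?'.
lit=$(printf '%s\n' "$input" | grep -c ' ?')

# A bracket expression spells a literal '?' in both BRE and ERE.
brk=$(printf '%s\n' "$input" | grep -c ' [?]')

echo "$all $lit $brk"
```

With GNU grep this should print "2 1 1": the first pattern selects both
lines, the other two select only the line containing " ?".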


> 
> notice that * does not have the same behavior:
> (4) grep '*'   # matches literal *
>                # (i would expect a parse error, but whatever)

Well-defined per POSIX, where it MUST match a literal "*" when used as
the first character of the BRE.

(But if this were an ERE, it would be undefined behavior to start the
expression with *).

> (5) grep '\*'  # matches literal *

Well-defined per POSIX, where it MUST match a literal "*" (since the
backslash says to treat the * as an ordinary character instead of its
usual metacharacter meaning).

> (6) grep ' \*' # matches space followed by literal *

Well-defined per POSIX, where it MUST match the two-character sequence
space then star.
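
Cases (4) through (6) can be checked with a short sketch like the one
below. The sample input and variable names are hypothetical; the patterns
are the ones from the report, all interpreted as BRE:

```shell
# Hypothetical sample input: a line with '*', a line with " *",
# and a line with neither.
input='x*y
 *
none'

# (4) A leading * in a BRE is an ordinary character.
star=$(printf '%s\n' "$input" | grep -c '*')

# (5) A backslash turns the metacharacter * into an ordinary character.
esc=$(printf '%s\n' "$input" | grep -c '\*')

# (6) Space followed by a literal '*'.
sp=$(printf '%s\n' "$input" | grep -c ' \*')

echo "$star $esc $sp"
```

This should print "2 2 1": patterns (4) and (5) select both lines that
contain a star, and (6) selects only the line where a space precedes it.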

> 
> cases (3) and (6) behave differently.

Well, yeah, because ? and * are not identical in BRE - one is a literal
character unless you use backslash to escape it into our extension of
behaving like a metacharacter; the other is a metacharacter unless you
use backslash to escape it into an ordinary character.
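
To make the asymmetry concrete, here is a small sketch, again assuming
GNU grep for the '\?' extension. The input and variable names are
invented for the example. It runs the same patterns as BRE (plain grep)
and as ERE (grep -E):

```shell
# Hypothetical sample input.
input='a?
a*
a'

# BRE: ? is ordinary; \? is the GNU "zero or one" operator.
bre_plain=$(printf '%s\n' "$input" | grep -c 'a?')     # literal "a?"
bre_bk=$(printf '%s\n' "$input" | grep -c 'a\?')       # optional a

# ERE: the roles are swapped.
ere_plain=$(printf '%s\n' "$input" | grep -E -c 'a?')  # optional a
ere_bk=$(printf '%s\n' "$input" | grep -E -c 'a\?')    # literal "a?"

# * is a metacharacter in BRE and ERE alike; backslash makes it literal.
star_plain=$(printf '%s\n' "$input" | grep -c 'a*')    # zero or more a
star_bk=$(printf '%s\n' "$input" | grep -c 'a\*')      # literal "a*"

echo "$bre_plain $bre_bk $ere_plain $ere_bk $star_plain $star_bk"
```

With GNU grep this should print "1 3 3 1 3 1". For '?', the backslash
flips meaning in opposite directions between BRE and ERE; for '*', the
backslash always means "literal".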

>  imo (6) looks reasonable, but (3)
> does not.  could someone comment on whether this is working as intended,
> and if so, what is the rationale?

The rationale is history. POSIX standardized Basic Regular Expressions
based on existing practice, and the original implementation of grep did
NOT support ? as a special character at all.  Later on, other programs
invented Extended Regular Expressions and gave ? a special meaning.
Even later, people realized that ? would be useful in BRE as well, but
since '?' was already baked into BRE as matching literally, we had to
invent '\?' as the extension to use it as a metacharacter.

If you could build a time machine and go back 40 years to invent regular
expressions from scratch, please have the decency to invent just ONE
syntax, not 20 disparate flavors (of which the two most popular become
POSIX BRE and ERE, with weird rules on what is valid where).  But since
this behavior is intentional and required by POSIX, I'm closing this as
not a bug.  Feel free to reply to the thread with further questions, though.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



--- End Message ---
