bug-grep

[^\]] in basic regexes


From: Wacek Kusnierczyk
Subject: [^\]] in basic regexes
Date: Fri, 13 Feb 2009 15:54:01 +0100
User-agent: Thunderbird 2.0.0.19 (X11/20090105)

hello,

i observe a behaviour of grep that i am not sure is correct, possibly
due to my misunderstanding.

i've recently reviewed code written in some language where the intent was
to match a sequence of any number of non-']' characters.  the matching
was done with an underlying regex library, and i have tried the pattern
directly with grep.

with grep, the pattern '[^]]' matches one non-] character:

grep '[^]]' <<< '[\]'
# match
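
as a small sketch of the same check, -o lists every match on its own line,
which makes it visible which characters the class accepts.  the variable
name out is only illustrative, and printf with a pipe is used instead of a
here-string on the assumption that the snippet should not depend on bash:

```shell
# the input is the three characters [ \ ]
# -o prints each match on a separate line
out=$(printf '%s\n' '[\]' | grep -o '[^]]')
printf '%s\n' "$out"
# expected to list [ and \ (the two non-] characters), one per line
```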

however, in that code the pattern was '[^\]]*' (with the idea that the
character ']' is a metacharacter and therefore must be escaped). 
according to the docs i know, it is not necessary to escape ']' within a
character class when it's the first character there (as in '[]]'), since
it then is not considered meta;  but it shouldn't be harmful.  it
happens that this pattern won't do:

grep '[^\]]' <<< '[\]'
# no match

this seems strange;  i'd read the pattern as 'one character that is not
]'.  clearly, the data has two such characters.  alternatively, the
pattern could be read as 'one character that is neither \ nor ]', but
this would require the backslash to be treated as a regular character
(not a meta):

grep '[\]' <<< '[\]'
# match
grep '[^\]' <<< '[\]'
# match
grep '[^\[]' <<< '[\]'
# match

in fact, the third above has one possible match, so the pattern is read
as 'one non-\ non-[' rather than as 'one non-[':

grep -o '[^\[]' <<< '[\]'
# ]
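
the same reading can be probed on inputs that contain ordinary letters.
this is only a sketch: the inputs and the variable names only_backslash and
rest are made up for illustration, and it assumes a grep that follows the
posix rule that a backslash inside a bracket expression is an ordinary
character:

```shell
# '[\]' is a class that contains only the backslash
only_backslash=$(printf '%s\n' 'a\b' | grep -o '[\]')
# '[^\[]' excludes two characters, the backslash and [
rest=$(printf '%s\n' 'a\[b' | grep -o '[^\[]')
printf '%s\n' "$only_backslash" "$rest"
```

if the backslash were an escape here, the second command would also print
the backslash, since only [ would be excluded.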

so the 'one non-\ non-]' reading of  '[^\]]' is not implausible;  then,
there would be one match, but there is none. 

it actually appears that the pattern is read as 'one non-\ followed by
one ]':

grep -o '[^\]]' <<< '[]'
# []

that is, the first ] is not escaped (coherently with the case of
'[^\[]') but rather closes the character class, and the second
(unescaped!) ] does not close any class, but is taken literally! 
(should this not be an invalid regex, with an unmatched class-closing
bracket?)
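
a sketch that tests this reading on a longer input, and that writes the
'neither \ nor ]' class in the form posix describes, with ] placed first
after the ^.  the input and the variable names got and neither are made up
for illustration:

```shell
# if '[^\]]' means "one non-backslash character, then a literal ]",
# only 'a]' can match: 'b' is followed by \ and the last ] is preceded by \
got=$(printf '%s\n' 'a]b\]' | grep -o '[^\]]')
printf '%s\n' "$got"

# "one character that is neither \ nor ]": ] comes first, so it is literal
neither=$(printf '%s\n' '[\]' | grep -o '[^]\]')
printf '%s\n' "$neither"
```

under that reading the first command should print a] and the second
should print [ alone.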

i haven't looked at the sources of grep, so these are plain guesses, but
is the behaviour of grep with '[^\]]' correct and intended, or is it a bug?

grep -V
# GNU grep 2.5.3

regards,
wacek



