bug-bash

Re: bash-4.3.33 regexp bug


From: Stephane Chazelas
Subject: Re: bash-4.3.33 regexp bug
Date: Thu, 5 Mar 2015 20:11:55 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

2015-03-05 12:36:39 -0500, Greg Wooledge:
> On Thu, Mar 05, 2015 at 05:26:00PM +0000, Stephane Chazelas wrote:
> > The bash manual only points to regex(3).
> > 
> > So it's down to your system's regex library (uses
> > regcomp(REG_EXTENDED)) which on recent GNU systems supports \s.
> 
> I see.  So it's another nonportable feature like printf '%(%s)T'.
> Good to know!
> 
> imadev:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
> imadev:~$ printf '%(%s)T\n' -1
> s
> 
> wooledg@wooledg:~$ s='foo bar'; r='\s'; if [[ $s =~ $r ]]; then echo match; fi
> match
> wooledg@wooledg:~$ printf '%(%s)T\n' -1
> 1425576833
[...]

It's a bit worse than %(%s)T (another ksh93 feature, or rather a
subset of one, as ksh93 also parses the argument as a date):
while %(xxx)T passes xxx to strftime() verbatim, in [[ ... =~ xxx
]] bash modifies xxx first, making some assumptions about the
syntax of that regex (which is provided by a third party).

Since 3.2, shell quoting (with \, ' or ") inside a regexp
"escapes" the regular expression operators, so the quoted text
is matched literally.

That (I think) was done for compatibility with ksh93, but while
ksh93 has its own AT&T regexps, bash uses whatever third-party
regexp library the system provides.

So for instance, when you write:

[[ foo =~ ".". ]]

bash calls regcomp() with "\..".

There used to be a bug in that escaping: ["."] would be turned
into [\.] (which, being a bracket expression, matches a
backslash in addition to a dot).

Now bash should work as long as you use POSIX compatible
regexps and the system's regexp library is POSIX compliant.
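
To illustrate the quoting rule with POSIX-only syntax, here is a
small sketch (the r1..r3 variable names are made up for the
example). It assumes bash 3.2 or later without any compatibility
setting that restores the pre-3.2 behaviour:

```shell
# A quoted "." is escaped by bash before regcomp() sees it,
# so it only matches a literal dot.
if [[ "a.b" =~ a"."b ]]; then r1=match; else r1=nomatch; fi
if [[ "axb" =~ a"."b ]]; then r2=match; else r2=nomatch; fi

# An unquoted . keeps its regex meaning (any character).
if [[ "axb" =~ a.b ]]; then r3=match; else r3=nomatch; fi

echo "$r1 $r2 $r3"   # prints: match nomatch match
```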

Where it starts to get tricky is when you want to make use of
extensions in your system's regexps; there it helps to know how
bash works in that regard.

[[ foo =~ \s ]]

would call regcomp() with "s" (the backslash is taken as shell
quoting, and s is not a POSIX regex operator, so no \ is added),
while \\s or "\s" would call it with "\\s" (backslash, backslash,
s: the \ is quoted and \ is itself a regexp operator, so a \ is
added). That's why you need the variable to be able to use that
non-POSIX \s extension.

[[ foo =~ $var ]] passes the content of $var verbatim to
regcomp(), while [[ foo =~ "$var" ]] passes the content of $var
with the regexp operators escaped.
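
A sketch of that difference, using a POSIX regexp so the result
should not depend on the regex library (the re and v1..v3 names
are only for illustration):

```shell
re='^a.c$'

# Unquoted: $re is handed to regcomp() as a regexp.
if [[ abc =~ $re ]]; then v1=match; else v1=nomatch; fi

# Quoted: the operators in $re are escaped, so only the literal
# string ^a.c$ can match.
if [[ abc =~ "$re" ]]; then v2=match; else v2=nomatch; fi
if [[ '^a.c$' =~ "$re" ]]; then v3=match; else v3=nomatch; fi

echo "$v1 $v2 $v3"   # prints: match nomatch match
```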

You can also do:

bs='\'
[[ " " =~ ${bs}s ]]

to pass "\s" to regcomp().
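
If portability matters more than brevity, the POSIX character
class [[:space:]] sidesteps the issue. The following sketch (all
variable names are made up for the example) compares it with \s
passed through an unquoted variable; the second result depends on
the system's regex library, so no output is promised for it:

```shell
s='foo bar'

# POSIX bracket expression: expected to work with any
# POSIX-compliant regcomp().
posix_re='[[:space:]]'
if [[ $s =~ $posix_re ]]; then posix_result=match; else posix_result=nomatch; fi

# Non-POSIX \s, passed verbatim via an unquoted variable.
# Whether it matches is up to the system's regex library.
ext_re='\s'
if [[ $s =~ $ext_re ]]; then ext_result=match; else ext_result=nomatch; fi

printf '%s: %s\n' "$posix_re" "$posix_result"
printf '%s: %s\n' "$ext_re" "$ext_result"
```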

-- 
Stephane



