bug-sed
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#41558: Regexp Bug


From: Jim Meyering
Subject: bug#41558: Regexp Bug
Date: Tue, 22 Sep 2020 16:40:02 -0700

On Wed, May 27, 2020 at 11:30 PM Norihiro Tanaka <noritnk@kcn.ne.jp> wrote:
> On Tue, 26 May 2020 21:14:12 -0700
> "anton.paras" <anton@paras.nu> wrote:
>
> > I posted to Stack Exchange, and they recommended that I file a bug. I'd 
> > rather not copy+paste it all, so here's the link:
> >
> >
> >
> > https://unix.stackexchange.com/questions/579889/why-doesnt-this-sed-command-replace-the-3rd-to-last-and
> >
> >
> >
> > here's an example
> >
> >
> >
> > > echo 'dog and foo and bar and baz land good' |??? sed -E 
> > > 's/(.*)\band\b((.*\band\b){2})/\1XYZ\2/'
> >
> >
> >
> > expected output:?dog XYZ foo and bar and baz land good
> >
> > actual output:?dog and foo XYZ bar and baz land good
> >
> >
> > here's my sed --version output:?sed (GNU sed) 4.2.2
> >
> >
> >
> > I hope this is helpful, cheers!
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It seems that there is the bug in regex.
>
> expected:
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It also reproduces in grep.
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '(.*\band){2}'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '.*\band.*\band'
> $

I agree that this looks like a regex bug. This should print nothing:
  echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
just as this already does:
  echo 'foo and bar land' | env LC_ALL=C sed -nE '/(.*\band){2}/p'

Does anyone know if there's a glibc bug number for it?





reply via email to

[Prev in Thread] Current Thread [Next in Thread]