bug-gnu-utils

Re: Case insensitivity seems to ignore lower bound of interval


From: Davide Brini
Subject: Re: Case insensitivity seems to ignore lower bound of interval
Date: Thu, 28 Apr 2011 13:04:28 +0100

On Thu, 28 Apr 2011 13:29:15 +0200 Eric Bischoff <address@hidden>
wrote:

> > This is strange, since with GNU sed 4.2.1 I get
> > 
> > $ echo 'ijklmnopqrstuvwxyz'| sed 's/[R-Z]/X/g'
> > ijklmnopqrXXXXXXXX
> 
> My first guess, according to your name, was that it is due to a
> difference between French and Italian locales, but that it is not the
> case:
> 
> Generating locales...
>   it_CH.UTF-8... done
>   it_IT.UTF-8... done
> Generation complete.
> (...)
> $ echo 'ijklmnopqrstuvwxyz'| LANG=it_IT.UTF-8 sed 's/[R-Z]/X/g'
> ijklmnopqrstuvwxyz
> $ echo 'ijklmnopqrstuvwxyz'| sed 's/[R-Z]/X/g'
> ijklmnopqrstuvwxyz

To be honest, I don't use the Italian locale. I tried with en_US.UTF-8,
en_GB.UTF-8, fr_FR.UTF-8 (yes), and es_ES.UTF-8, all with the same results.

$ echo 'ijklmnopqrstuvwxyz'| LC_ALL=fr_FR.utf8 sed 's/[R-Z]/X/g'
ijklmnopqrXXXXXXXX 

But you got me curious. My original test system used mostly vanilla tools
built from source, so I thought I would try on stock distros instead.
And guess what: on both a standard RHEL 6 and Debian squeeze, I see your
results (i.e., gawk behaves differently).
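
A quick way to tell whether the locale's collation order is involved at all
is to run the same substitution once under LC_ALL=C and once under whatever
locale the environment selects. This is only a sketch: the variable names
are mine, and the second result depends on the system and on how sed was
built. Only the C-locale result is fixed, because there a range follows
plain byte values and [R-Z] can only match uppercase letters.

```shell
input='ijklmnopqrstuvwxyz'

# C/POSIX locale: ranges follow byte values, so [R-Z] is uppercase only
# and the all-lowercase input comes back unchanged.
c_result=$(printf '%s\n' "$input" | LC_ALL=C sed 's/[R-Z]/X/g')

# Whatever locale the environment currently selects; this varies by system.
env_result=$(printf '%s\n' "$input" | sed 's/[R-Z]/X/g')

printf 'C locale:       %s\n' "$c_result"
printf 'current locale: %s\n' "$env_result"
```

If the two lines differ, the range is being interpreted through the
locale's collation order rather than through byte values.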


> > So I would definitely expect grep to follow awk's and sed's behavior.
> 
> Even if their behaviours were consistent in French, which they are not, I 
> would still consider that as buggy. When someone writes "[R-Z]", they 
> certainly do not expect "r" to be handled differently as "s".

Maybe, but my point was that it was not a gawk-only bug. But now, having
been able to reproduce your results, it may well be that gawk does
something different. Arnold is the authoritative source here.

 
> > > $ echo 'ijklmnopqr'| grep "[r-z]"
> > > ijklmnopqr
> > > $ echo 'ijklmnopqr'| grep "[R-Z]"
> > 
> >
> > It looks like 2.5.4 was doing it, but not 2.7, so something probably
> > changed in between:
> >
> > $ echo 'ijklmnopqrstuvwxyz' | \grep '[R-Z]'
> > ijklmnopqrstuvwxyz
> 
> This is not the same test.
> 
> My list of letters in the "echo" part intentionally was stopping at "r"
> in the grep test, to concentrate on what happens to "r", without
> influence from what happens to "s".

If you stop at "r", you won't get any output. The only way you would get
output is if [R-Z] was implemented as "RrSs..." etc., which seems not to
be the case; rather, it seems to be the other way round ("rRsS..." etc.).
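
To make the two orderings concrete, here is a small sketch that models them
by hand. The two sequences and the helper names (range_r_to_z, apply_range)
are made up for illustration; they are not how sed, grep, or the C library
implement ranges. Each helper runs under LC_ALL=C so that no real locale
collation interferes with the model.

```shell
# Hypothetical collation sequences, written out by hand for illustration.
upper_first='AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz'
lower_first='aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ'

# Print the slice of a sequence from R through Z, i.e. the characters
# that [R-Z] would cover if ranges followed that sequence.
range_r_to_z() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/^.*\(R.*Z\).*$/\1/'
}

# Replace every character of that slice with X in the string "$2",
# using an explicit bracket list instead of a range.
apply_range() {
    printf '%s\n' "$2" | LC_ALL=C sed "s/[$(range_r_to_z "$1")]/X/g"
}

range_r_to_z "$upper_first"                        # RrSsTtUuVvWwXxYyZ
range_r_to_z "$lower_first"                        # RsStTuUvVwWxXyYzZ
apply_range "$upper_first" 'ijklmnopqrstuvwxyz'    # ijklmnopqXXXXXXXXz
apply_range "$lower_first" 'ijklmnopqrstuvwxyz'    # ijklmnopqrXXXXXXXX
apply_range "$upper_first" 'ijklmnopqr'            # ijklmnopqX
apply_range "$lower_first" 'ijklmnopqr'            # ijklmnopqr
```

In this model only the "RrSs..." ordering touches "r", while the "rRsS..."
ordering leaves "r" alone and picks up "z" instead, which is the pattern in
the ijklmnopqrXXXXXXXX output.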

-- 
D.
