Re: Case insensitivity seems to ignore lower bound of interval
From: Davide Brini
Subject: Re: Case insensitivity seems to ignore lower bound of interval
Date: Thu, 28 Apr 2011 13:04:28 +0100
On Thu, 28 Apr 2011 13:29:15 +0200 Eric Bischoff <address@hidden>
wrote:
> > This is strange, since with GNU sed 4.2.1 I get
> >
> > $ echo 'ijklmnopqrstuvwxyz'| sed 's/[R-Z]/X/g'
> > ijklmnopqrXXXXXXXX
>
> My first guess, judging by your name, was that this was due to a
> difference between the French and Italian locales, but that is not the
> case:
>
> Generating locales...
> it_CH.UTF-8... done
> it_IT.UTF-8... done
> Generation complete.
> (...)
> $ echo 'ijklmnopqrstuvwxyz'| LANG=it_IT.UTF-8 sed 's/[R-Z]/X/g'
> ijklmnopqrstuvwxyz
> $ echo 'ijklmnopqrstuvwxyz'| sed 's/[R-Z]/X/g'
> ijklmnopqrstuvwxyz
To be honest, I don't use the Italian locale. I tried with en_US.UTF-8,
en_GB.UTF-8, fr_FR.UTF-8 (yes), and es_ES.UTF-8, all with the same results.
$ echo 'ijklmnopqrstuvwxyz'| LC_ALL=fr_FR.utf8 sed 's/[R-Z]/X/g'
ijklmnopqrXXXXXXXX
But you got me curious. My original test system used mostly vanilla tools
built from source, so I thought I would try on stock distros instead.
And guess what: on both a standard RHEL 6 and a Debian squeeze system, I see
your results (i.e., gawk behaves differently).
> > So I would definitely expect grep to follow awk's and sed's behavior.
>
> Even if their behaviours were consistent in French, which they are not, I
> would still consider that buggy. When someone writes "[R-Z]", they
> certainly do not expect "r" to be handled differently from "s".
Maybe, but my point was that it was not a gawk-only bug. Now that I have been
able to reproduce your results, though, it may well be that gawk does
something different. Arnold is the authoritative source here.
> > > $ echo 'ijklmnopqr'| grep "[r-z]"
> > > ijklmnopqr
> > > $ echo 'ijklmnopqr'| grep "[R-Z]"
> >
> >
> > It looks like 2.5.4 was doing it, but not 2.7, so something probably
> > changed in between:
> >
> > $ echo 'ijklmnopqrstuvwxyz' | \grep '[R-Z]'
> > ijklmnopqrstuvwxyz
>
> This is not the same test.
>
> My list of letters in the "echo" part intentionally stopped at "r"
> in the grep test, to concentrate on what happens to "r", without
> influence from what happens to "s".
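If you stop at "r", you won't get any output. You would get output only if
the collation order put each uppercase letter first ("RrSs..." etc.), so
that [R-Z] included "r"; that seems not to be the case. Rather, the order
seems to be the other way round ("rRsS..." etc.), so [R-Z] starts at "R",
skips "r", and matches "s" through "z" (along with "S" through "Z").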
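To make the collation-order argument concrete, here is a minimal sketch in plain POSIX shell. It does not call sed or query any real locale; the orders `lower_first` and `upper_first` and the helpers `in_range` and `emulate` are hypothetical, and it assumes bracket ranges simply follow a fully interleaved letter order (real glibc collation tables are more involved).

```shell
#!/bin/sh
# Hypothetical collation orders (illustrative models only, not taken from
# any real locale definition).
lower_first='a A b B c C d D e E f F g G h H i I j J k K l L m M n N o O p P q Q r R s S t T u U v V w W x X y Y z Z'
upper_first='A a B b C c D d E e F f G g H h I i J j K k L l M m N n O o P p Q q R r S s T t U u V v W w X x Y y Z z'

# in_range ORDER LO HI CH: succeed if CH sorts between LO and HI
# (inclusive) in the space-separated ORDER.
in_range() {
    _order=$1 _lo=$2 _hi=$3 _ch=$4 _seen=0
    for _c in $_order; do
        [ "$_c" = "$_lo" ] && _seen=1
        if [ "$_seen" = 1 ] && [ "$_c" = "$_ch" ]; then return 0; fi
        [ "$_c" = "$_hi" ] && return 1
    done
    return 1
}

# emulate ORDER: mimic  echo 'ijklmnopqrstuvwxyz' | sed 's/[R-Z]/X/g'
# assuming the range [R-Z] is resolved using ORDER.
emulate() {
    _out=
    for _x in i j k l m n o p q r s t u v w x y z; do
        if in_range "$1" R Z "$_x"; then _out="${_out}X"; else _out="${_out}$_x"; fi
    done
    printf '%s\n' "$_out"
}

emulate "$lower_first"   # prints ijklmnopqrXXXXXXXX
emulate "$upper_first"   # prints ijklmnopqXXXXXXXXz
```

Under the lowercase-first model the emulation reproduces the `ijklmnopqrXXXXXXXX` output quoted earlier, and the same model explains why `echo 'ijklmnopqr' | grep '[R-Z]'` prints nothing: no character of that input falls inside the range. Only an uppercase-first order would pull "r" into [R-Z] (and push "z" out).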
--
D.